IA-32 Intel® Architecture Optimization
Current implementations of the BSQ_cache_reference event do not
distinguish between programmatic read misses and write misses.
A programmatic write that misses must fetch the rest of the cache line
and merge in the new data; such a request is called a read for
ownership (RFO). To the BSQ_cache_reference hardware, both a
programmatic read and an RFO look like a data bus read and are counted
as such. Future implementations may distinguish further between
programmatic reads and RFOs.
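The toy model below illustrates why the event cannot separate the two cases. It is a sketch of the behavior described above, not Intel's hardware logic. The names (`Access`, `BusRequest`, `bus_request_for`, `bsq_miss_category`) are hypothetical, and only the miss path is modeled. A read miss and a write miss produce different requests (a line read and an RFO), but both end up in the same "data bus read" category.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    READ = "read"
    WRITE = "write"


class BusRequest(Enum):
    NONE = "none"                # hit: no bus transaction is needed
    READ_LINE = "read line"      # programmatic read miss
    RFO = "read for ownership"   # programmatic write miss


def bus_request_for(access: Access, hit: bool) -> BusRequest:
    """Bus request generated by a programmatic access (hypothetical model)."""
    if hit:
        return BusRequest.NONE
    # A write miss must fetch the rest of the line so that the new data
    # can be merged into it; that fetch is a read for ownership.
    return BusRequest.READ_LINE if access is Access.READ else BusRequest.RFO


def bsq_miss_category(request: BusRequest) -> Optional[str]:
    """Category under which a miss's bus request is counted (hypothetical).

    Both a line read and an RFO look like a data bus read, so they fall
    into the same category and cannot be told apart afterwards.
    """
    if request is BusRequest.NONE:
        return None
    return "data bus read"
```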
Current implementations of the BSQ_cache_reference event can also
appear to over- or under-count. References are based on BSQ
allocations, as described above. Consequently, read misses are
generally counted once per 128-byte line BSQ allocation (whether one
or both sectors are referenced), but read and write (RFO) hits and most
write (RFO) misses are counted once per 64-byte line, the size of a core
reference. As a result, read and write (RFO) hits and write (RFO)
misses appear to be overcounted by a factor of two relative to read
misses. This granularity mismatch cannot always be corrected for, which
makes it difficult to correlate the counts with the number of
programmatic misses and hits. If the user knows that both sectors of a
128-byte line are always referenced soon after each other, the number
of read misses can be multiplied by two to adjust the miss counts to
64-byte granularity.
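A minimal sketch of the granularity mismatch and the multiply-by-two adjustment, assuming a cold cache that never evicts and in which every first touch of a 128-byte line is a read miss causing exactly one BSQ allocation. The helper names `count_read_misses` and `adjust_read_misses` are hypothetical. The model compares the BSQ-style count (one per 128-byte line) with the number of distinct 64-byte sectors referenced.

```python
from typing import Iterable, Tuple


def count_read_misses(addresses: Iterable[int]) -> Tuple[int, int]:
    """Toy model of read-miss counting granularity.

    Returns (bsq_count, sector_count): the BSQ-style count, once per
    128-byte line allocation, and the number of distinct 64-byte sectors
    touched. Assumes every address misses and nothing is evicted.
    """
    addrs = list(addresses)
    lines_128 = {addr // 128 for addr in addrs}
    sectors_64 = {addr // 64 for addr in addrs}
    return len(lines_128), len(sectors_64)


def adjust_read_misses(bsq_read_misses: int) -> int:
    """Scale a 128-byte-granularity read-miss count to 64-byte granularity.

    Valid only when both sectors of each 128-byte line are known to be
    referenced soon after each other.
    """
    return 2 * bsq_read_misses
```

For a sequential walk over 4096 bytes in 64-byte steps, the model gives 32 BSQ-style read misses for 64 sectors, and doubling recovers 64. For a walk in 128-byte steps, only one sector per line is touched, so the model gives 32 for 32 sectors and doubling would wrongly report 64; this is why the correction cannot always be applied.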
As of Pentium 4 and Intel Xeon processors with a CPUID signature of
0xf21, prefetches themselves are counted as neither hits nor misses.
However, Pentium 4 processor implementations with a CPUID signature
of 0xf07 and earlier have a problem: reads to lines that are already
being prefetched are counted as hits in addition to misses, which
overcounts hits.
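The CPUID signature is the value returned in EAX by CPUID leaf 1: bits 3:0 hold the stepping, bits 7:4 the model, and bits 11:8 the family. So 0xf21 is family 0xF, model 2, stepping 1, and 0xf07 is family 0xF, model 0, stepping 7. The sketch below takes the signature as an input rather than executing CPUID. It classifies which prefetch-counting behavior described above applies. The function and enum names are hypothetical. It assumes that a numerically larger family-0xF signature denotes a later implementation. Signatures between 0xf08 and 0xf20 are not covered by the text, so they are reported as unknown.

```python
from enum import Enum
from typing import Tuple


class PrefetchHitCounting(Enum):
    PREFETCHES_NOT_COUNTED = "prefetches counted as neither hits nor misses"
    HITS_OVERCOUNTED = "reads to lines being prefetched also counted as hits"
    UNKNOWN = "not covered by the text"


def decode_signature(signature: int) -> Tuple[int, int, int]:
    """Split bits 11:0 of a CPUID signature into (family, model, stepping)."""
    return (signature >> 8) & 0xF, (signature >> 4) & 0xF, signature & 0xF


def prefetch_hit_counting(signature: int) -> PrefetchHitCounting:
    """Classify prefetch/hit counting behavior from a CPUID signature."""
    family, _model, _stepping = decode_signature(signature)
    if family != 0xF:
        return PrefetchHitCounting.UNKNOWN
    # Within family 0xF, comparing bits 11:0 numerically orders by model,
    # then stepping (assumed to follow implementation order).
    sig = signature & 0xFFF
    if sig >= 0xF21:
        return PrefetchHitCounting.PREFETCHES_NOT_COUNTED
    if sig <= 0xF07:
        return PrefetchHitCounting.HITS_OVERCOUNTED
    return PrefetchHitCounting.UNKNOWN
```

A profiling script could use such a check to flag hit counts collected on the older steppings as inflated before comparing them with counts from newer parts.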
For the writeback memory type, the number of “Reads Non-prefetch from
the Processor” is a good approximation of the number of outermost cache
misses due to loads or RFOs.
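As an illustration of using this approximation, the sketch below normalizes the count to misses per 1000 instructions so that runs of different lengths can be compared. The instruction count, the function name, and the `memory_type` parameter are assumptions not taken from the text. The guard reflects the stated restriction to the writeback (WB) memory type.

```python
def approx_outermost_misses_per_kilo_instr(reads_non_prefetch: int,
                                           instructions_retired: int,
                                           memory_type: str = "WB") -> float:
    """Approximate outermost-cache load/RFO misses per 1000 instructions.

    Treats the "Reads Non-prefetch from the Processor" count as the miss
    count, an approximation the text states only for writeback memory.
    """
    if memory_type != "WB":
        raise ValueError("approximation is only stated for writeback memory")
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 1000.0 * reads_non_prefetch / instructions_retired
```

For example, 5000 non-prefetch reads over 2,000,000 instructions gives about 2.5 outermost cache misses per 1000 instructions.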