Intel IA-32 Computer Accessories User Manual


 
Optimizing Cache Usage 6
Currently, the prefetch instruction provides a greater performance gain
than preloading because it:
- has no destination register; it only updates cache lines.
- does not stall normal instruction retirement.
- does not affect the functional behavior of the program.
- has no cache line split accesses.
- does not cause exceptions, except when the LOCK prefix is used. The LOCK prefix is not a valid prefix for the prefetch instructions and should not be used.
- does not complete its own execution if that would cause a fault.
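The following is a minimal C sketch of the contrast, assuming a GCC- or Clang-compatible compiler.

On x86, it issues PREFETCHT0 through the `_mm_prefetch` intrinsic from `<xmmintrin.h>`. On other targets, it falls back to the GCC/Clang `__builtin_prefetch` builtin. The function names and the `PREFETCH_DISTANCE` of 64 elements are hypothetical choices for illustration, not tuned values.

The preload variant reads the same future element with an ordinary load into a `volatile` local. That load needs a destination register and retires like any other load. The prefetch only hints at a cache line and leaves the computed result unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
/* PREFETCHT0: hint to bring the line into all cache levels. */
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* Non-x86 fallback (GCC/Clang builtin): read access, high temporal locality. */
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#endif

/* Hypothetical distance, in elements, between the load and the prefetch. */
#define PREFETCH_DISTANCE 64

/* Software prefetch: no destination register, no effect on the result. */
int64_t sum_with_prefetch(const int32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH(&data[i + PREFETCH_DISTANCE]);
        sum += data[i];
    }
    return sum;
}

/* Preloading: an ordinary load whose value is thrown away. The volatile
   store forces the compiler to keep the load of the future element. */
int64_t sum_with_preload(const int32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            volatile int32_t touch = data[i + PREFETCH_DISTANCE];
            (void)touch;
        }
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same sum for the same input, which is what "does not affect the functional behavior" means in practice. Any speed difference between them depends on the processor and the data size, so it should be measured rather than assumed.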
The current advantages of prefetch instructions over preloading are
processor-specific. The nature and extent of the advantages may change
in the future.
In addition, there are cases where a prefetch instruction will not perform
the data prefetch. These include:
- the prefetch causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with a CPUID signature corresponding to family 15, model 0, 1, or 2. On Pentium 4 processors with a CPUID signature corresponding to family 15, model 3, the prefetch instruction resolves the DTLB miss and fetches the data.
- an access to the specified address would cause a fault or exception.
- the memory subsystem runs out of request buffers between the first-level cache and the second-level cache.
- the prefetch targets an uncacheable memory region, for example, USWC or UC memory.
- a LOCK prefix is used; this causes an invalid opcode exception.
Cacheability Control
This section covers the mechanics of the cacheability control
instructions.