Optimizing Cache Usage 6
Currently, the prefetch instruction provides a greater performance gain
than preloading because it:
• has no destination register; it only updates cache lines.
• does not stall the normal instruction retirement.
• does not affect the functional behavior of the program.
• has no cache line split accesses.
• does not cause exceptions except when the LOCK prefix is used; the LOCK
prefix is not a valid prefix for use with the prefetch instructions
and should not be used.
• does not complete its own execution if that would cause a fault.
The current advantages of the prefetch instruction over preloading are
processor-specific. The nature and extent of the advantages may change
in the future.
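The contrast between the two approaches can be sketched in C. This is a
minimal illustration, not a tuned implementation: it assumes a GCC- or
Clang-compatible compiler that provides __builtin_prefetch, and the names
sum_with_preload, sum_with_prefetch and PREFETCH_DISTANCE are hypothetical.
The preload version needs a destination for its dummy load and must
bounds-check the look-ahead address; the prefetch version needs neither,
and both return the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning constant: look-ahead distance in elements. */
#define PREFETCH_DISTANCE 16

/* Preloading: a dummy load brings the line into the cache. The load
   needs a destination and would fault on an invalid address, so the
   look-ahead index is bounds-checked. */
int64_t sum_with_preload(const int32_t *data, size_t n)
{
    int64_t sum = 0;
    volatile int32_t sink = 0;  /* destination the preload writes to */
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            sink = data[i + PREFETCH_DISTANCE];
        sum += data[i];
    }
    (void)sink;
    return sum;
}

/* Prefetching: a hint with no destination register. The address is
   formed with integer arithmetic so that no out-of-bounds pointer is
   computed; the prefetch itself is never dereferenced by the program. */
int64_t sum_with_prefetch(const int32_t *data, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t ahead = (uintptr_t)data
                        + (i + PREFETCH_DISTANCE) * sizeof *data;
        /* Assumption: GCC/Clang builtin; 0 = read, 3 = keep in all levels. */
        __builtin_prefetch((const void *)ahead, 0, 3);
        sum += data[i];
    }
    return sum;
}
```

For simplicity the sketch issues one prefetch per element; in practice one
prefetch per cache line is enough, and the best look-ahead distance
depends on the processor and the loop body.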
In addition, there are cases where a prefetch instruction will not perform
the data prefetch. These include:
• the prefetch causes a DTLB (Data Translation Lookaside Buffer)
miss. This applies to Pentium 4 processors with CPUID signature
corresponding to family 15, model 0, 1 or 2. The prefetch
instruction resolves a DTLB miss and fetches data on Pentium 4
processors with CPUID signature corresponding to family 15,
model 3.
• an access to the specified address causes a fault/exception.
• the memory subsystem runs out of request buffers between the
first-level cache and the second-level cache.
• the prefetch targets an uncacheable memory region, for example,
USWC or UC.
• a LOCK prefix is used. This causes an invalid opcode exception.
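The family and model dependence described in the first case above can be
checked in software by decoding the CPUID signature (the value returned in
EAX by CPUID leaf 1). The following is a sketch only: the function names
are hypothetical, and the return value -1 marks signatures that the text
above does not cover rather than any documented behavior.

```c
#include <stdint.h>

/* Family as encoded in the CPUID leaf 1 signature: the extended family
   field (bits 27:20) is added when the base family (bits 11:8) is 15. */
static unsigned cpuid_family(uint32_t sig)
{
    unsigned family = (sig >> 8) & 0xF;
    if (family == 0xF)
        family += (sig >> 20) & 0xFF;
    return family;
}

/* Model: the extended model field (bits 19:16) supplies the upper four
   bits when the base family is 6 or 15. */
static unsigned cpuid_model(uint32_t sig)
{
    unsigned base_family = (sig >> 8) & 0xF;
    unsigned model = (sig >> 4) & 0xF;
    if (base_family == 0x6 || base_family == 0xF)
        model |= ((sig >> 16) & 0xF) << 4;
    return model;
}

/* Returns 1 if a prefetch resolves a DTLB miss and fetches the data,
   0 if the prefetch is not performed on a DTLB miss,
   -1 if the signature is outside the cases listed in the text. */
int prefetch_resolves_dtlb_miss(uint32_t sig)
{
    if (cpuid_family(sig) != 15)
        return -1;
    switch (cpuid_model(sig)) {
    case 0: case 1: case 2:
        return 0;
    case 3:
        return 1;
    default:
        return -1;
    }
}
```

For example, a signature value of 0x0F29 decodes to family 15, model 2,
so the function returns 0; a value of 0x0F34 decodes to family 15,
model 3, so it returns 1.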
Cacheability Control
This section covers the mechanics of the cacheability control
instructions.