Intel IA-32 Computer Accessories User Manual

Optimizing Cache Usage 6

Currently, the prefetch instruction provides a greater performance gain than preloading because it:

• has no destination register; it only updates cache lines.

• does not stall the normal instruction retirement.

• does not affect the functional behavior of the program.

• has no cache line split accesses.

• does not cause exceptions except when the LOCK prefix is used; LOCK is not a valid prefix for the prefetch instructions and should not be used.

• does not complete its own execution if that would cause a fault.

The current advantages of the prefetch over preloading instructions are processor-specific. The nature and extent of the advantages may change in the future.
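
The contrast between preloading and prefetching can be sketched in C. This is an illustration only, assuming GCC or Clang: the `PREFETCH_T0` macro, the function names, and the prefetch distance of 16 elements are hypothetical and untuned. Where SSE is available the macro uses the `_mm_prefetch` intrinsic; otherwise it falls back to `__builtin_prefetch`.

```c
#include <stddef.h>

/* Hypothetical wrapper: SSE intrinsic where available, else the GCC/Clang
   builtin. Either way the request is only a hint to the processor. */
#if defined(__SSE__)
#include <xmmintrin.h>
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) __builtin_prefetch((p), 0, 3)
#endif

#define PREFETCH_DISTANCE 16 /* elements ahead; an assumed, untuned value */

/* Preloading: an ordinary load issued early. It needs a destination
   (here the volatile variable) and would fault on an invalid address. */
long sum_with_preload(const int *a, size_t n)
{
    long total = 0;
    volatile int touch = 0; /* keeps the compiler from dropping the load */
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            touch = a[i + PREFETCH_DISTANCE];
        total += a[i];
    }
    (void)touch;
    return total;
}

/* Prefetching: no destination register, only a cache-line hint. A tuned
   loop would issue one prefetch per cache line, not one per element. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_T0(&a[i + PREFETCH_DISTANCE]);
        total += a[i];
    }
    return total;
}
```

Both functions return the same sum, which reflects the point that a prefetch does not affect the functional behavior of the program. Whether the prefetch version is actually faster depends on the processor and the access pattern.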

In addition, there are cases where a prefetch instruction will not perform the data prefetch. These include:

• the prefetch causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with a CPUID signature corresponding to family 15, model 0, 1 or 2. On Pentium 4 processors with a CPUID signature corresponding to family 15, model 3, the prefetch instruction resolves the DTLB miss and fetches the data.

• an access to the specified address causes a fault/exception.

• the memory subsystem runs out of request buffers between the first-level cache and the second-level cache.

• the prefetch targets an uncacheable memory region, for example, USWC (uncacheable speculative write-combining) or UC (uncacheable).

• a LOCK prefix is used. This causes an invalid opcode exception.
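
The DTLB case above depends on the processor's CPUID signature, so code that cares about it has to decode family and model first. The following is a minimal sketch, assuming the signature is the EAX value returned by CPUID leaf 1 (obtainable, for instance, with `__get_cpuid` from `<cpuid.h>` on GCC or Clang). The function names are hypothetical, and the result is "unknown" for any processor the text does not mention.

```c
#include <stdint.h>

/* Family from the CPUID signature: the extended family (bits 27:20)
   is added only when the base family (bits 11:8) is 15. */
unsigned cpuid_family(uint32_t sig)
{
    unsigned base = (sig >> 8) & 0xF;
    unsigned ext = (sig >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* Model from the CPUID signature: the extended model (bits 19:16) forms
   the high nibble only when the base family is 6 or 15. */
unsigned cpuid_model(uint32_t sig)
{
    unsigned base_family = (sig >> 8) & 0xF;
    unsigned base = (sig >> 4) & 0xF;
    unsigned ext = (sig >> 16) & 0xF;
    if (base_family == 0x6 || base_family == 0xF)
        return (ext << 4) | base;
    return base;
}

/* Returns 1 if a prefetch that misses the DTLB still fetches data
   (family 15, model 3), 0 if it is dropped (family 15, model 0 to 2),
   and -1 for processors the text above does not cover. */
int prefetch_survives_dtlb_miss(uint32_t sig)
{
    if (cpuid_family(sig) != 15)
        return -1;
    unsigned model = cpuid_model(sig);
    if (model == 3)
        return 1;
    if (model <= 2)
        return 0;
    return -1;
}
```

For example, a signature of `0x00000F34` decodes to family 15, model 3, so the function returns 1, while `0x00000F29` (family 15, model 2) returns 0.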

Cacheability Control

This section covers the mechanics of the cacheability control instructions.
