IA-32 Intel® Architecture Optimization
• avoids the need to access off-chip caches, which can increase the
  realized bandwidth compared to a normal load-miss, which returns
  data to all cache levels
Situations that are less likely to benefit from software prefetch are:
• code that is already bandwidth bound, because prefetching tends to
  increase bandwidth demands
• prefetching too far ahead, which can evict data from the caches
  before that data is used in execution
• not prefetching far enough ahead, which reduces the ability to
  overlap memory and execution latencies
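The trade-off between prefetching too far ahead and not far enough is usually expressed as a prefetch distance. The following is a minimal sketch, not code from this manual. It assumes a GCC- or Clang-compatible compiler (for `__builtin_prefetch`), a 64-byte cache line, and a 4-byte `int`. The names `sum_with_prefetch`, `INTS_PER_LINE`, and `LINES_AHEAD` are hypothetical, and the distance of 8 lines is a placeholder to be tuned by measurement.

```c
#include <assert.h>
#include <stddef.h>

enum {
    INTS_PER_LINE = 16, /* assumption: 64-byte cache line, 4-byte int */
    LINES_AHEAD   = 8   /* hypothetical prefetch distance, in cache lines */
};

/* Sums n ints, issuing at most one prefetch hint per cache line of input. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;

    for (size_t base = 0; base < n; base += INTS_PER_LINE) {
        /* Hint the line that will be needed LINES_AHEAD iterations from now. */
        size_t ahead = base + (size_t)LINES_AHEAD * INTS_PER_LINE;
        if (ahead < n)
            __builtin_prefetch(&data[ahead], 0, 3); /* read, high locality */

        /* Consume the current line (or the shorter tail of the array). */
        size_t end = (base + INTS_PER_LINE < n) ? base + INTS_PER_LINE : n;
        for (size_t i = base; i < end; i++)
            total += data[i];
    }
    return total;
}
```

Raising `LINES_AHEAD` gives memory more time to respond but increases the chance that a line is evicted before use; lowering it leaves less latency to overlap. A purely sequential scan like this one may already be covered by hardware prefetch, so the loop is only meant to show where the distance appears.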
Software prefetches are treated by the processor as a hint to initiate a
request to fetch data from the memory system. They consume resources
in the processor, and the use of too many prefetches can limit their
effectiveness. Examples of this include prefetching data in a loop for a
reference outside the loop, and prefetching in a basic block that is
frequently executed but seldom precedes the reference that the
prefetch targets.
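The first case, a prefetch inside a loop for a reference outside it, can be sketched as follows. This is an illustrative example rather than code from this manual. It assumes a GCC- or Clang-compatible compiler for `__builtin_prefetch`, and the function names and the `weight` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Wasteful: the same hint is issued n times, but *weight is read only once,
 * after the loop. Each redundant prefetch still consumes processor resources. */
long weighted_sum_wasteful(const int *v, size_t n, const int *weight)
{
    assert(weight != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(weight, 0, 3);
        total += v[i];
    }
    return total * *weight;
}

/* Hoisted: one hint before the loop, so the loop body overlaps the fetch. */
long weighted_sum_hoisted(const int *v, size_t n, const int *weight)
{
    assert(weight != NULL);
    __builtin_prefetch(weight, 0, 3);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total * *weight;
}
```

Both functions return the same value; they differ only in how many prefetch hints are issued.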
See also: Chapter 6, “Optimizing Cache Usage.”
Automatic hardware prefetch is a feature in the Pentium 4 processor.
It brings cache lines into the unified second-level cache based on prior
reference patterns. See also: Chapter 6, “Optimizing Cache Usage.”
Pros and Cons of Software and Hardware Prefetching. Software
prefetching has the following characteristics:
• handles irregular access patterns, which would not trigger the
  hardware prefetcher
• handles prefetching of short arrays and avoids the hardware
  prefetcher's start-up delay before it initiates fetches
• must be added to new code, so it does not benefit existing
  applications
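An indexed gather is one example of an irregular access pattern that software prefetching can cover. The sketch below is an assumption-laden illustration, not code from this manual. It relies on `__builtin_prefetch` from GCC or Clang, and the names `gather_sum` and `GATHER_DISTANCE`, as well as the distance of 8 elements, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { GATHER_DISTANCE = 8 }; /* hypothetical look-ahead, in index entries */

/* Sums table[idx[i]] for i in [0, n). The addresses depend on the contents
 * of idx, so they follow no fixed stride; the index array itself is read
 * sequentially, which lets the code look ahead and hint future targets. */
long gather_sum(const int *table, size_t table_len,
                const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + GATHER_DISTANCE < n)
            __builtin_prefetch(&table[idx[i + GATHER_DISTANCE]], 0, 1);

        assert(idx[i] < table_len); /* every index must be in range */
        total += table[idx[i]];
    }
    return total;
}
```

Because this is new source code, it illustrates the last characteristic as well: an existing binary gains nothing until the hint is added and the code is rebuilt.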