Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
Figure 6-7 shows how prefetch instructions and strip-mining can be
applied to increase performance in both of these scenarios.
For Pentium 4 processors, the left scenario shows a graphical
implementation of using prefetchnta to prefetch data into selected
ways of the second-level cache only (SM1 denotes strip-mining one way
of the second-level cache), minimizing second-level cache pollution.
Use prefetchnta if the data is touched only once during the entire
execution pass, in order to minimize cache pollution in the
higher-level caches. Provided the prefetch was issued far enough
ahead, the data is immediately available when the read access is
issued.
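
The prefetchnta scenario can be sketched in C as follows. This is a
minimal illustration, not code from the manual: the function name
scale_then_sum, the strip size, the prefetch distance, and the
assumption of one prefetch per 16 floats (64 bytes) are all
hypothetical and would need tuning for a real workload. Only
_mm_prefetch and _MM_HINT_NTA are real SSE intrinsic names; on
targets without SSE the macro falls back to a no-op.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* Non-temporal prefetch hint; compiles to the prefetchnta instruction. */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)0) /* no-op where SSE is unavailable */
#endif

/* Assumed tuning values for illustration only. */
enum {
    STRIP_ELEMS   = 4096, /* floats per strip (16 KB) */
    PREFETCH_DIST = 64,   /* floats ahead of the current access */
    LINE_ELEMS    = 16    /* floats per assumed 64-byte prefetch step */
};

/* Strip-mined two-pass loop: pass 1 scales a strip while prefetching
 * ahead with the NTA hint, pass 2 sums the same strip right away
 * (temporally adjacent passes). Returns the sum of the scaled data. */
float scale_then_sum(float *data, size_t n, float factor)
{
    float total = 0.0f;

    for (size_t base = 0; base < n; base += STRIP_ELEMS) {
        size_t end = (base + STRIP_ELEMS < n) ? base + STRIP_ELEMS : n;

        /* Pass 1: first touch of the strip. */
        for (size_t i = base; i < end; i++) {
            if (i % LINE_ELEMS == 0 && i + PREFETCH_DIST < n)
                PREFETCH_NTA(&data[i + PREFETCH_DIST]);
            data[i] *= factor;
        }

        /* Pass 2: reuse the strip while it is still cached. */
        for (size_t i = base; i < end; i++)
            total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or
without it; what changes is whether the data has arrived by the time
the first pass reads it.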
Figure 6-7 Examples of Prefetch and Strip-mining for Temporally Adjacent and
Non-Adjacent Passes Loops
[Figure: two panels. Temporally adjacent passes (prefetchnta, SM1):
Prefetchnta Dataset A, Reuse Dataset A, Prefetchnta Dataset B, Reuse
Dataset B. Temporally non-adjacent passes (prefetcht0, SM2):
Prefetcht0 Dataset A, Prefetcht0 Dataset B, Reuse Dataset A, Reuse
Dataset B.]
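
The prefetcht0 labels in Figure 6-7 suggest an ordering in which both
datasets are first touched, then both are reused. The sketch below
shows one way that ordering might look in C. It is an assumption
built from the figure labels only: the function names, the strip
size, and the prefetch distance are hypothetical, and the strip is
simply assumed to be small enough for both datasets' strips to stay
cached between first touch and reuse.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* Temporal prefetch hint; compiles to the prefetcht0 instruction. */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)0) /* no-op where SSE is unavailable */
#endif

/* Assumed tuning values for illustration only. */
enum {
    T0_STRIP_ELEMS = 2048, /* floats per strip, per dataset */
    T0_DIST        = 64,   /* floats ahead of the current access */
    T0_LINE_ELEMS  = 16    /* floats per assumed 64-byte prefetch step */
};

/* First touch of one strip: square each element, prefetching ahead. */
static void square_strip(float *x, size_t begin, size_t end, size_t n)
{
    for (size_t i = begin; i < end; i++) {
        if (i % T0_LINE_ELEMS == 0 && i + T0_DIST < n)
            PREFETCH_T0(&x[i + T0_DIST]);
        x[i] *= x[i];
    }
}

/* Later reuse of one strip: return the sum of its elements. */
static float sum_strip(const float *x, size_t begin, size_t end)
{
    float s = 0.0f;
    for (size_t i = begin; i < end; i++)
        s += x[i];
    return s;
}

/* For each strip: touch A, touch B, then reuse A, reuse B. The reuse
 * of A is separated from its first touch by the work on B. */
float sum_of_squares_two_sets(float *a, float *b, size_t n)
{
    float total = 0.0f;

    for (size_t base = 0; base < n; base += T0_STRIP_ELEMS) {
        size_t end = (base + T0_STRIP_ELEMS < n) ? base + T0_STRIP_ELEMS : n;

        square_strip(a, base, end, n);     /* Prefetcht0 Dataset A */
        square_strip(b, base, end, n);     /* Prefetcht0 Dataset B */
        total += sum_strip(a, base, end);  /* Reuse Dataset A */
        total += sum_strip(b, base, end);  /* Reuse Dataset B */
    }
    return total;
}
```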