Optimizing Cache Usage 6
• Balance single-pass versus multi-pass execution
• Resolve memory bank conflict issues
• Resolve cache management issues
The following sections discuss each of these items.
Software Prefetch Scheduling Distance
Determining the ideal prefetch placement in the code depends on many
architectural parameters, including the amount of memory to be
prefetched, cache lookup latency, system memory latency, and the
estimated number of computation cycles. The ideal distance for
prefetching data is processor- and platform-dependent. If the distance
is too short, the prefetch hides little or none of the fetch latency
behind computation. If the prefetch is issued too far ahead, the
prefetched data may be evicted from the cache before it is actually
required.
Since prefetch distance is not a well-defined metric, for this discussion
we define a new term, prefetch scheduling distance (PSD), which is
measured in loop iterations. For loops with large bodies, the prefetch
scheduling distance can be set to 1, that is, prefetch instructions are
scheduled one iteration ahead. For small loop bodies, that is, loop
iterations with little computation, the prefetch scheduling distance must
be more than one iteration.
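
As an illustration of a PSD expressed in iterations, the following C sketch sums an array one cache line per iteration and prefetches the line needed psd iterations later. It is not one of this chapter's numbered examples. The function name sum_with_prefetch, the 64-byte line size, and the use of the GCC/Clang builtin __builtin_prefetch (in place of a hand-written prefetch instruction) are assumptions made for this sketch.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines and 4-byte floats, so 16 floats per line. */
#define FLOATS_PER_LINE 16

/*
 * Sums n floats. Each outer iteration consumes one cache line of data
 * and prefetches the line that will be needed psd iterations later.
 * psd is the prefetch scheduling distance, in iterations.
 */
float sum_with_prefetch(const float *data, size_t n, size_t psd)
{
    float total = 0.0f;
    size_t ahead = psd * FLOATS_PER_LINE;   /* distance in elements */

    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        /* Only prefetch addresses that lie inside the array. */
        if (i + ahead < n) {
            /* rw = 0 (read), locality = 0 (non-temporal hint). */
            __builtin_prefetch(&data[i + ahead], 0, 0);
        }

        /* The computation for this iteration: one line of data. */
        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; ++j) {
            total += data[j];
        }
    }
    return total;
}
```

With so little computation per iteration, psd = 1 would likely be too short to cover memory latency, which is why a small loop body such as this one generally calls for a larger value. The best value has to be measured on the target processor and platform.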
A simplified equation for computing PSD can be derived from a
mathematical model. For the simplified equation, the complete
mathematical model, and the methodology for determining prefetch
distance, refer to Appendix E, “Mathematics of Prefetch Scheduling
Distance”.
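
For a first estimate before consulting the appendix, a common rule of thumb is to divide the latency to be hidden by the computation time of one iteration and round up. The sketch below shows that heuristic only. It is not the Appendix E equation, and the function name estimate_psd and both parameters are hypothetical names introduced here.

```c
/*
 * Rule-of-thumb estimate of prefetch scheduling distance (in iterations):
 * the number of iterations whose computation covers the fetch latency.
 * This is a generic heuristic, NOT the simplified equation of Appendix E.
 *
 * latency_cycles  - assumed cycles to fetch one line from memory
 * cycles_per_iter - assumed computation cycles in one loop iteration
 *
 * Returns 0 if cycles_per_iter is 0 (invalid input); otherwise at least 1.
 */
unsigned estimate_psd(unsigned latency_cycles, unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0) {
        return 0;   /* cannot divide by zero; signal invalid input */
    }

    /* Ceiling division: round up so the latency is fully covered. */
    unsigned psd = (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;

    /* A prefetch is scheduled at least one iteration ahead. */
    return (psd < 1) ? 1 : psd;
}
```

Under these assumptions, a loop body that takes longer than the fetch latency yields a distance of 1, and a short loop body yields a larger distance, matching the guidance for large and small loops above. Treat the result as a starting point for tuning, not as a final value.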
Example 6-3 illustrates the use of a prefetch within the loop body. The
prefetch scheduling distance is set to 3, esi is effectively the pointer
to a line, edx is the address of the data being referenced, and
xmm1-xmm4 are the data used in computation. Example 6-4 uses two
independent cache