Optimizing Cache Usage
• Facilitate compiler optimization (see the sketch following this item):
— Minimize use of global variables and pointers.
— Minimize use of complex control flow.
— Use the const modifier; avoid the register modifier.
— Choose data types carefully (see below) and avoid type casting.
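The fragment below is a minimal sketch of these coding guidelines; the function and parameter names (scale_samples, dst, src, n, gain) are illustrative and not taken from this manual. It passes data explicitly through const-qualified parameters instead of global variables, avoids the register keyword, and keeps the loop body free of casts and complex control flow so the compiler can optimize it.

    /* Sketch: coding style that helps the compiler optimize.
       Names are illustrative only. */
    #include <stddef.h>

    /* Pass data through parameters instead of globals and mark
       read-only inputs const; keep the control flow simple. */
    void scale_samples(float *dst, const float *src, size_t n, float gain)
    {
        for (size_t i = 0; i < n; i++) {
            dst[i] = src[i] * gain;   /* no casts, no branches in the body */
        }
    }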
• Use cache blocking techniques, for example, strip mining (see the sketch following this item):
— Improve the cache hit rate by using cache blocking techniques such as strip mining (one-dimensional arrays) or loop blocking (two-dimensional arrays).
— Explore using the hardware prefetching mechanism if your data access pattern has sufficient regularity to allow alternate sequencing of data accesses (for example, tiling) for improved spatial locality; otherwise use prefetchnta.
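As a sketch of loop blocking in the two-dimensional case, the fragment below transposes a matrix in BLOCK x BLOCK tiles so that each tile stays resident in cache while it is being read and written. The matrix size N, the block size, and the function name are illustrative assumptions, not values from this manual; BLOCK should be tuned so a tile of both arrays fits in the targeted cache level.

    /* Sketch: loop blocking (tiling) applied to a matrix transpose.
       N and BLOCK are illustrative tuning parameters. */
    #define N     1024
    #define BLOCK 64

    void transpose_blocked(float dst[N][N], const float src[N][N])
    {
        for (int ii = 0; ii < N; ii += BLOCK) {
            for (int jj = 0; jj < N; jj += BLOCK) {
                /* Process one tile at a time for better spatial
                   and temporal locality. */
                for (int i = ii; i < ii + BLOCK; i++) {
                    for (int j = jj; j < jj + BLOCK; j++) {
                        dst[j][i] = src[i][j];
                    }
                }
            }
        }
    }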
• Balance single-pass versus multi-pass execution (see the sketch following this item):
— An algorithm can use single-pass or multi-pass execution, defined as follows: single-pass (unlayered) execution passes a single data element through the entire computation pipeline; multi-pass (layered) execution performs a single stage of the pipeline on a batch of data elements before passing the entire batch on to the next stage.
— General guideline to minimize cache pollution: if your algorithm is single-pass, use prefetchnta; if your algorithm is multi-pass, use prefetcht0.
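The prefetch instructions are reachable from C through the _mm_prefetch intrinsic. The fragment below sketches the single-pass case: each element is touched only once, so the non-temporal hint _MM_HINT_NTA (prefetchnta) is used to avoid polluting the caches; a multi-pass kernel that revisits the batch would use _MM_HINT_T0 (prefetcht0) instead. The prefetch distance of 16 elements and the function name are illustrative assumptions to be tuned per platform.

    /* Sketch: single-pass processing with a non-temporal prefetch hint.
       PREFETCH_AHEAD is an illustrative tuning parameter. */
    #include <stddef.h>
    #include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0 */

    #define PREFETCH_AHEAD 16

    void single_pass(float *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            /* Data is used once, so hint NTA; a multi-pass (layered)
               algorithm would use _MM_HINT_T0 here instead. */
            _mm_prefetch((const char *)&src[i + PREFETCH_AHEAD], _MM_HINT_NTA);
            dst[i] = src[i] * 2.0f;
        }
    }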
• Resolve memory bank conflict issues (see the sketch following this item):
— Minimize memory bank conflicts by applying array grouping to group contiguously used data together, or by allocating data within 4 KB memory pages.
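As a sketch of array grouping, the fragment below replaces two separately allocated arrays that are accessed in lockstep with a single array of structures, so the data used together in each iteration sits contiguously in memory. The type and field names (Sample, re, im) and the function are illustrative, not from this manual.

    /* Sketch: array grouping to place contiguously used data together.
       Names are illustrative. */

    /* Before: two separate arrays accessed in lockstep,
         float re[N];
         float im[N];
       After: group the data that is used together. */
    typedef struct {
        float re;
        float im;
    } Sample;

    float sum_magnitude_sq(const Sample *s, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            /* re and im for element i now share a cache line. */
            sum += s[i].re * s[i].re + s[i].im * s[i].im;
        }
        return sum;
    }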
• Resolve cache management issues (see the sketch following this item):
— Minimize the disturbance of temporal data held within the processor's caches by using streaming store instructions, as appropriate.
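Streaming (non-temporal) stores are available through intrinsics such as _mm_stream_ps. The sketch below fills a large output buffer without displacing temporal data already held in the caches; it assumes dst is 16-byte aligned, and the function name is illustrative.

    /* Sketch: streaming stores that avoid evicting temporal data.
       dst is assumed 16-byte aligned; names are illustrative. */
    #include <stddef.h>
    #include <xmmintrin.h>   /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */

    void fill_buffer(float *dst, size_t n, float value)
    {
        __m128 v = _mm_set1_ps(value);
        size_t i = 0;

        /* Write four floats at a time, bypassing the caches. */
        for (; i + 4 <= n; i += 4) {
            _mm_stream_ps(&dst[i], v);
        }
        for (; i < n; i++) {      /* scalar tail */
            dst[i] = value;
        }
        _mm_sfence();             /* order the streaming stores before later stores */
    }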