Intel IA-32 Computer Accessories User Manual


 
Chapter 6: Optimizing Cache Usage
Facilitate compiler optimization:
Minimize use of global variables and pointers
Minimize use of complex control flow
Use the const modifier; avoid the register modifier
Choose data types carefully (see below) and avoid type casting.
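
The compiler-related guidelines above can be illustrated with a minimal sketch. The function names and the global g_total are hypothetical, chosen only for this example. The first version accumulates into a global, so the compiler may have to assume that each store through out could modify it. The second version uses a const-qualified input, a local accumulator, and a simple loop, which typically gives the optimizer more freedom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global accumulator: it has external linkage, so the compiler
   may have to assume that a store through `out` can modify it. */
float g_total;

/* Harder to optimize: the running total is a global updated inside the loop. */
void scale_sum_global(float *out, float *in, size_t n, float k)
{
    assert(out != NULL && in != NULL);
    g_total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
        g_total += out[i];
    }
}

/* Easier to optimize: const input, local accumulator, simple control flow. */
float scale_sum_local(float *out, const float *in, size_t n, float k)
{
    assert(out != NULL && in != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float v = in[i] * k;
        out[i] = v;
        total += v;
    }
    return total;
}
```

Both functions compute the same values; any difference shows up only in the generated code, and it depends on the compiler and its options.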
Use cache blocking techniques (for example, strip mining):
Improve the cache hit rate by using cache blocking techniques such
as strip mining (one-dimensional arrays) or loop blocking
(two-dimensional arrays).
Explore using the hardware prefetching mechanism if your data
access pattern is regular enough to allow alternate sequencing
of data accesses (for example, tiling) for improved spatial
locality; otherwise, use prefetchnta.
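
As an illustration of loop blocking on a two-dimensional array, the following sketch transposes a square matrix one tile at a time. The function name transpose_blocked and the tile edge BLOCK = 16 are assumptions made for this example, not values from the manual. A suitable tile size depends on the cache sizes of the target processor.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 16 };  /* assumed tile edge in elements; tune per target cache */

/* Transpose an n x n row-major matrix one BLOCK x BLOCK tile at a time, so
   the rows of src and dst touched by a tile are reused while still cached. */
void transpose_blocked(float *dst, const float *src, size_t n)
{
    assert(dst != src);  /* out-of-place transpose only */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile at the matrix edge when n is not a multiple of BLOCK. */
            const size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            const size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Strip mining applies the same idea to a one-dimensional array: the array is processed in strips small enough to stay cached while each strip is worked on.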
Balance single-pass versus multi-pass execution:
An algorithm can use single-pass or multi-pass execution, defined
as follows. Single-pass, or unlayered, execution passes a single
data element through an entire computation pipeline. Multi-pass,
or layered, execution performs a single stage of the pipeline on a
batch of data elements before passing the entire batch on to the
next stage.
General guideline to minimize cache pollution: if your algorithm
is single-pass, use prefetchnta; if your algorithm is multi-pass,
use prefetcht0.
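
The two execution styles and the matching prefetch hints can be sketched as follows. The stage functions, the function names, and the prefetch distance AHEAD are hypothetical and chosen only for illustration. The sketch assumes a compiler that provides the _mm_prefetch intrinsic in xmmintrin.h; where SSE is unavailable, the hint macros compile to nothing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#define PREFETCH_T0(p)  _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_NTA(p) ((void)(p))  /* no SSE: the hint becomes a no-op */
#define PREFETCH_T0(p)  ((void)(p))
#endif

enum { AHEAD = 16 };  /* assumed distance: 16 floats = 64 bytes ahead */

/* Two hypothetical pipeline stages. */
static float stage_scale(float x)  { return x * 2.0f; }
static float stage_offset(float x) { return x + 1.0f; }

/* Single-pass (unlayered): each element runs through every stage at once.
   The input is read only once, so a non-temporal hint is used. */
void run_single_pass(float *out, const float *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i % AHEAD == 0 && i + AHEAD < n)
            PREFETCH_NTA(&in[i + AHEAD]);
        out[i] = stage_offset(stage_scale(in[i]));
    }
}

/* Multi-pass (layered): one stage runs over the whole batch before the
   next stage starts. The batch is revisited, so a temporal hint is used. */
void run_multi_pass(float *buf, const float *in, size_t n)
{
    assert(buf != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i % AHEAD == 0 && i + AHEAD < n)
            PREFETCH_T0(&in[i + AHEAD]);
        buf[i] = stage_scale(in[i]);
    }
    for (size_t i = 0; i < n; i++)
        buf[i] = stage_offset(buf[i]);
}
```

Both functions produce the same output. Which one performs better depends on the batch size relative to the cache and should be measured on the target processor.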
Resolve memory bank conflict issues:
Minimize memory bank conflicts by applying array grouping to
group contiguously used data together, or by allocating data
within 4 KB memory pages.
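
One way to read array grouping is sketched below: values that are always used together are stored side by side in one array of structures instead of in three separate arrays. The type Point, its padding field, and the function names are hypothetical, and the sketch assumes that the x, y, and z values of an element are consumed together.

```c
#include <assert.h>
#include <stddef.h>

/* Grouped layout: the three coordinates of one element are adjacent.
   The pad field (an assumption of this sketch) rounds the size to 16 bytes. */
typedef struct {
    float x, y, z;
    float pad;
} Point;

/* Separate arrays: each iteration reads from three distinct memory regions. */
float sum_sq_separate(const float *x, const float *y, const float *z, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i] + y[i] * y[i] + z[i] * z[i];
    return s;
}

/* Grouped array: each iteration reads one contiguous 16-byte element. */
float sum_sq_grouped(const Point *p, size_t n)
{
    assert(sizeof(Point) == 16);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
    return s;
}
```

The grouped layout helps only when the fields really are used together; a loop that reads a single field would load the unused fields as well.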
Resolve cache management issues:
Minimize disturbance of temporal data held within the
processor’s caches by using streaming store instructions, as
appropriate.
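
A minimal sketch of a streaming store follows. It fills a buffer with the _mm_stream_ps intrinsic, which writes the data with a non-temporal hint so that the buffer does not displace temporal data already in the caches. The function name fill_streaming and its alignment and length requirements are assumptions of this example. Where SSE is unavailable, the sketch falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Fill dst with value. Assumes dst is 16-byte aligned and n is a multiple
   of 4, as required by the 128-bit streaming store used below. */
void fill_streaming(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst % 16) == 0 && n % 4 == 0);
#if defined(__SSE__)
    const __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* non-temporal store of four floats */
    _mm_sfence();  /* order the streaming stores before any later stores */
#else
    for (size_t i = 0; i < n; i++)  /* fallback: ordinary cached stores */
        dst[i] = value;
#endif
}
```

Streaming stores suit data that is written once and not read again soon. If the buffer is read back immediately, ordinary stores are usually the better choice.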