Intel IA-32 Computer Accessories User Manual


 
Optimizing Cache Usage 6
references enables the hardware prefetcher to initiate bus requests to
read some cache lines before the code actually references the linear
addresses.
Single-pass versus Multi-pass Execution
An algorithm can use single-pass or multi-pass execution, defined as
follows:
• Single-pass, or unlayered, execution passes a single data element
through an entire computation pipeline.
• Multi-pass, or layered, execution performs a single stage of the
pipeline on a batch of data elements before passing the batch on to
the next stage.
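The two definitions can be illustrated with a minimal C sketch. The two-stage pipeline here (stage_scale followed by stage_offset) and all function names are hypothetical, chosen only to make the loop structure visible; they do not come from this manual.

```c
#include <stddef.h>

/* Hypothetical two-stage pipeline: scale, then offset. */
static float stage_scale(float x)  { return x * 2.0f; }
static float stage_offset(float x) { return x + 1.0f; }

/* Single-pass (unlayered): each element travels through every stage
   before the next element is touched. */
void single_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = stage_offset(stage_scale(in[i]));
}

/* Multi-pass (layered): one stage runs over the whole batch before the
   next stage starts; 'out' holds the intermediate results. */
void multi_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = stage_scale(in[i]);
    for (size_t i = 0; i < n; i++)
        out[i] = stage_offset(out[i]);
}
```

Both functions produce the same results. They differ in how many times the batch is traversed and in how much intermediate data must stay in cache between stages.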
Both single-pass and multi-pass execution involve a specific trade-off,
which depends on the algorithm's implementation and on which of the two
execution styles it uses; see Figure 6-8.
Multi-pass execution is often easier to use when implementing a
general-purpose API, where the choice of code paths that can be taken depends
on the specific combination of features selected by the application (for
example, for 3D graphics, this might include the type of vertex
primitives used and the number and type of light sources).
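As a rough sketch of how a feature-dependent pipeline might be layered, the C fragment below runs each stage as its own pass over the batch and tests the enabled features inside each stage. The Vertex layout, the FEAT_* flags, and the function names are hypothetical simplifications, not part of any real graphics API.

```c
#include <stddef.h>

/* Hypothetical feature flags selected by the application. */
enum {
    FEAT_TRANSLATE = 1 << 0,
    FEAT_AMBIENT   = 1 << 1,
    FEAT_DIFFUSE   = 1 << 2
};

/* Simplified, hypothetical vertex: one coordinate and one light value. */
typedef struct { float x; float intensity; } Vertex;

/* Stage 1: transform. The feature test happens once per batch. */
static void transform_pass(Vertex *v, size_t n, unsigned features, float dx)
{
    if (features & FEAT_TRANSLATE)
        for (size_t i = 0; i < n; i++)
            v[i].x += dx;
}

/* Stage 2: lighting. Each light type is an independent conditional, so
   no separate code sequence is needed per feature combination. */
static void lighting_pass(Vertex *v, size_t n, unsigned features)
{
    for (size_t i = 0; i < n; i++)
        v[i].intensity = 0.0f;
    if (features & FEAT_AMBIENT)
        for (size_t i = 0; i < n; i++)
            v[i].intensity += 0.25f;
    if (features & FEAT_DIFFUSE)
        for (size_t i = 0; i < n; i++)
            v[i].intensity += 0.5f;
}

/* Multi-pass driver: one pass per stage over the whole batch. */
void process_batch(Vertex *v, size_t n, unsigned features, float dx)
{
    transform_pass(v, n, features, dx);
    lighting_pass(v, n, features);
}
```

With three independent flags, this layout needs three conditionals rather than eight specialized code sequences, at the cost of traversing the batch more than once.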
With such a broad range of permutations possible, a single-pass
approach would be complicated in terms of both code size and
validation, because each possible permutation would require a separate
code sequence. For example, an object with features A, B, C, D can have
only a subset of those features enabled, say, A, B, D. That combination
would use one code path; another combination of enabled features would
use a different code path. It makes more sense to perform each pipeline stage as a
separate pass, with conditional clauses to select different features that
are implemented within each stage. By using strip-mining, the number
of vertices processed by each stage (that is, the batch size) can be