IA-32 Intel® Architecture Optimization

In Example 3-19, the computation has been strip-mined to a size strip_size. The value strip_size is chosen such that strip_size elements of array v[Num] fit into the cache hierarchy. As a result, a given element v[i] brought into the cache by Transform(v[i]) will still be in the cache when Lighting(v[i]) is performed, which improves performance over the non-strip-mined code.
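The strip-mining idea described above can be sketched in C as follows. This is a minimal illustration, not the manual's own listing: transform() and lighting() are hypothetical stand-ins for Transform() and Lighting(), the vertices are reduced to plain floats, and the caller is assumed to pick strip_size so that one strip fits in the cache.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the manual's Transform() and Lighting();
   the real routines operate on vertex records, not single floats. */
static void transform(float *x) { *x = *x * 2.0f + 1.0f; }
static void lighting(float *x)  { *x = *x * 0.5f; }

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Non-strip-mined version: two full passes over v. For large num, v[i]
   may already be evicted from the cache by the time lighting() reaches it. */
void process_plain(float *v, size_t num)
{
    for (size_t i = 0; i < num; i++) transform(&v[i]);
    for (size_t i = 0; i < num; i++) lighting(&v[i]);
}

/* Strip-mined version: both passes run over one strip before moving on,
   so each element is reused while it is still likely to be cached. */
void process_strip_mined(float *v, size_t num, size_t strip_size)
{
    if (strip_size == 0) strip_size = 1;   /* guard against a zero step */
    for (size_t i = 0; i < num; i += strip_size) {
        size_t end = min_size(num, i + strip_size);  /* clamp last strip */
        for (size_t j = i; j < end; j++) transform(&v[j]);
        for (size_t j = i; j < end; j++) lighting(&v[j]);
    }
}
```

Both functions compute the same result for every element; only the order of memory accesses differs. Any speedup depends on the array size relative to the cache and should be measured on the target processor.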

Loop Blocking

Loop blocking is another useful technique for memory performance optimization. Like strip mining, its main purpose is to eliminate as many cache misses as possible. This technique transforms the memory domain of a given problem into smaller chunks rather than sequentially traversing through the entire memory domain. Each chunk should be small enough to fit all the data for a given computation into the cache, thereby maximizing data reuse. In fact, one can treat loop blocking as strip mining in two or more dimensions. Consider the code in Example 3-20 and the access pattern in Figure 3-3. The two-dimensional array A is referenced in the j (column) direction and then in the i (row) direction (column-major order), whereas array B is referenced in the opposite manner (row-major order). Assume the memory layout is in column-major order; therefore, the access strides of arrays A and B for the code in Example 3-20 would be 1 and MAX, respectively.
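A minimal C sketch of loop blocking is given here for illustration. It assumes a loop body of the form A[i][j] = A[i][j] + B[j][i], which is consistent with the access pattern described above (consecutive elements of A, elements of B that are MAX apart) but is not shown in this section. The values of MAX and BLOCK_SIZE and the function names are assumptions chosen for the example.

```c
#include <stddef.h>

#define MAX        10  /* assumed array dimension, kept small for illustration */
#define BLOCK_SIZE 4   /* assumed block edge; tune so a block of A and B fits in cache */

/* Unblocked loop: the inner loop walks A contiguously, but each
   access to B jumps MAX elements, touching a new cache line each time. */
void add_transpose_plain(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            A[i][j] = A[i][j] + B[j][i];
}

/* Blocked loop: the i/j space is cut into BLOCK_SIZE x BLOCK_SIZE tiles,
   so the lines of B fetched for one tile are reused before eviction. */
void add_transpose_blocked(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i += BLOCK_SIZE) {
        size_t i_end = (i + BLOCK_SIZE < MAX) ? i + BLOCK_SIZE : MAX;
        for (size_t j = 0; j < MAX; j += BLOCK_SIZE) {
            size_t j_end = (j + BLOCK_SIZE < MAX) ? j + BLOCK_SIZE : MAX;
            for (size_t ii = i; ii < i_end; ii++)
                for (size_t jj = j; jj < j_end; jj++)
                    A[ii][jj] = A[ii][jj] + B[jj][ii];
        }
    }
}
```

The clamping of i_end and j_end lets the sketch handle a MAX that is not a multiple of BLOCK_SIZE. Both functions produce identical results; the blocked form only changes the order in which elements are visited.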
