Intel® 80200 Processor based on Intel® XScale™ Microarchitecture Developer’s Manual (March 2003): Optimization Guide

B.4.4.11. Loop Interchange

As mentioned earlier, the sequence in which data is accessed affects cache thrashing. Usually, it is best to access data in a spatially contiguous address range. However, arrays may be laid out so that elements visited by consecutive loop iterations are not physically next to each other. Consider the following C code; C stores two-dimensional arrays in row-major order, so consecutive elements of a row, not of a column, are adjacent in memory.

    for (j = 0; j < NMAX; j++)
        for (i = 0; i < NMAX; i++)
        {
            prefetch(A[i+1][j]);
            sum += A[i][j];
        }

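In the above example, A[i][j] and A[i+1][j] are not next to each other in memory; they are a full row (NMAX elements) apart. Each prefetch therefore brings in a different cache line, and the neighbouring elements in that line are not used until a later pass of the outer loop, by which time the line may already have been evicted. This increases bus traffic when prefetching loop data. Where the result of the loop does not depend on the order of iteration, the problem can be resolved by interchanging the induction variables (loop interchange). The above example becomes: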

    for (i = 0; i < NMAX; i++)
        for (j = 0; j < NMAX; j++)
        {
            prefetch(A[i][j+1]);
            sum += A[i][j];
        }
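The two traversal orders can be compared in a small, self-contained C sketch. It assumes a GCC- or Clang-compatible compiler and maps the manual's `prefetch()` (standing in for the processor's PLD preload instruction) onto the `__builtin_prefetch` builtin, which is only a hint and never changes results. `NMAX = 64`, the `int` element type, and the names `sum_column_order`, `sum_row_order`, `column_order_stride` and `row_order_stride` are hypothetical choices for illustration. Both functions compute the same sum; what differs is the distance in memory between elements touched by consecutive inner-loop iterations.

```c
#include <stddef.h>

/* Hypothetical size; the manual leaves NMAX unspecified. */
#define NMAX 64

/* Assumption: the manual's prefetch() maps to the GCC/Clang builtin,
   which only hints the memory system and never faults or alters results. */
#define prefetch(x) __builtin_prefetch(&(x))

/* Original order: j outer, i inner. A[i][j] and A[i+1][j] are a whole
   row (NMAX ints) apart, so each access lands in a different cache line. */
long sum_column_order(int A[NMAX][NMAX])
{
    long sum = 0;
    for (int j = 0; j < NMAX; j++)
        for (int i = 0; i < NMAX; i++) {
            if (i + 1 < NMAX)          /* do not form an address past the array */
                prefetch(A[i + 1][j]);
            sum += A[i][j];
        }
    return sum;
}

/* Interchanged order: i outer, j inner. A[i][j] and A[i][j+1] are adjacent,
   so one cache line serves several consecutive iterations. */
long sum_row_order(int A[NMAX][NMAX])
{
    long sum = 0;
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++) {
            if (j + 1 < NMAX)
                prefetch(A[i][j + 1]);
            sum += A[i][j];
        }
    return sum;
}

/* Byte distance between elements touched by consecutive inner-loop
   iterations in the original (column) order. */
ptrdiff_t column_order_stride(int A[NMAX][NMAX])
{
    return (char *)&A[1][0] - (char *)&A[0][0];
}

/* Byte distance between elements touched by consecutive inner-loop
   iterations in the interchanged (row) order. */
ptrdiff_t row_order_stride(int A[NMAX][NMAX])
{
    return (char *)&A[0][1] - (char *)&A[0][0];
}
```

With the array filled as `A[i][j] = i * NMAX + j`, both functions return 8386560; the stride helpers return `NMAX * sizeof(int)` bytes for the original order and `sizeof(int)` bytes for the interchanged one. The `i + 1 < NMAX` and `j + 1 < NMAX` guards keep the code from forming addresses past the end of the array on the last iteration, a detail the fragment above leaves out.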

B.4.4.12. Loop Fusion

Loop fusion is the process of combining multiple loops that reuse the same data into one loop. The advantage is that the reused data is still immediately accessible from the data cache when it is needed again, rather than possibly having been evicted by the time a second loop reaches it.

Consider the following example:

    for (i = 0; i < NMAX; i++)
    {
        prefetch(A[i+1], b[i+1], c[i+1]);
        A[i] = b[i] + c[i];
    }

    for (i = 0; i < NMAX; i++)
    {
        prefetch(D[i+1], c[i+1], A[i+1]);
        D[i] = A[i] + c[i];
    }

The second loop reuses the data elements A[i] and c[i]. Fusing the loops together produces:

    for (i = 0; i < NMAX; i++)
    {
        prefetch(D[i+1], A[i+1], c[i+1], b[i+1]);
        ai = b[i] + c[i];    /* temporary: b[i] + c[i] is computed once */
        A[i] = ai;
        D[i] = ai + c[i];
    }
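The unfused and fused versions can be checked against each other with a runnable C sketch. It assumes a GCC- or Clang-compatible compiler and replaces `prefetch()` with `__builtin_prefetch`; because that builtin takes a single address, the multi-operand `prefetch(...)` above is written as one call per array. `NMAX = 64`, the `int` element type and the function names `unfused` and `fused` are hypothetical, and the arrays are assumed not to overlap, which is what makes the fusion legal here.

```c
/* Hypothetical size; the manual leaves NMAX unspecified. */
#define NMAX 64

/* Assumption: prefetch() maps to the GCC/Clang builtin (a hint only). */
#define prefetch(x) __builtin_prefetch(&(x))

/* Two separate loops: the second loop re-reads A[i] and c[i], which may
   have left the cache if the arrays are large. */
void unfused(int *A, int *D, const int *b, const int *c)
{
    for (int i = 0; i < NMAX; i++) {
        if (i + 1 < NMAX) {            /* stay inside the arrays */
            prefetch(A[i + 1]);
            prefetch(b[i + 1]);
            prefetch(c[i + 1]);
        }
        A[i] = b[i] + c[i];
    }
    for (int i = 0; i < NMAX; i++) {
        if (i + 1 < NMAX) {
            prefetch(D[i + 1]);
            prefetch(c[i + 1]);
            prefetch(A[i + 1]);
        }
        D[i] = A[i] + c[i];
    }
}

/* Fused loop: one pass over the data, with b[i] + c[i] held in a local
   and c[i] reused right after it is loaded. Requires non-overlapping arrays. */
void fused(int *A, int *D, const int *b, const int *c)
{
    for (int i = 0; i < NMAX; i++) {
        if (i + 1 < NMAX) {
            prefetch(D[i + 1]);
            prefetch(A[i + 1]);
            prefetch(c[i + 1]);
            prefetch(b[i + 1]);
        }
        int ai = b[i] + c[i];          /* typically kept in a register */
        A[i] = ai;
        D[i] = ai + c[i];
    }
}
```

Filling `b[i] = i` and `c[i] = 2 * i + 1` and running both versions yields identical `A` and `D` arrays (for example `A[3] == 10` and `D[3] == 17`); the fused version simply makes one pass over the data instead of two.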
