Intel 80200 Computer Hardware User Manual


 
B-22 March, 2003 Developers Manual
Intel® 80200 Processor based on Intel® XScale Microarchitecture
Optimization Guide
B.4.2.6. Data Alignment
Cache lines begin on 32-byte address boundaries. To maximize cache line use and minimize cache
pollution, data structures should be aligned on 32-byte boundaries and sized to a multiple of the
cache line size. Aligning data structures on cache line boundaries also simplifies the later addition
of prefetch instructions to optimize performance.
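As a minimal sketch of this recommendation, the following C11 fragment places an array on a cache line boundary and checks the result. Several things here are assumptions rather than part of this manual: the fields use int32_t to stand in for the 4-byte long of a 32-bit ARM target, IMAX is given a hypothetical value, is_line_aligned() is an illustrative helper, and the compiler is assumed to support C11 alignas. A 16-byte element is not a multiple of the line size, but it divides the 32-byte line evenly, so two elements share each line without straddling a boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* bytes per cache line, as stated above */
#define IMAX 64         /* hypothetical element count */

/* Four 32-bit fields: 16 bytes, so exactly two elements share one line. */
struct elem {
    int32_t ia, ib, ic, id;
};
static_assert(CACHE_LINE % sizeof(struct elem) == 0,
              "element size must divide the cache line evenly");

/* alignas places the first element on a cache line boundary. */
static alignas(CACHE_LINE) struct elem tdata[IMAX];

/* Returns 1 when p lies on a cache line boundary, otherwise 0. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With this layout, every even-numbered element starts a new cache line, so a prefetch of an element's first field brings in the whole element.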
When data is not aligned on cache line boundaries, the prefetch address has to be shifted to
compensate for the misalignment. Consider the following example:
struct {
    long ia;
    long ib;
    long ic;
    long id;
} tdata[IMAX];

for (i = 0; i < IMAX; i++)
{
    PREFETCH(tdata[i+1]);
    tdata[i].ia = tdata[i].ib + tdata[i].ic + tdata[i].id;
    ....
    tdata[i].id = 0;
}
In this case, if tdata[] is not aligned to a cache line, a prefetch that uses the address of
tdata[i+1].ia may not bring in element id. If the array were aligned on a cache line boundary plus
12 bytes, the prefetch would have to be placed on &tdata[i+1].id.
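The effect of the misalignment can be checked with simple address arithmetic. The sketch below is illustrative only: it models addresses as byte offsets from a cache line boundary, assumes 4-byte fields (int32_t in place of long), and introduces the hypothetical helpers field_line() and straddles_line(), which do not appear in this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* bytes per cache line */

struct elem { int32_t ia, ib, ic, id; };  /* 16 bytes, as in the example */

/* Cache line index of one field. base_off is the array's byte offset from
   a cache line boundary, i is the element index, and field_off comes from
   offsetof(). */
static unsigned field_line(unsigned base_off, unsigned i, size_t field_off)
{
    assert(field_off < sizeof(struct elem));
    return (unsigned)((base_off + i * sizeof(struct elem) + field_off)
                      / CACHE_LINE);
}

/* Returns 1 when ia and id of element i sit in different cache lines. */
static int straddles_line(unsigned base_off, unsigned i)
{
    return field_line(base_off, i, offsetof(struct elem, ia))
        != field_line(base_off, i, offsetof(struct elem, id));
}
```

For an aligned array (base_off of 0) no element straddles a line. With base_off of 12, element 1 has ia at byte 28 and id at byte 40, so a prefetch of ia alone leaves id in a line that has not been fetched.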
If the structure is not sized to a multiple of the cache line size, the prefetch address must be
advanced separately from the array index, and extra prefetch instructions are required. Consider the
following example:
struct {
    long ia;
    long ib;
    long ic;
    long id;
    long ie;
} tdata[IMAX];

ADDRESS preadd = tdata;
for (i = 0; i < IMAX; i++)
{
    PREFETCH(preadd += 16);
    tdata[i].ia = tdata[i].ib + tdata[i].ic + tdata[i].id +
                  tdata[i].ie;
    ....
    tdata[i].ie = 0;
}
In this case, the prefetch address is advanced by half the size of a cache line, so every other
prefetch instruction is ignored. Further, an additional register is required to track the next
prefetch address. In general, data that is not aligned and sized to cache lines adds extra
computational overhead.
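One possible way to avoid the extra register and the ignored prefetches is to pad the 20-byte structure to a full cache line. The sketch below is an assumption-laden illustration, not a method prescribed by this manual: it uses C11 alignas, int32_t in place of the 4-byte long, and the GCC/Clang builtin __builtin_prefetch as a stand-in for the PREFETCH pseudo-operation. The names elem5, redundant_prefetches() and process() are hypothetical, and the arithmetic in the loop body is placeholder work.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* bytes per cache line */

/* Stand-in for the PREFETCH pseudo-operation (assumption: GCC/Clang
   builtin; evaluates its argument and does nothing on other compilers). */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Counts how many of n prefetches, issued at stride, 2*stride, ... bytes
   from a line-aligned base, fall in the same line as the previous one.
   The base line itself is treated as already fetched. */
static unsigned redundant_prefetches(unsigned stride, unsigned n)
{
    unsigned redundant = 0;
    unsigned prev_line = 0;
    for (unsigned k = 1; k <= n; k++) {
        unsigned line = (k * stride) / CACHE_LINE;
        if (line == prev_line)
            redundant++;
        prev_line = line;
    }
    return redundant;
}

/* Five 4-byte fields padded to 32 bytes: alignas on the first member
   raises the structure's alignment, and sizeof() is rounded up to match. */
struct elem5 {
    alignas(CACHE_LINE) int32_t ia;
    int32_t ib, ic, id, ie;
};
static_assert(sizeof(struct elem5) == CACHE_LINE,
              "one element per cache line");

/* One prefetch per iteration, addressed directly from the array index. */
static void process(struct elem5 *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            PREFETCH(&t[i + 1]);
        t[i].ia = t[i].ib + t[i].ic + t[i].id + t[i].ie;
        t[i].ie = 0;
    }
}
```

With a 16-byte stride, redundant_prefetches(16, 8) reports that half of the eight prefetches repeat the previous line, while a 32-byte stride reports none. The padding costs 12 unused bytes per element, so whether it pays off depends on how much cache capacity the application can spare.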
Additional prefetch considerations are discussed in greater detail in the following sections.