IA-32 Intel® Architecture Optimization
Aliasing Cases in the Pentium M Processor
Pentium M, Intel Core Solo and Intel Core Duo processors have the
following aliasing case:
Store forwarding - If a store to an address is followed by a load
from the same address within a short time window, the load will not
proceed until the store data is available. If a store is followed by
a load and their addresses differ by a multiple of 4 KB, the load
also stalls until the store retires.
Assembly/Compiler Coding Rule 25. (H impact, MH generality) Lay out
data or order computation to avoid having cache lines that have linear
addresses that are a multiple of 64 KB apart in the same working set. Avoid
having more than four cache lines that are some multiple of 2 KB apart in the
same first-level cache working set, and avoid having more than eight cache
lines that are some multiple of 4 KB apart in the same first-level cache working
set. Avoid having a store followed by a non-dependent load with addresses that
differ by a multiple of 4 KB.
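The address conditions named in this rule can be checked with simple
arithmetic on linear addresses. The following is a minimal sketch, not
part of the original text: the helper names are hypothetical, the 64-byte
first-level cache line size is an assumption, and the pointer-to-integer
casts assume a flat address space in which the integer value is the
linear address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Strides taken from the rule above. */
#define ALIAS_4K   ((uintptr_t)4 * 1024)
#define ALIAS_64K  ((uintptr_t)64 * 1024)
#define LINE_SIZE  ((uintptr_t)64)   /* assumption: 64-byte cache line */

/* True when x and y differ by a nonzero multiple of stride. */
static bool differs_by_multiple(uintptr_t x, uintptr_t y, uintptr_t stride)
{
    uintptr_t d = x > y ? x - y : y - x;
    return d != 0 && d % stride == 0;
}

/* Store followed by a load whose address differs by a multiple of 4 KB. */
static bool store_load_alias_4k(const void *store, const void *load)
{
    return differs_by_multiple((uintptr_t)store, (uintptr_t)load, ALIAS_4K);
}

/* Two cache lines whose base addresses are a multiple of 64 KB apart. */
static bool lines_alias_64k(const void *a, const void *b)
{
    uintptr_t la = (uintptr_t)a & ~(LINE_SIZE - 1);  /* line base of a */
    uintptr_t lb = (uintptr_t)b & ~(LINE_SIZE - 1);  /* line base of b */
    return differs_by_multiple(la, lb, ALIAS_64K);
}
```

Helpers like these are meant for debug builds or layout tests, for
example to assert that two hot buffers do not alias before entering a
loop. They only detect the address pattern and do not measure any stall.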
When declaring multiple arrays that are referenced with the same index
and are each a multiple of 64 KB (as can happen with
struct_of_array data layouts), pad them to avoid declaring them
contiguously. Padding can be accomplished either by declaring other
variables between the arrays or by artificially increasing the array
dimension.
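As an illustration of the padding described above, the sketch below
contrasts an unpadded structure-of-arrays layout with a padded one. The
type names, the array length and the 128-byte pad size are assumptions
chosen for the example. The right pad size depends on the target
processor.

```c
#include <stddef.h>

/* Each array occupies exactly 64 KB. */
#define N    (64 * 1024 / sizeof(float))
#define PAD  128   /* assumption: pad size in bytes */

/* Unpadded: x[i], y[i] and z[i] are 64 KB apart for every index i. */
struct soa_unpadded {
    float x[N];
    float y[N];
    float z[N];
};

/* Padded: intervening declarations move each array off the 64 KB stride. */
struct soa_padded {
    float x[N];
    char  pad0[PAD];
    float y[N];
    char  pad1[PAD];
    float z[N];
};
```

The same effect can be had by increasing the dimension instead, for
example by declaring each array as `float x[N + PAD / sizeof(float)]`
and leaving the extra elements unused.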
User/Source Coding Rule 4. (H impact, ML generality) Consider using a
special memory allocation library to avoid aliasing.
One way to implement a memory allocator to avoid aliasing is to
allocate more than enough space and pad. For example, allocate
structures that are 68 KB instead of 64 KB to avoid the 64 KB aliasing;
or have the allocator pad and return random offsets that are a multiple of
128 bytes (the size of a second-level cache line).
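The second approach might be sketched as follows. This is a hypothetical
wrapper around malloc, not an existing library: the names pad_malloc and
pad_free, the number of offset slots, and the use of rand() are all
assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAD_UNIT   128u  /* offset granularity, from the text */
#define PAD_SLOTS  32u   /* assumption: offsets span 32 * 128 bytes = 4 KB */

/* Over-allocates, then returns a pointer shifted by a pseudo-random
   multiple of PAD_UNIT. The raw malloc pointer is saved just below the
   returned block so that pad_free() can recover it. */
void *pad_malloc(size_t size)
{
    size_t slack = PAD_UNIT * (PAD_SLOTS + 1) + sizeof(void *);
    if (size > SIZE_MAX - slack)
        return NULL;

    unsigned char *raw = malloc(size + slack);
    if (raw == NULL)
        return NULL;

    /* First PAD_UNIT-aligned address that leaves room for the saved pointer. */
    uintptr_t base = (uintptr_t)raw + sizeof(void *);
    base = (base + PAD_UNIT - 1) / PAD_UNIT * PAD_UNIT;

    /* Pseudo-random shift; rand() is used only for illustration. */
    base += (uintptr_t)((unsigned)rand() % PAD_SLOTS) * PAD_UNIT;

    unsigned char *user = raw + (base - (uintptr_t)raw);
    memcpy(user - sizeof(void *), &raw, sizeof raw);
    return user;
}

/* Releases a block obtained from pad_malloc(). */
void pad_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *raw;
    memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```

Blocks returned by pad_malloc must be released with pad_free, never with
free directly. The fixed-padding alternative from the text needs no
wrapper: request 68 KB where 64 KB is needed.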
User/Source Coding Rule 5. (M impact, M generality) When padding
variable declarations to avoid aliasing, the greatest benefit comes from
avoiding aliasing on second-level cache lines, suggesting an offset of 128 bytes
or more.