IA-32 Intel® Architecture Optimization
non-sequential manner, the automatic hardware prefetcher cannot
prefetch the data. The prefetcher can recognize up to eight concurrent
streams. See Chapter 6 for more information on the hardware
prefetcher.
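Memory coherence is maintained on 64-byte cache lines on the
Pentium 4, Intel Xeon, and Pentium M processors, rather than on the
32-byte cache lines of earlier processors. Because each line now covers
twice as many bytes, unrelated variables written by different processors
are more likely to land in the same line, which increases the opportunity
for false sharing.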
User/Source Coding Rule 3. (M impact, L generality) Beware of false
sharing within a cache line (64 bytes) on Pentium 4, Intel Xeon, and
Pentium M processors, and within a sector of 128 bytes on Pentium 4 and
Intel Xeon processors.
Stack Alignment
The easiest way to avoid stack alignment problems is to keep the stack
aligned at all times. For example, if a language supports only 8-bit,
16-bit, 32-bit, and 64-bit data quantities and never uses 80-bit data
quantities, it can require the stack to always be aligned on a 64-bit
boundary.
Assembly/Compiler Coding Rule 24. (H impact, M generality) If 64-bit
data is ever passed as a parameter or allocated on the stack, make sure that the
stack is aligned to an 8-byte boundary.
Doing so requires the use of a general-purpose register (such as EBP)
as a frame pointer. The trade-off is between causing unaligned 64-bit
references if the stack is not aligned and causing extra general-purpose
register spills if the stack is aligned. Note that a performance penalty is
incurred only when an unaligned access splits a cache line. Because a
64-byte line holds eight 8-byte quantities, a run of spatially consecutive
8-byte accesses that are all misaligned by the same amount crosses a line
boundary once every eight accesses. This means that one out of eight such
accesses is always penalized.
A routine that makes frequent use of 64-bit data can avoid stack
misalignment by placing the code described in Example 2-20 in the
function prologue and epilogue.