Intel IA-32 Computer Accessories User Manual


 
General Optimization Guidelines 2
Memory Accesses
This section discusses guidelines for optimizing code and data memory
accesses. The most important recommendations are:
• align data, paying attention to data layout and stack alignment
• enable store forwarding
• place code and data on separate pages
• enhance data locality
• use prefetching and cacheability control instructions
• enhance code locality and align branch targets
• take advantage of write combining
Alignment and forwarding problems are among the most common
sources of large delays on the Pentium 4 processor.
Alignment
Alignment of data concerns all kinds of variables:
• dynamically allocated variables
• members of a data structure
• global or local variables
• parameters passed on the stack
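As an illustration of the first three kinds of variables listed above, the following is a minimal C11 sketch, not taken from this manual. The names global_vec, struct sample, is_aligned, and alloc_aligned are hypothetical, and the 16-byte and 64-byte alignments are example values. The sketch assumes a C11 toolchain that provides alignas and aligned_alloc. Alignment of parameters passed on the stack is determined by the compiler and calling convention and is not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Global variable: explicitly request 16-byte alignment (example value). */
static alignas(16) float global_vec[4];

/* Structure member: aligning a member also raises the alignment of the
 * whole structure, so every instance starts on a 16-byte boundary. */
struct sample {
    alignas(16) float coords[4];
    int32_t id;
};

static_assert(offsetof(struct sample, coords) % 16 == 0,
              "coords must start on a 16-byte boundary");

/* Hypothetical helper: true if p is a multiple of alignment. */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper for dynamically allocated data. C11 aligned_alloc
 * expects the size to be a multiple of the alignment, so the size is
 * rounded up first. Release the result with free(). */
static void *alloc_aligned(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

A caller might obtain a cache-line-aligned buffer with alloc_aligned(64, n) and check the result with is_aligned(p, 64). Local variables can use the same alignas specifier as the global above.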
Misaligned data access can incur significant performance penalties. This
is particularly true for cache line splits. The size of a cache line is
64 bytes in the Pentium 4, Intel Xeon, and Pentium M processors.
On the Pentium 4 processor, an access to data that crosses a 64-byte
boundary leads to two memory accesses and requires several µops to be
executed (instead of one). Such accesses are likely to incur a large
performance penalty, because they are executed near retirement and can
cause stalls on the order of the depth of the pipeline.
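The cache line split condition can be expressed as simple address arithmetic. The following is a minimal C sketch, assuming the 64-byte line size stated above. The helper name splits_cache_line and the macro CACHE_LINE_SIZE are hypothetical and not part of any Intel tool or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size given in the text for the Pentium 4, Intel Xeon,
 * and Pentium M processors. */
#define CACHE_LINE_SIZE 64u

static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0,
              "cache line size must be a power of two");

/* Hypothetical helper: returns true if an access of `size` bytes that
 * starts at address `addr` touches two different cache lines. */
static bool splits_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return false;               /* an empty access touches no line */
    }
    uintptr_t line_mask  = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    uintptr_t first_line = addr & line_mask;               /* line of first byte */
    uintptr_t last_line  = (addr + size - 1u) & line_mask; /* line of last byte  */
    return first_line != last_line;
}
```

For example, an 8-byte access at offset 56 within a line stays inside that line, while the same access at offset 60 covers bytes 60 through 67 and therefore spans two lines. A check like this can be applied to (uintptr_t)ptr in debug builds to flag data layouts that are likely to cause splits.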