Intel IA-32 Computer Accessories User Manual


 
If for some reason it is not possible to align the stack to a 64-bit
boundary, the routine should access the parameter once and save it in a
register or in storage known to be aligned, thus incurring the
misalignment penalty only once.
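The copy-once idea can be sketched in C. This is a minimal illustration, not code from the manual: the names scaled_sum and is_aligned_8 are hypothetical, and the sketch assumes a C11 compiler that honors _Alignas on local variables. A compiler may well keep the value in a register on its own, which achieves the same effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when p lies on a 64-bit (8-byte) boundary. */
static int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical routine: returns scale * (0 + 1 + ... + (n - 1)).
   The parameter is read once and copied into a local that C11 _Alignas
   requests on an 8-byte boundary; the loop then reads only the copy. */
static double scaled_sum(double scale, int n)
{
    _Alignas(8) double aligned_scale = scale;  /* the single read of the parameter */
    double sum = 0.0;

    assert(is_aligned_8(&aligned_scale));
    for (int i = 0; i < n; i++)
        sum += aligned_scale * i;
    return sum;
}
```

Whatever cost the first read of the parameter carries is paid once, outside the loop, instead of on every iteration.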
Capacity Limits and Aliasing in Caches
There are cases where addresses with a given stride will compete for
some resource in the memory hierarchy.
Typically, caches are implemented with multiple ways of set
associativity, each way consisting of multiple sets of cache lines (or
sectors in some cases). Multiple memory references that compete for the
same set in each way of a cache can cause a capacity issue: once more
such references are live than the cache has ways, they evict one another
even if other sets are unused. Some aliasing conditions apply only to
specific microarchitectures. Note that first-level cache lines are
64 bytes, so the least significant 6 bits of an address are not
considered in alias comparisons. For the Pentium 4 and Intel Xeon
processors, data is loaded into the second-level cache in sectors of
128 bytes, so the least significant 7 bits are not considered in alias
comparisons.
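The bit arithmetic behind these comparisons can be sketched in C. This is an illustrative model only: it assumes simple modulo set indexing, and the function names and the 32-set geometry used in the note below are hypothetical, not a description of any particular processor.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when x is a power of two. */
static int is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Number of the cache line (or sector) containing addr. Dividing by a
   power-of-two line size drops the low offset bits: 6 bits for 64-byte
   lines, 7 bits for 128-byte sectors. */
static uint32_t line_number(uint32_t addr, uint32_t line_bytes)
{
    assert(is_pow2(line_bytes));
    return addr / line_bytes;
}

/* Set selected by addr, assuming plain modulo indexing over num_sets sets. */
static uint32_t set_index(uint32_t addr, uint32_t line_bytes, uint32_t num_sets)
{
    assert(is_pow2(num_sets));
    return line_number(addr, line_bytes) % num_sets;
}

/* Nonzero when a and b select the same set and so compete for its ways. */
static int same_set(uint32_t a, uint32_t b, uint32_t line_bytes, uint32_t num_sets)
{
    return set_index(a, line_bytes, num_sets) == set_index(b, line_bytes, num_sets);
}
```

With 64-byte lines and a hypothetical 32 sets per way, addresses 0x1000 and 0x1800 (a 2048-byte stride) select the same set, while 0x1000 and 0x1040 do not. Addresses 0x1000 and 0x103F fall in the same 64-byte line, so they differ only in bits that the comparison ignores.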
Example 2-20 Dynamic Stack Alignment
prologue:
subl esp, 4 ; save frame ptr
movl [esp], ebp
movl ebp, esp ; new frame pointer
subl ebp, 4 ; step below the saved frame ptr so the store of esp cannot overwrite it
andl ebp, 0xFFFFFFF8 ; aligned to 64 bits
movl [ebp], esp ; save old stack ptr
subl esp, FRAMESIZE ; allocate space
; ... callee saves, etc.
epilogue:
; ... callee restores, etc.
movl esp, [ebp] ; restore stack ptr
movl ebp, [esp] ; restore frame ptr
addl esp, 4
ret
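The masking step in the prologue can be checked with a few lines of C. This is a sketch for illustration; align_down is a hypothetical helper, not part of the manual. Rounding an address down to a power-of-two boundary clears its low bits: a 64-bit (8-byte) boundary requires clearing three bits, which corresponds to the mask 0xFFFFFFF8, whereas the mask 0xFFFFFFFC clears only two bits and yields 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rounds addr down to a multiple of alignment,
   which must be a power of two. For alignment 8 this is the same as
   addr & 0xFFFFFFF8; for alignment 4 it is addr & 0xFFFFFFFC. */
static uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1u);
}
```

For instance, align_down(0x0012FF74, 8) yields 0x0012FF70, while align_down(0x0012FF74, 4) leaves the address unchanged because it is already on a 4-byte boundary.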