IA-32 Intel® Architecture Optimization
2-102
Assembly/Compiler Coding Rule 18. (H impact, M generality) A load
that forwards from a store must have the same address start point and
therefore the same alignment as the store data. 2-34
Assembly/Compiler Coding Rule 19. (H impact, M generality) The
data of a load which is forwarded from a store must be completely
contained within the store data. 2-34
Assembly/Compiler Coding Rule 20. (H impact, ML generality) If it is
necessary to extract a non-aligned portion of stored data, read out the
smallest aligned portion that completely contains the data and shift/mask
the data as necessary. 2-35
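
As an illustration of Rule 20, the following is a minimal sketch in C, not code from this manual. It assumes the wanted 16-bit field sits at byte offset 1 inside a 4-byte-aligned word on a little-endian IA-32 target; the helper name extract_middle16 is hypothetical. The field is recovered with one aligned 32-bit read followed by a shift and a mask, rather than with a non-aligned 16-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only).
 * Returns the 16-bit field held in bits 8..23 of the 32-bit word at
 * `aligned_word`. On a little-endian target these bits are bytes 1..2
 * of the word in memory. */
uint16_t extract_middle16(const void *aligned_word)
{
    uint32_t w;

    /* Read the smallest aligned unit that fully contains the field.
     * memcpy of a fixed 4-byte size is typically compiled to a single
     * 32-bit load. */
    memcpy(&w, aligned_word, sizeof w);

    /* Shift the low byte out, then mask down to 16 bits. */
    return (uint16_t)((w >> 8) & 0xFFFFu);
}
```

Whether the generated code keeps this shape depends on the compiler and its optimization settings, so the assembly output is worth checking.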
Assembly/Compiler Coding Rule 21. (MH impact, ML generality)
Avoid several small loads after large stores to the same area of memory by
using a single large read and register copies as needed. 2-35
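
Rule 21 can be sketched in C as follows. This is an illustrative example under stated assumptions, not code from this manual: the helper name split_bytes is hypothetical, and the value at *src is assumed to have been written recently by a 32-bit store. The value is read once at the same size as the store, and the individual bytes are then produced by register shifts instead of four separate byte loads.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only).
 * Splits the 32-bit value at *src into its four bytes, least
 * significant byte first. */
void split_bytes(const uint32_t *src, uint8_t out[4])
{
    uint32_t w = *src;            /* one load, same size as the store */

    out[0] = (uint8_t)(w);        /* the rest is register-only work */
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}
```

A compiler is free to restructure this code, so the intent is only to show the source-level pattern of one large read plus register copies.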
Assembly/Compiler Coding Rule 22. (H impact, MH generality)
Where it is possible to do so without incurring other penalties, prioritize
the allocation of variables to registers, as in register allocation and for
parameter passing, to minimize the likelihood and impact of
store-forwarding problems. Try not to store-forward data generated from a
long-latency instruction, e.g. mul, div. Avoid store-forwarding data for
variables with the shortest store-load distance. Avoid store-forwarding
data for variables with many and/or long dependence chains, and
especially avoid including a store forward on a loop-carried dependence
chain. 2-38
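
One part of Rule 22, keeping a store forward off a loop-carried dependence chain, might look like the sketch below. It is an assumption-laden illustration rather than code from this manual: the function name sum_into is hypothetical. The running total is held in a local variable that the compiler can allocate to a register, so memory is read once before the loop and written once after it, instead of being stored and reloaded on every iteration.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only).
 * Adds the n elements of a[] to *total. */
void sum_into(const int *a, size_t n, long *total)
{
    long acc = *total;            /* single load before the loop */

    for (size_t i = 0; i < n; i++)
        acc += a[i];              /* loop-carried chain stays in a register */

    *total = acc;                 /* single store after the loop */
}
```

Writing `*total += a[i]` inside the loop can force a store and a dependent load per iteration when the compiler cannot prove that `total` does not alias `a`.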
Assembly/Compiler Coding Rule 23. (H impact, M generality) Try to
arrange data structures such that they permit sequential access. 2-41
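
A small sketch of what Rule 23 can mean in practice, assuming a two-dimensional table stored in row-major order as a flat C array (the function name sum_sequential and the layout are assumptions made for illustration). The inner loop walks the fastest-varying index, so consecutive iterations touch consecutive addresses.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only).
 * Sums a rows x cols table stored row by row in the flat array m. */
long sum_sequential(const int *m, size_t rows, size_t cols)
{
    long s = 0;

    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            s += m[r * cols + c]; /* address increases by one element per step */

    return s;
}
```

Swapping the two loops would compute the same result but would stride through memory by `cols` elements per step, which is the access pattern the rule advises against.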
Assembly/Compiler Coding Rule 24. (H impact, M generality) If
64-bit data is ever passed as a parameter or allocated on the stack, make
sure that the stack is aligned to an 8-byte boundary. 2-42
Assembly/Compiler Coding Rule 25. (H impact, MH generality) Lay
out data or order computation to avoid having cache lines that have linear
addresses that are a multiple of 64 KB apart in the same working set.
Avoid having more than four cache lines that are some multiple of 2 KB apart
in the same first-level cache working set, and avoid having more than
eight cache lines that are some multiple of 4 KB apart in the same