General Optimization Guidelines 2
• Avoid longer-latency instructions such as integer multiplies and
divides. Replace them with alternate code sequences (e.g., use shifts
instead of multiplies).
• Use the lea instruction and the full range of addressing modes to do
address calculation.
• Some types of stores use more µops than others; try to use simpler
store variants and/or reduce the number of stores.
• Avoid use of complex instructions that require more than 4 µops.
• Avoid instructions that unnecessarily introduce dependence-related
stalls: inc and dec instructions, partial register operations (8/16-bit
operands).
• Avoid use of ah, bh, and the other upper 8-bit halves of the 16-bit
registers, because accessing them requires a shift operation internally.
• Use xor and pxor instructions to clear registers and break
dependencies for integer operations; also use xorps and xorpd to
clear XMM registers for floating-point operations.
• Use efficient approaches for performing comparisons.
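As an illustration of the first two guidelines in the list above (shifts instead of multiplies, and lea-style address arithmetic), the following is a minimal C sketch. The function names mul8_shift and mul5_shift_add are hypothetical and chosen only for this example. Whether a given compiler emits a shift or an lea for these expressions is an assumption; optimizing compilers commonly perform this strength reduction on their own, so the sketch mainly shows the arithmetic identity involved.

```c
#include <stdint.h>

/* Multiply by a power of two with a single shift: x * 8 == x << 3.
 * Unsigned arithmetic is used so that overflow wraps in a defined way. */
static inline uint32_t mul8_shift(uint32_t x)
{
    return x << 3;
}

/* Multiply by 5 as x + x*4. This base + index*scale shape is what one
 * lea instruction can compute (for example, lea eax, [eax + eax*4]).
 * That the compiler selects lea here is an assumption, not a guarantee. */
static inline uint32_t mul5_shift_add(uint32_t x)
{
    return x + (x << 2);
}
```

Both functions return the same values as the plain multiplications x * 8 and x * 5 for every 32-bit unsigned input, including inputs that wrap around.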
Optimize Instruction Scheduling
• Consider latencies and resource constraints.
• Calculate store addresses as early as possible.
Enable Vectorization
• Use the smallest possible data type. This enables more parallelism,
because more elements fit in each vector.
• Arrange the nesting of loops so the innermost nesting level is free of
inter-iteration dependencies. It is especially important to avoid the
case where the store of data in an earlier iteration happens lexically
after the load of that data in a future iteration (called
lexically-backward dependence).
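The lexically-backward dependence described above can be sketched in C as follows. The function names backward_dep and forward_dep and the array names a, b, and c are hypothetical, and the sketch assumes the three arrays do not overlap. In the first loop, the store to a[i] appears lexically after the load of a[i-1] that reads it in the next iteration. In the second loop the two statements are swapped, so the store comes first; the results are identical, but the dependence now runs forward in the loop body, which is generally the easier form for a vectorizer to handle.

```c
#include <stddef.h>

/* Lexically-backward dependence: iteration i stores a[i] AFTER the
 * statement that, in iteration i + 1, loads that same element as a[i-1].
 * Assumes a, b, and c do not overlap. */
static void backward_dep(int *a, int *b, const int *c, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        b[i] = a[i - 1] + 1;  /* load of a value stored one iteration earlier */
        a[i] = c[i] * 2;      /* store appears lexically after that load */
    }
}

/* Same computation with the statements reordered: the store to a[i]
 * now appears lexically before the load of a[i-1]. */
static void forward_dep(int *a, int *b, const int *c, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i] = c[i] * 2;      /* store first */
        b[i] = a[i - 1] + 1;  /* load of a[i-1] follows it lexically */
    }
}
```

The reordering is valid here because b[i] never reads the a[i] written in the same iteration; in loops where that does not hold, the statements cannot simply be swapped.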