Intel IA-32 Computer Accessories User Manual


 
Use of the inc and dec Instructions
The inc and dec instructions modify only a subset of the bits in the flag
register. This creates a dependence on all previous writes of the flag
register. This is especially problematic when these instructions are on
the critical path because they are used to change an address for a load on
which many other instructions depend.
Assembly/Compiler Coding Rule 42. (M impact, H generality) Replace inc and
dec instructions with add or sub instructions. add and sub overwrite all
flags, whereas inc and dec do not and therefore create false dependencies on
earlier instructions that set the flags.
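
The difference in flag behavior can be sketched with a small C model. This is only an illustration, not processor code: the Flags structure and the model_add1 and model_inc functions are hypothetical names, and the model tracks just four status flags (the real flag register also holds PF and AF, among others). It shows that the flag state after inc still contains the old carry flag, whereas the flag state after add is computed entirely from the operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of four arithmetic status flags. */
typedef struct {
    int cf; /* carry */
    int zf; /* zero */
    int sf; /* sign */
    int of; /* overflow */
} Flags;

/* Models "add reg, 1": every modeled flag is recomputed, so the new
   flag state does not depend on any earlier flag write. */
static uint32_t model_add1(uint32_t x, Flags *f)
{
    assert(f != NULL);
    uint32_t r = x + 1u;
    f->cf = (r < x);             /* unsigned wrap-around sets carry */
    f->zf = (r == 0);
    f->sf = (int)(r >> 31);
    f->of = (x == 0x7FFFFFFFu);  /* signed overflow when adding 1 */
    return r;
}

/* Models "inc reg": same result, but CF is left untouched, so the new
   flag state merges the old CF with the freshly computed flags. */
static uint32_t model_inc(uint32_t x, Flags *f)
{
    assert(f != NULL);
    uint32_t r = x + 1u;
    /* f->cf is deliberately not written: inc preserves the carry flag */
    f->zf = (r == 0);
    f->sf = (int)(r >> 31);
    f->of = (x == 0x7FFFFFFFu);
    return r;
}
```

Starting from a state in which an earlier instruction has set CF, model_add1 clears it while model_inc carries it forward. That carried-forward bit is the dependence on previous flag writes described above.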
Use of the shift and rotate Instructions
The shift and rotate instructions have a longer latency on Pentium 4
processors with a CPUID signature corresponding to family 15 and a model
encoding of 0, 1 or 2. For left shifts of three or less, a sequence of adds
has a shorter latency than the shift. Fixed and variable shifts have the same
latency.
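
As a rough source-level sketch of the shift-to-add substitution, the C function below computes a left shift of up to three bits using only additions. The name shl_by_adds is hypothetical, and a compiler is free to pick its own instruction sequence, so this shows the arithmetic identity rather than guaranteeing particular machine code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes x << n for n in 0..3 using additions only.
   Each doubling corresponds to one "add reg, reg" in assembly. */
static uint32_t shl_by_adds(uint32_t x, unsigned n)
{
    assert(n <= 3);  /* the guideline covers left shifts of three or less */
    for (unsigned i = 0; i < n; ++i) {
        x += x;      /* unsigned addition wraps modulo 2^32, like shl by 1 */
    }
    return x;
}
```

For example, shl_by_adds(5, 3) performs three doublings (5, 10, 20, 40) and returns the same value as 5 << 3.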
The rotate by immediate and rotate by register instructions are more
expensive than a shift. The rotate by 1 instruction has the same latency as a
shift.
Assembly/Compiler Coding Rule 43. (ML impact, L generality) Avoid
rotate by register or rotate by immediate instructions. If possible, replace
with a rotate by 1 instruction.
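
A minimal C sketch of the rotate-by-1 form follows. The helper names rotl1 and rotl_small are hypothetical. Compilers commonly recognize the shift-and-or idiom in rotl1 as a single rotate, but that is not guaranteed. Whether chaining several rotates by 1 beats one rotate by immediate depends on the count and the processor, which the rule above does not quantify, so treat rotl_small only as a statement of equivalence.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by exactly one bit: the top bit wraps around to bit 0. */
static uint32_t rotl1(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* Hypothetical helper: rotate left by n bits expressed as n rotates by 1.
   Equivalent in result to a single rotate by n. */
static uint32_t rotl_small(uint32_t x, unsigned n)
{
    assert(n < 32);
    for (unsigned i = 0; i < n; ++i) {
        x = rotl1(x);
    }
    return x;
}
```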
Flag Register Accesses
A ‘partial flag register stall’ happens when an instruction modifies a part
of the flag register and the following instruction is dependent on the
outcome of the flags. This happens most often with shift instructions
(sar, sal, shr, shl). The flags are not modified when the shift count is
zero, but the shift count is usually known only at execution time.
Therefore, the front-end stalls until the instruction is retired. Other
instructions that can modify some part of the flag register include