Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
Using a test instruction between the instruction that may modify part of
the flag register and the instruction that uses the flag register can also
help prevent a partial flag register stall.
Assembly/Compiler Coding Rule 52. (ML impact, M generality) Use the
test instruction instead of and when the result of the logical and is not used.
This saves uops in execution. Use a test of a register with itself instead of a
cmp of the register to zero; this removes the need to encode the zero and saves
encoding space. Avoid comparing a constant to a memory operand. It is
preferable to load the memory operand and compare the constant to a register.
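The three parts of this rule can be sketched as follows, in Intel syntax. This is a minimal illustration, not code from this manual: the registers, the constants, the labels (bit_set, is_zero, below) and the memory operand mem are all assumed for the example. Each pair shows the form to avoid followed by the preferred form.

```asm
; (1) Result of the logical and is not used
;     instead of:
        and   eax, 8                ; overwrites eax only to set the flags
        jnz   bit_set
;     use:
        test  eax, 8                ; sets the same flags, eax is unchanged
        jnz   bit_set

; (2) Comparing a register with zero
;     instead of:
        cmp   eax, 0                ; must encode an immediate zero
        je    is_zero
;     use:
        test  eax, eax              ; ZF and SF match those of cmp eax, 0
        je    is_zero

; (3) Comparing a constant with a memory operand
;     instead of:
        cmp   dword ptr [mem], 100
        jl    below
;     use:
        mov   eax, dword ptr [mem]  ; load the operand first
        cmp   eax, 100              ; then compare against a register
        jl    below
```

In case (3) the load needs a free register, so the preferred form assumes one is available at that point.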
Often a produced value must be compared with zero, and then used in a
branch. Because most Intel architecture instructions set the condition
codes as part of their execution, the compare instruction may be
eliminated. Thus the operation can be tested directly by a jcc
instruction. The notable exceptions are mov and lea. In these cases,
use test.
Assembly/Compiler Coding Rule 53. (ML impact, M generality) Eliminate
unnecessary compare with zero instructions by using the appropriate
conditional jump instruction when the flags are already set by a preceding
arithmetic instruction. If necessary, use a test instruction instead of a
compare. Be certain that any code transformations made do not introduce
problems with overflow.
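A minimal sketch of this rule in Intel syntax follows. The registers and the labels (loop_top, is_zero) are assumptions made for the example and do not come from this manual.

```asm
; Before: explicit compare with zero after an arithmetic instruction
        sub   ecx, 1
        cmp   ecx, 0                ; redundant: sub already set ZF from its result
        jne   loop_top

; After: branch directly on the flags produced by sub
        sub   ecx, 1
        jne   loop_top

; mov and lea leave the flags unchanged, so a test is still required
        mov   eax, dword ptr [ebx]
        test  eax, eax
        jz    is_zero
```

The overflow caveat matters for signed conditions. After cmp ecx, 0 the overflow flag is always clear, so jl depends only on the sign of the value. After sub ecx, 1 the overflow flag may be set (for example when ecx held 80000000H), and jl can then give a different answer than the compare with zero would have. Conditions that read only ZF (je, jne) or only SF (js, jns) are not affected by this difference.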
Floating Point/SIMD Operands
In initial Pentium 4 processor implementations, the latency of MMX or
SIMD floating point register to register moves is significant. This can
have implications for register allocation.
Moves that write a portion of a register can introduce unwanted
dependences. The movsd reg, reg instruction writes only the bottom
64 bits of a register, not all 128 bits. This introduces a dependence on
the preceding instruction that produces the upper 64 bits (even if those
bits are no longer wanted). The dependence inhibits register renaming,
and thereby reduces parallelism.
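The difference between a partial and a full register write can be sketched as follows, in Intel syntax. The register choices are illustrative, and movapd is shown as one common full-width alternative, assuming the upper 64 bits of the destination are not needed afterwards.

```asm
; Partial write: only bits 63:0 of xmm0 change and bits 127:64 are
; preserved, so this move depends on the last instruction that wrote xmm0
        movsd   xmm0, xmm1

; Full write: all 128 bits of xmm0 are replaced, so the move does not
; depend on the previous contents of xmm0
        movapd  xmm0, xmm1
```

Note that movapd also copies the upper 64 bits of the source, so it is a drop-in replacement only when the upper half of the destination is dead.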