IA-32 Intel® Architecture Optimization
instead of a cmp of the register to zero; this avoids the need to encode the
zero and so saves encoding space. Avoid comparing a constant to a memory
operand. It is preferable to load the memory operand and compare the
constant to a register. 2-79
Assembly/Compiler Coding Rule 51. (ML impact, M generality)
Eliminate unnecessary compare-with-zero instructions by using the
appropriate conditional jump instruction when the flags are already set by
a preceding arithmetic instruction. If necessary, use a test instruction
instead of a compare. Be certain that any code transformations made do
not introduce problems with overflow. 2-79
Assembly/Compiler Coding Rule 52. (M impact, ML generality)
Avoid introducing dependences with partial floating-point register writes,
e.g. from the movsd xmmreg1, xmmreg2 instruction. Use the movapd
xmmreg1, xmmreg2 instruction instead. 2-80
Assembly/Compiler Coding Rule 53. (ML impact, L generality)
Instead of using movupd xmmreg1, mem for an unaligned 128-bit load,
use movsd xmmreg1, mem; movsd xmmreg2, mem+8; unpcklpd
xmmreg1, xmmreg2. If the additional register is not available, then use
movsd xmmreg1, mem; movhpd xmmreg1, mem+8. 2-80
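The two load sequences in Rule 53 can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not code from the manual: the function names are hypothetical, and the instruction named in each comment is the one the intrinsic conventionally corresponds to. A compiler remains free to select different instructions.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: two 64-bit loads merged with an unpack,
   for the case where a second XMM register is available. */
static __m128d load_unaligned_two_regs(const double *mem)
{
    __m128d lo = _mm_load_sd(mem);      /* movsd xmmreg1, mem        */
    __m128d hi = _mm_load_sd(mem + 1);  /* movsd xmmreg2, mem+8      */
    return _mm_unpacklo_pd(lo, hi);     /* unpcklpd xmmreg1, xmmreg2 */
}

/* Hypothetical helper: single-register variant. */
static __m128d load_unaligned_one_reg(const double *mem)
{
    __m128d v = _mm_load_sd(mem);       /* movsd  xmmreg1, mem   */
    return _mm_loadh_pd(v, mem + 1);    /* movhpd xmmreg1, mem+8 */
}

/* Usage sketch: both variants read the same two doubles from an
   address that is only guaranteed to be 8-byte aligned. */
static void check_loads(void)
{
    double buf[3] = {0.0, 1.5, 2.5};
    double out[2];

    _mm_storeu_pd(out, load_unaligned_two_regs(buf + 1));
    assert(out[0] == 1.5 && out[1] == 2.5);

    _mm_storeu_pd(out, load_unaligned_one_reg(buf + 1));
    assert(out[0] == 1.5 && out[1] == 2.5);
}
```

Both helpers place the double at mem in the low half of the register and the double at mem+8 in the high half, which is the same result that movupd xmmreg1, mem produces.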
Assembly/Compiler Coding Rule 54. (M impact, ML generality)
Instead of using movupd mem, xmmreg1 for a store, use movsd mem,
xmmreg1; unpckhpd xmmreg1, xmmreg1; movsd mem+8, xmmreg1. 2-80
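The store sequence in Rule 54 can be sketched the same way in C with SSE2 intrinsics. The helper name is hypothetical, and the instruction in each comment is the conventional counterpart of the intrinsic rather than a guarantee of what a given compiler emits.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: store a 128-bit value as two 64-bit halves. */
static void store_unaligned_split(double *mem, __m128d v)
{
    _mm_store_sd(mem, v);         /* movsd mem, xmmreg1         */
    v = _mm_unpackhi_pd(v, v);    /* unpckhpd xmmreg1, xmmreg1  */
    _mm_store_sd(mem + 1, v);     /* movsd mem+8, xmmreg1       */
}

/* Usage sketch: write {3.0, 4.0} to an address that is only
   guaranteed to be 8-byte aligned. */
static void check_store(void)
{
    double buf[3] = {0.0, 0.0, 0.0};

    /* _mm_set_pd takes the high element first, then the low one. */
    store_unaligned_split(buf + 1, _mm_set_pd(4.0, 3.0));
    assert(buf[0] == 0.0 && buf[1] == 3.0 && buf[2] == 4.0);
}
```

The unpckhpd step copies the high half of the register into the low half, so the second movsd can store it. The register contents are changed as a side effect; in the C sketch this only affects the local copy of v.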
Assembly/Compiler Coding Rule 55. (M impact, MH generality) In
routines that do not need a frame pointer and that do not have called
routines that modify ESP, use ESP as the base register to free up EBP. This
optimization does not apply in the following cases: a routine is called that
leaves ESP modified upon return, for example, alloca; routines that rely
on EBP for structured or C++ style exception handling; routines that use
setjmp and longjmp; routines that use EBP to align the local stack on
an 8- or 16-byte boundary; and routines that rely on EBP debugging. 2-81