IA-32 Intel® Architecture Optimization
Table 2-3 illustrates using movzx to avoid a partial register stall when
packing three byte values into a register.
Assembly/Compiler Coding Rule 44. (ML impact, L generality) Use simple
instructions that are less than eight bytes in length.
Assembly/Compiler Coding Rule 45. (M impact, MH generality) Avoid
using prefixes to change the size of an immediate or displacement.
Long instructions (more than seven bytes) limit the number of decoded
instructions per cycle on the Pentium M processor. Each prefix adds one
byte to the length of the instruction, possibly limiting the decoder’s
throughput. In addition, multiple prefixes can only be decoded by the
first decoder. These prefixes also incur a delay when decoded. If
multiple prefixes or a prefix that changes the size of an immediate or
displacement cannot be avoided, schedule them behind instructions that
stall the pipe for some other reason.
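As a rough illustration of Rules 44 and 45, the following C sketch counts the leading legacy prefix bytes of an encoded instruction and flags encodings that are longer than seven bytes or carry more than one prefix. The function names (is_legacy_prefix, count_prefixes, may_limit_decode) are hypothetical and do not come from this manual. The sketch is not a decoder: the caller must supply the total instruction length, and prefix bytes that are a mandatory part of some SIMD opcodes are counted like any other prefix. For example, in 32-bit code add ax, 1234h encodes as 66 05 34 12, using the operand-size prefix to select a 16-bit immediate, while add eax, 1234h encodes as 05 34 12 00 00 with no prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if b is one of the IA-32 legacy prefix bytes. */
bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0:                          /* LOCK */
    case 0xF2: case 0xF3:               /* REPNE, REP/REPE */
    case 0x2E: case 0x36: case 0x3E:    /* CS, SS, DS segment overrides */
    case 0x26: case 0x64: case 0x65:    /* ES, FS, GS segment overrides */
    case 0x66:                          /* operand-size override */
    case 0x67:                          /* address-size override */
        return true;
    default:
        return false;
    }
}

/* Counts the prefix bytes at the start of one encoded instruction. */
size_t count_prefixes(const uint8_t *insn, size_t len)
{
    assert(insn != NULL || len == 0);
    size_t n = 0;
    while (n < len && is_legacy_prefix(insn[n]))
        n++;
    return n;
}

/* Flags an encoding that is longer than seven bytes (Rule 44) or that
 * has multiple prefixes, which only the first decoder can handle.
 * len is the full instruction length, supplied by the caller. */
bool may_limit_decode(const uint8_t *insn, size_t len)
{
    return len > 7 || count_prefixes(insn, len) > 1;
}
```

A check like this could be run over the byte listing of a hot loop to find candidates for re-encoding. It says nothing about whether a single prefix changes the size of an immediate or displacement, which requires decoding the opcode as well.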
Assembly/Compiler Coding Rule 46. (M impact, MH generality) Break
dependences on portions of registers between instructions by operating on
32-bit registers instead of partial registers. For moves, this can be
accomplished with 32-bit moves or by using movzx.
On Pentium M processors, the movsx and movzx instructions both take a
single μop, whether they move from a register or memory. On Pentium 4
processors, the movsx instruction takes an additional μop. This is likely to cause
Table 2-3 Avoiding Partial Register Stall When Packing Byte Values

A Sequence with                  Alternate Sequence without
Partial Register Stall           Partial Register Stall
-------------------------------  -------------------------------
mov al,byte ptr a[2]             movzx eax,byte ptr a[2]
shl eax,16                       shl eax,16
mov ax,word ptr a                movzx ecx,word ptr a
movd mm0,eax                     or eax,ecx
                                 movd mm0,eax
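The difference between the two sequences in Table 2-3 can be modeled in C. The sketch below is an illustration only: the function names pack_partial_write and pack_zero_extend are hypothetical, a is assumed to be a three-byte array laid out in little-endian order as on IA-32, and the final movd is represented by the return value. In the first model, each write to al or ax must be merged with the previous contents of eax, so the result depends on the old register value. In the second, every write replaces a full 32-bit register, so no such dependence exists.

```c
#include <assert.h>
#include <stdint.h>

/* Models the sequence with the partial register stall. old_eax is
 * whatever eax held before the sequence started. */
uint32_t pack_partial_write(uint32_t old_eax, const uint8_t a[3])
{
    assert(a != 0);
    uint32_t eax = old_eax;
    eax = (eax & 0xFFFFFF00u) | a[2];        /* mov al,byte ptr a[2] */
    eax <<= 16;                              /* shl eax,16 */
    uint32_t word = (uint32_t)a[0] | ((uint32_t)a[1] << 8);
    eax = (eax & 0xFFFF0000u) | word;        /* mov ax,word ptr a */
    return eax;                              /* movd mm0,eax */
}

/* Models the alternate sequence: each step writes a whole 32-bit
 * register, so the result is independent of earlier register contents. */
uint32_t pack_zero_extend(const uint8_t a[3])
{
    assert(a != 0);
    uint32_t eax = a[2];                     /* movzx eax,byte ptr a[2] */
    eax <<= 16;                              /* shl eax,16 */
    uint32_t ecx = (uint32_t)a[0] | ((uint32_t)a[1] << 8);
                                             /* movzx ecx,word ptr a */
    eax |= ecx;                              /* or eax,ecx */
    return eax;                              /* movd mm0,eax */
}
```

With a = {0x11, 0x22, 0x33}, pack_zero_extend returns 0x00332211. pack_partial_write produces the same low 24 bits, but its top byte is taken from bits 8 to 15 of the old eax value, so the two models agree fully only when eax was zero beforehand.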