Chapter 4 Instruction-Decoding Optimizations 85
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
4.9 Alternatives to SHLD Instruction
Optimization
Where register pressure is low, replace the SHLD instruction with alternative code using ADD and
ADC, or SHR and LEA.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution
resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA are DirectPath instructions,
while SHLD is a VectorPath instruction. Use of the replacement code optimizes decode bandwidth
because it potentially enables the simultaneous decoding of a third DirectPath instruction. However,
the replacement code may increase register pressure because it destroys the contents of one register
(reg2 in the following examples) whereas the register is preserved by SHLD.
Example 1
Replace this instruction:
    shld reg1, reg2, 1
with this code sequence:
    add reg2, reg2
    adc reg1, reg1
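The equivalence in Example 1 can be checked with a small C model. This is only a sketch of the register arithmetic, not a timing measurement, and it assumes 32-bit operands. The function names and the sample values are hypothetical and chosen for illustration. ADD reg2, reg2 moves the old top bit of reg2 into the carry flag, and ADC reg1, reg1 then doubles reg1 and adds that carry into bit 0, which is the same result that SHLD with a count of 1 produces in reg1.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, 1" on 32-bit registers:
   reg1 shifts left by one and takes its new bit 0 from bit 31 of reg2.
   reg2 itself is not modified by SHLD. */
uint32_t shld1_model(uint32_t reg1, uint32_t reg2)
{
    return (reg1 << 1) | (reg2 >> 31);
}

/* Model of the replacement sequence "add reg2, reg2" / "adc reg1, reg1".
   Unlike SHLD, this overwrites reg2. */
uint32_t add_adc_model(uint32_t reg1, uint32_t *reg2)
{
    uint32_t carry = *reg2 >> 31;  /* carry-out of ADD is the old bit 31 of reg2 */
    *reg2 += *reg2;                /* add reg2, reg2 (wraps modulo 2^32) */
    return reg1 + reg1 + carry;    /* adc reg1, reg1 */
}

/* Compare both models on a few illustrative inputs.
   Returns the number of (reg1, reg2) pairs checked. */
int check_shld1_equivalence(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int checked = 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t reg2 = samples[j];
            uint32_t result = add_adc_model(samples[i], &reg2);

            /* Same value in reg1 as SHLD would produce. */
            assert(result == shld1_model(samples[i], samples[j]));
            /* reg2 has been doubled, so its original contents are lost. */
            assert(reg2 == (uint32_t)(samples[j] << 1));
            checked++;
        }
    }
    return checked;
}
```

The second assertion in the loop makes the register-pressure cost visible: after the replacement sequence, reg2 no longer holds its original value.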
Example 2
Replace this instruction:
    shld reg1, reg2, 2
with this code sequence:
    shr reg2, 30
    lea reg1, [reg1*4+reg2]
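Example 2 can be modeled the same way. The sketch below again assumes 32-bit operands, which is what the shift count of 30 (32 minus 2) implies; for 64-bit operands the same reasoning would call for a count of 62. The function names and sample values are hypothetical. After SHR, reg2 holds only its old top two bits, and because reg1*4 has zeros in its low two bits, the addition performed by LEA gives the same result as a bitwise OR.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of "shld reg1, reg2, 2" on 32-bit registers:
   reg1 shifts left by two and takes its new low two bits
   from the top two bits of reg2. reg2 is not modified by SHLD. */
uint32_t shld2_model(uint32_t reg1, uint32_t reg2)
{
    return (reg1 << 2) | (reg2 >> 30);
}

/* Model of the replacement sequence "shr reg2, 30" / "lea reg1, [reg1*4+reg2]".
   Unlike SHLD, this overwrites reg2. */
uint32_t shr_lea_model(uint32_t reg1, uint32_t *reg2)
{
    *reg2 >>= 30;              /* shr reg2, 30: keep only the old top two bits */
    return reg1 * 4u + *reg2;  /* lea reg1, [reg1*4+reg2] (wraps modulo 2^32) */
}

/* Compare both models on a few illustrative inputs.
   Returns the number of (reg1, reg2) pairs checked. */
int check_shld2_equivalence(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x3FFFFFFFu, 0x40000000u, 0xC0000000u, 0xFFFFFFFFu
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int checked = 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t reg2 = samples[j];
            uint32_t result = shr_lea_model(samples[i], &reg2);

            /* Same value in reg1 as SHLD would produce. */
            assert(result == shld2_model(samples[i], samples[j]));
            /* reg2 now holds only its former top two bits. */
            assert(reg2 == (samples[j] >> 30));
            checked++;
        }
    }
    return checked;
}
```

As in Example 1, the model shows that reg2 is destroyed by the replacement sequence, so the substitution fits best where reg2 is dead after the shift.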