74 Instruction-Decoding Optimizations Chapter 4
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
4.2.2 Load-Execute Floating-Point Instructions with Floating-Point Operands
Optimization
❖ When performing floating-point computations using floating-point (not integer) source operands,
use load-execute instructions instead of discrete load and execute instructions.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Using load-execute floating-point instructions that take floating-point operands improves
performance for the following reasons:
• Denser code allows more work to be held in the instruction cache.
• Load-execute instructions generate fewer internal macro-ops than discrete load and execute
instructions, allowing the floating-point scheduler to hold more work, which increases the
chances of extracting parallelism from the code.
Example
Avoid code like this, which uses discrete load and execute instructions:
movss xmm0, [float_var1]
movss xmm1, [float_var2]
mulss xmm0, xmm1
Instead, use code like this, which uses a load-execute floating-point instruction:
movss xmm0, [float_var1]
mulss xmm0, [float_var2]
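The same folding pattern should carry over to double-precision SSE2 and to x87 code. The following is a minimal sketch rather than an excerpt from this guide: double_var1 and double_var2 are hypothetical 64-bit floating-point variables, and the size-specifier syntax (QWORD PTR) is MASM-style and may differ in other assemblers.

```
; Double-precision SSE2: fold the second load into the multiply.
movsd xmm0, [double_var1]            ; xmm0 = double_var1
mulsd xmm0, [double_var2]            ; xmm0 = double_var1 * double_var2

; x87: fold the second load into the multiply.
fld   QWORD PTR [double_var1]        ; ST(0) = double_var1
fmul  QWORD PTR [double_var2]        ; ST(0) = double_var1 * double_var2
```

The scalar forms shown here place no alignment requirement on the memory operand. Packed SSE forms such as MULPS generally require a 16-byte-aligned memory operand, so a load-execute form is not always a drop-in replacement for an unaligned load followed by a register-to-register operation.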
4.2.3 Load-Execute Floating-Point Instructions with Integer Operands
Optimization
❖ Avoid x87 load-execute floating-point instructions that take integer operands (FIADD, FICOM,
FICOMP, FIDIV, FIDIVR, FIMUL, FISUB, and FISUBR). When performing floating-point
computations using integer source operands, use discrete load (FILD) and execute instructions
instead.