Chapter 4 Instruction-Decoding Optimizations
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
The load-execute floating-point instructions that take integer operands are VectorPath instructions and
generate two micro-ops in a cycle, while discrete load and execute instructions allow a third
DirectPath instruction to be decoded in the same cycle. This optimization can also reduce execution
time when FILD can be scheduled several instructions ahead of the arithmetic instruction that uses
its result, so that the FILD latency is covered.
Example
Avoid code such as the following, which uses load-execute floating-point instructions that take
integer operands:
fld QWORD PTR [foo] ; Push foo onto FP stack [ST(0) = foo].
fimul DWORD PTR [bar] ; Multiply ST(0) by bar [ST(0) = bar * foo].
fiadd DWORD PTR [baz] ; Add baz to ST(0) [ST(0) = baz + (bar * foo)].
Instead, use code such as the following, which uses discrete load and execute instructions:
fild DWORD PTR [bar] ; Push bar onto FP stack [ST(0) = bar].
fild DWORD PTR [baz] ; Push baz onto FP stack [ST(0) = baz, ST(1) = bar].
fld QWORD PTR [foo] ; Push foo onto FP stack [ST(0) = foo, ST(1) = baz, ST(2) = bar].
fmulp st(2), st ; Multiply and pop [ST(0) = baz, ST(1) = foo * bar].
faddp st(1), st ; Add and pop [ST(0) = baz + (foo * bar)].
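The scheduling benefit mentioned in the rationale can be sketched as follows. This fragment is illustrative only and is not part of the original example: the two integer instructions between FILD and FMUL are hypothetical stand-ins for whatever independent work the surrounding code has available, and the number of instructions needed to cover the FILD latency depends on that code.

```asm
fild DWORD PTR [bar] ; Start the integer load early [ST(0) = bar].
add eax, ecx ; Hypothetical independent integer work (no FP stack use).
xor edx, ebx ; More hypothetical independent work.
fmul QWORD PTR [foo] ; Multiply ST(0) by foo [ST(0) = bar * foo].
```

With FIMUL, the integer load and the multiply are fused into one instruction, so no independent work can be placed between them in this way.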