Chapter 4 Instruction-Decoding Optimizations 73
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
4.2 Load-Execute Instructions
A load-execute instruction is an instruction that loads a value from memory and then performs an
operation on that value. Many general-purpose instructions, such as ADD, SUB, and AND, have
load-execute forms:
add rax, QWORD PTR [foo]
This instruction loads the value stored at foo from memory and adds it to the value in the RAX
register, leaving the sum in RAX. The work performed by a load-execute instruction can also be
accomplished by using two discrete instructions: a load instruction followed by an execute
instruction. The following example employs discrete load and execute stages:
mov rbx, QWORD PTR [foo]
add rax, rbx
The first statement loads the value foo from memory into the RBX register. The second statement
adds the value in RBX to the value in RAX.
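The two forms described above can be sketched from C using GCC-style extended inline assembly. This
is an illustration only, under several assumptions: the variable foo and the function names
add_load_execute and add_discrete are hypothetical, the inline assembly uses AT&T operand order
(source first) rather than the Intel order used in this guide, and a plain C fallback is compiled
on targets other than x86-64 with a GCC-compatible compiler.

```c
#include <stdint.h>

/* Hypothetical memory operand standing in for "foo" in the text. */
static int64_t foo = 40;

/* Load-execute form: a single ADD reads foo from memory and adds it.
 * Intel-syntax equivalent: add rax, QWORD PTR [foo] */
int64_t add_load_execute(int64_t acc)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %1, %0"
            : "+r"(acc)      /* accumulator, read and written */
            : "m"(foo)       /* memory source operand */
            : "cc");
#else
    acc += foo;              /* fallback: instruction choice is left to the compiler */
#endif
    return acc;
}

/* Discrete form: MOV loads foo into a scratch register, then ADD uses it.
 * Intel-syntax equivalent: mov rbx, QWORD PTR [foo] / add rax, rbx */
int64_t add_discrete(int64_t acc)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int64_t tmp;             /* the extra register the discrete form needs */
    __asm__("movq %2, %1\n\t"
            "addq %1, %0"
            : "+r"(acc), "=&r"(tmp)
            : "m"(foo)
            : "cc");
#else
    int64_t tmp = foo;
    acc += tmp;
#endif
    return acc;
}
```

Both functions return the same result; for example, add_load_execute(2) and add_discrete(2) each
return 42 with foo set to 40. The difference is that the discrete form issues two instructions and
occupies a scratch register chosen by the compiler.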
The following optimizations govern the use of load-execute instructions:
• Load-Execute Integer Instructions on page 73.
• Load-Execute Floating-Point Instructions with Floating-Point Operands on page 74.
• Load-Execute Floating-Point Instructions with Integer Operands on page 74.
4.2.1 Load-Execute Integer Instructions
Optimization
❖ When performing integer computations, use load-execute instructions instead of discrete load
and execute instructions. Use discrete load and execute instructions only to avoid scheduler stalls
caused by longer-executing instructions or to schedule the load and execute operations explicitly.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Most load-execute integer instructions are DirectPath decodable and can be decoded at the rate of
three per cycle. Splitting a load-execute integer instruction into two separate instructions
consumes twice as many decode slots for the same work and requires an additional register to hold
the loaded value, which increases register pressure. Both effects result in lower performance.
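The decode-bandwidth cost can be estimated with a rough model. The sketch below is not a
description of the actual decoder. It assumes an idealized decoder that sustains three DirectPath
instructions every cycle, assumes that the discrete MOV and ADD are also DirectPath, and ignores
every other pipeline limit. The function names and the DIRECTPATH_PER_CYCLE constant are
hypothetical.

```c
/* Assumption: idealized decoder that sustains three DirectPath
 * instructions per cycle; all other pipeline limits are ignored. */
#define DIRECTPATH_PER_CYCLE 3UL

/* Lower bound on decode cycles for n DirectPath instructions (rounds up). */
unsigned long min_decode_cycles(unsigned long n_instructions)
{
    return (n_instructions + DIRECTPATH_PER_CYCLE - 1) / DIRECTPATH_PER_CYCLE;
}

/* Instructions needed to accumulate n memory operands into one register. */
unsigned long instrs_load_execute(unsigned long n)
{
    return n;          /* one "add reg, [mem]" per operand */
}

unsigned long instrs_discrete(unsigned long n)
{
    return 2 * n;      /* one "mov" plus one "add" per operand */
}
```

Under these assumptions, accumulating 1000 memory operands needs at least 334 decode cycles with
load-execute instructions and at least 667 decode cycles with discrete load and execute
instructions. Measured behavior depends on the surrounding code and on the processor.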