144 Scheduling Optimizations Chapter 7
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
7.1 Instruction Scheduling by Latency
Optimization
In general, select instructions that have shorter latencies and that are DirectPath instructions
rather than VectorPath instructions. For a list of instruction latencies and classifications, see
Appendix C, “Instruction Latencies.”
The AMD Athlon™ 64 and AMD Opteron™ processors can execute up to three AMD64 instructions
per cycle, and each instruction may have a different latency. Although these processors schedule
instructions flexibly, for maximum performance schedule instructions according to their latencies
and data dependencies. The goal is to reduce the overall length of dependency chains, because a
chain of dependent instructions cannot complete faster than the sum of the latencies along it.
Application
This optimization applies to:
• 32-bit software
• 64-bit software