270 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
C.1 Understanding Instruction Entries
To use the information in this appendix effectively, you need to understand how the entry for an
instruction is organized and how to interpret certain items.
Example: Instruction Entry
The entry for an instruction begins with its syntax. Subsequent columns provide additional
information about the instruction.
Parts of the Instruction Entry
The Column/Description table in this section describes the columns that are common to each
instruction entry in this appendix.
The entries for floating-point, MMX, SSE, SSE2, and 3DNow!™ instructions have an additional
column [FPU Pipe(s)] that lists the possible floating-point unit (FPU) pipelines available for use by
any particular DirectPath or Double decoded operation. For example, the floating-point multiplier is
represented by FMUL.
                            Encoding
                   First   Second   ModRM
Syntax             byte    byte     byte          Decode type   Latency   Note
ADD mreg8, reg8    00h              11-xxx-xxx    DirectPath    1
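
As an illustration of how the Encoding subcolumns combine into machine code, the following minimal sketch assembles the two bytes of the example entry. The function name encode_add_mreg8_reg8 and the REG8 table are hypothetical helpers, not part of this guide. The sketch assumes the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) and the legacy 8-bit register numbering without a REX prefix, and it assumes that for opcode 00h the reg field holds the source reg8 and the r/m field holds the destination mreg8.

```python
# Hypothetical helper: legacy 8-bit register numbers (no REX prefix assumed).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_add_mreg8_reg8(dest, src):
    """Return the two-byte encoding of ADD mreg8, reg8 (register-direct form).

    First byte: opcode 00h, as listed in the example entry.
    ModRM byte: 11-xxx-xxx, where 11 selects register-direct addressing,
    the first xxx is assumed to be the source register (reg field), and
    the second xxx is assumed to be the destination register (r/m field).
    """
    modrm = (0b11 << 6) | (REG8[src] << 3) | REG8[dest]
    return bytes([0x00, modrm])


if __name__ == "__main__":
    # ADD CL, AL: destination CL (001), source AL (000)
    print(encode_add_mreg8_reg8("cl", "al").hex())
```

The entry leaves the Second byte subcolumn empty because this form of ADD has a one-byte opcode, so the ModRM byte follows the first byte directly.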
Column        Description
Syntax        Shows the syntax for the instruction, that is, the permitted arrangement of its
              parts. Items in italics are placeholders for operands that you must provide. For
              information on how to interpret the placeholders, see “Interpreting Placeholders”
              on page 271.
Encoding      Shows how the assembler translates the instruction into machine language.
              Subcolumns show the individual bytes of the encoding.
Decode type   Shows the method that the processor uses to decode the instruction: DirectPath
              Single (DirectPath), DirectPath Double (Double), or VectorPath.
Latency       Shows the static execution latency for the instruction. For details on how to
              interpret the latency information, see “Interpreting Latencies” on page 272.
Throughput    Indicates the maximum theoretical rate of execution of the instruction. For
              example, a value of 1/2 means that one such instruction executes every two clocks,
              or two such instructions in four clocks, and so on. A value of 3/1 indicates that
              three such instructions can be executed every clock, but fewer than three such
              instructions would still take one clock.
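
The Throughput notation can be turned into a rough clock estimate. The sketch below is only an illustration of the arithmetic described in the Throughput entry: the function names parse_throughput and min_clocks are hypothetical, and the model assumes a stream of identical, independent instructions with no latency or dependency effects.

```python
def parse_throughput(value):
    """Split a Throughput entry such as "1/2" into (instructions, clocks)."""
    instructions, clocks = value.split("/")
    return int(instructions), int(clocks)


def min_clocks(value, count):
    """Estimate the fewest clocks needed to issue `count` such instructions.

    Instructions are grouped into batches of `instructions` per `clocks`;
    a partial batch still costs a full `clocks` period.
    """
    instructions, clocks = parse_throughput(value)
    batches = -(-count // instructions)  # integer ceiling division
    return batches * clocks


if __name__ == "__main__":
    print(min_clocks("1/2", 2))  # two instructions at one per two clocks
    print(min_clocks("3/1", 2))  # fewer than three still take one clock
```

Under these assumptions, min_clocks("1/2", 2) gives 4 and min_clocks("3/1", 2) gives 1, matching the two cases described in the Throughput entry.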