Appendix C Instruction Latencies 287
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 13. Integer Instructions (Continued)

                               Encoding
Syntax                         First  Second  ModRM       Decode      Latency  Note
                               byte   byte    byte        type
MFENCE                         0Fh    AEh     11-110-000  VectorPath  ~
MOV mreg8, reg8                88h            11-xxx-xxx  DirectPath  1
MOV mem8, reg8                 88h            mm-xxx-xxx  DirectPath  3
MOV mreg16/32/64, reg16/32/64  89h            11-xxx-xxx  DirectPath  1
MOV mem16/32/64, reg16/32/64   89h            mm-xxx-xxx  DirectPath  3
MOV reg8, mreg8                8Ah            11-xxx-xxx  DirectPath  1
MOV reg8, mem8                 8Ah            mm-xxx-xxx  DirectPath  4
MOV reg16/32/64, mreg16/32/64  8Bh            11-xxx-xxx  DirectPath  1
MOV reg16, mem16               8Bh            mm-xxx-xxx  DirectPath  4
MOV reg32/64, mem32/64         8Bh            mm-xxx-xxx  DirectPath  3
MOV mreg16/32/64, sreg         8Ch            11-xxx-xxx  DirectPath  4/3      7
MOV mem16, sreg                8Ch            mm-xxx-xxx  Double      4
MOV sreg, mreg16/32/64         8Eh            11-xxx-xxx  VectorPath  8
MOV sreg, mem16                8Eh            mm-xxx-xxx  VectorPath  10
MOV AL, mem8                   A0h                        DirectPath  4
MOV AX/EAX/RAX, mem16/32/64    A1h                        DirectPath  4/3/3
MOV mem8, AL                   A2h                        DirectPath  3
MOV mem16/32/64, AX/EAX/RAX    A3h                        DirectPath  3
MOV AL, imm8                   B0h                        DirectPath  1
MOV CL, imm8                   B1h                        DirectPath  1
MOV DL, imm8                   B2h                        DirectPath  1
MOV BL, imm8                   B3h                        DirectPath  1
MOV AH, imm8                   B4h                        DirectPath  1
MOV CH, imm8                   B5h                        DirectPath  1
Notes:
1. Static timing assumes a predicted branch.
2. The store operation also updates ESP; the new register value is available one clock earlier than the
specified latency.
3. The clock count is the same regardless of the number of shifts or rotates specified by CL or imm8.
4. LEA instructions have a latency of 1 when there are two source operands (as in the base + index form
LEA EAX, [EDX+EDI]). Forms with a scale or more than two source operands have a latency of 2 (for example,
LEA EAX, [EBX+EBX*8]).
5. These instructions have an effective latency as shown. They map to internal NOPs that can be issued at a rate of
three per cycle but do not occupy execution resources.
6. The latency of repeated string instructions can be found in “Latency of Repeated String Instructions” on
page 167.
7. The first latency value is for 32-bit mode. The second is for 64-bit mode.
8. This opcode is used as a REX prefix in 64-bit mode.
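
The LEA rule in note 4 can be written as a small decision function. The following is a minimal sketch in C, not part of the guide. The name lea_latency and its parameters are hypothetical, and it assumes that any form without a scale and with at most two source operands takes 1 clock (the note only states this for the two-operand case).

```c
#include <stdbool.h>

/* Hypothetical helper (not from the guide): estimated LEA latency in
 * clocks, following the rule stated in note 4.
 *
 * source_operands: number of source operands in the address expression
 * has_scale:       true if the index register is multiplied by a scale
 */
int lea_latency(int source_operands, bool has_scale)
{
    /* Note 4: a scale, or more than two source operands, costs 2 clocks. */
    if (has_scale || source_operands > 2) {
        return 2;
    }
    /* Assumption: every remaining form is treated as 1 clock. */
    return 1;
}
```

For the two forms quoted in note 4, lea_latency(2, false) models LEA EAX, [EDX+EDI] and returns 1, while lea_latency(2, true) models LEA EAX, [EBX+EBX*8] and returns 2.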