Appendix C Instruction Latencies 329
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
DIVPD xmmreg1,
xmmreg2
66h 0Fh 5Eh 11-xxx-xxx Double FMUL 37 1/34
DIVPD xmmreg,
mem128
66h 0Fh 5Eh mm-xxx-xxx Double FMUL 39 1/34
DIVSD xmmreg1,
xmmreg2
F2h 0Fh 5Eh 11-xxx-xxx DirectPath FMUL 20 1/17
DIVSD xmmreg,
mem64
F2h 0Fh 5Eh mm-xxx-xxx DirectPath FMUL 22 1/17
MASKMOVDQU
xmmreg1, xmmreg2
66h 0Fh F7h 11-xxx-xxx VectorPath ~ 43
MAXPD xmmreg1,
xmmreg2
66h 0Fh 5Fh 11-xxx-xxx Double FADD 3 1/2
MAXPD xmmreg,
mem128
66h 0Fh 5Fh mm-xxx-xxx Double FADD 5 1/2
MAXSD xmmreg1,
xmmreg2
F2h 0Fh 5Fh 11-xxx-xxx DirectPath FADD 2 1/1
MAXSD xmmreg,
mem64
F2h 0Fh 5Fh mm-xxx-xxx DirectPath FADD 4 1/1
MINPD xmmreg1,
xmmreg2
66h 0Fh 5Dh 11-xxx-xxx Double FADD 3 1/2
MINPD xmmreg,
mem128
66h 0Fh 5Dh mm-xxx-xxx Double FADD 5 1/2
MINSD xmmreg1,
xmmreg2
F2h 0Fh 5Dh 11-xxx-xxx DirectPath FADD 2 1/1
MINSD xmmreg,
mem64
F2h 0Fh 5Dh mm-xxx-xxx DirectPath FADD 4 1/1
MOVAPD xmmreg1,
xmmreg2
66h 0Fh 28h 11-xxx-xxx Double FADD/
FMUL
2
MOVAPD xmmreg,
mem128
66h 0Fh 28h mm-xxx-xxx Double FADD/
FMUL/
FSTORE
2
MOVAPD xmmreg1,
xmmreg2
66h 0Fh 29h 11-xxx-xxx Double FADD/
FMUL
2
MOVAPD mem128,
xmmreg
66h 0Fh 29h mm-xxx-xxx Double FSTORE 3
Table 19. SSE2 Instructions (Continued)
Syntax
Encoding
Decode
type
FPU
pipe(s)
Latency
Throughput
Note
Prefix
byte
First
byte
2nd
byte
ModRM byte
Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.