326 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
C.8 SSE2 Instructions
Table 19. SSE2 Instructions
Syntax                        Encoding                         Decode      FPU      Latency  Throughput  Note
                              Prefix  First  2nd   ModRM       type        pipe(s)
                              byte    byte   byte  byte
----------------------------  ------  -----  ----  ----------  ----------  -------  -------  ----------  ----
ADDPD xmmreg1, xmmreg2        66h     0Fh    58h   11-xxx-xxx  Double      FADD     5        1/2
ADDPD xmmreg, mem128          66h     0Fh    58h   mm-xxx-xxx  Double      FADD     7        1/2
ADDSD xmmreg1, xmmreg2        F2h     0Fh    58h   11-xxx-xxx  DirectPath  FADD     4        1/1
ADDSD xmmreg, mem64           F2h     0Fh    58h   mm-xxx-xxx  DirectPath  FADD     6        1/1
ANDNPD xmmreg1, xmmreg2       66h     0Fh    55h   11-xxx-xxx  Double      FMUL     3        1/2
ANDNPD xmmreg, mem128         66h     0Fh    55h   mm-xxx-xxx  Double      FMUL     5        1/2
ANDPD xmmreg1, xmmreg2        66h     0Fh    54h   11-xxx-xxx  Double      FMUL     3        1/2
ANDPD xmmreg, mem128          66h     0Fh    54h   mm-xxx-xxx  Double      FMUL     5        1/2
CMPPD xmmreg1, xmmreg2, imm8  66h     0Fh    C2h   11-xxx-xxx  Double      FADD     3        1/2
CMPPD xmmreg, mem128, imm8    66h     0Fh    C2h   mm-xxx-xxx  Double      FADD     5        1/2
CMPSD xmmreg1, xmmreg2, imm8  F2h     0Fh    C2h   11-xxx-xxx  DirectPath  FADD     2        1/1
CMPSD xmmreg, mem64, imm8     F2h     0Fh    C2h   mm-xxx-xxx  DirectPath  FADD     4        1/1
COMISD xmmreg1, xmmreg2       66h     0Fh    2Fh   11-xxx-xxx  VectorPath  FADD     4        1
COMISD xmmreg, mem64          66h     0Fh    2Fh   mm-xxx-xxx  VectorPath  FADD     5        1
CVTDQ2PD xmmreg1, xmmreg2     F3h     0Fh    E6h   11-xxx-xxx  Double      FSTORE   5        1/2
CVTDQ2PD xmmreg, mem64        F3h     0Fh    E6h   mm-xxx-xxx  Double      FSTORE   7        1/2
Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
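
The encoding columns of Table 19 can be read as a recipe for the instruction bytes. The following is a minimal sketch in C, not part of the original guide, of how the register-to-register forms (ModRM 11-xxx-xxx) map to bytes. The function names modrm_reg_direct and encode_sse2_rr are hypothetical, chosen only for illustration. The sketch assumes that the first xxx field holds the destination register and the second holds the source register, and it handles only xmm0 through xmm7, since xmm8 through xmm15 require a REX prefix that is not shown in the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a register-direct ModRM byte (the "11-xxx-xxx" form in Table 19):
 * mod = 11b, reg = destination register, r/m = source register. */
static uint8_t modrm_reg_direct(unsigned dst, unsigned src)
{
    return (uint8_t)(0xC0u | ((dst & 7u) << 3) | (src & 7u));
}

/* Hypothetical helper: emit the four bytes of a register-to-register
 * SSE2 instruction as listed in the table: prefix byte, first byte (0Fh),
 * second byte (opcode), ModRM byte. Returns the number of bytes written.
 * Assumption: dst and src are in the range 0..7 (no REX prefix). */
static size_t encode_sse2_rr(uint8_t prefix, uint8_t opcode,
                             unsigned dst, unsigned src, uint8_t out[4])
{
    out[0] = prefix;                      /* 66h, F2h, or F3h */
    out[1] = 0x0F;                        /* first opcode byte */
    out[2] = opcode;                      /* second opcode byte, e.g. 58h */
    out[3] = modrm_reg_direct(dst, src);  /* 11-xxx-xxx */
    return 4;
}
```

For example, calling encode_sse2_rr(0x66, 0x58, 1, 2, buf) for ADDPD xmm1, xmm2 fills buf with 66h 0Fh 58h CAh, and encode_sse2_rr(0xF2, 0x58, 0, 3, buf) for ADDSD xmm0, xmm3 yields F2h 0Fh 58h C3h. The memory forms (ModRM mm-xxx-xxx) additionally need the addressing-mode bits and any SIB or displacement bytes, which this sketch does not cover.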