Appendix C Instruction Latencies 341
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

Syntax                     Encoding                    Decode      FPU               Latency  Throughput  Note
                           Prefix  First  2nd   ModRM  type        pipe(s)
                           byte    byte   byte  byte
SQRTPD xmmreg, mem128      66h     0Fh    51h          Double      FMUL              53       1/48
SQRTSD xmmreg1, xmmreg2    F2h     0Fh    51h          DirectPath  FMUL              27       1/24
SQRTSD xmmreg, mem64       F2h     0Fh    51h          DirectPath  FMUL              29       1/24
SUBPD xmmreg1, xmmreg2     66h     0Fh    5Ch          Double      FADD              5        1/2
SUBPD xmmreg, mem128       66h     0Fh    5Ch          Double      FADD              7        1/2
SUBSD xmmreg1, xmmreg2     F2h     0Fh    5Ch          DirectPath  FADD              4        1/1
SUBSD xmmreg, mem128       F2h     0Fh    5Ch          DirectPath  FADD              6        1/1
UCOMISD xmmreg1, xmmreg2   66h     0Fh    2Eh          VectorPath  FADD              4        1/1
UCOMISD xmmreg, mem64      66h     0Fh    2Eh          VectorPath  FADD              5        1/1
UNPCKHPD xmmreg1, xmmreg2  66h     0Fh    15h          Double      FADD/FMUL         2        1/1
UNPCKHPD xmmreg, mem128    66h     0Fh    15h          Double      FADD/FMUL/FSTORE  4        1/1
UNPCKLPD xmmreg1, xmmreg2  66h     0Fh    14h          DirectPath  FADD/FMUL         2        2/1
UNPCKLPD xmmreg, mem128    66h     0Fh    14h          DirectPath  FADD/FMUL/FSTORE  4        2/1
XORPD xmmreg1, xmmreg2     66h     0Fh    57h          Double      FMUL              3        1/2
XORPD xmmreg, mem128       66h     0Fh    57h          Double      FMUL              5        1/2

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
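
The latency and throughput columns answer different questions, and a small calculation can make the difference concrete. The sketch below is illustrative only and is not part of the guide: it assumes the throughput column is read as instructions per cycle (so 1/24 means one instruction can start every 24 cycles), and it ignores decode bandwidth, scheduling, FPU pipe contention, and memory effects. The function names and the TABLE_19 dictionary are hypothetical helpers; the numeric values are copied from three register-to-register rows of Table 19.

```python
from fractions import Fraction

# (latency in cycles, throughput in instructions per cycle), copied from Table 19.
# The dictionary itself is a hypothetical helper for this sketch.
TABLE_19 = {
    "SQRTSD xmmreg1, xmmreg2": (27, Fraction(1, 24)),
    "SUBPD xmmreg1, xmmreg2": (5, Fraction(1, 2)),
    "SUBSD xmmreg1, xmmreg2": (4, Fraction(1, 1)),
}


def dependent_chain_cycles(latency, count):
    """Estimate cycles when each instruction consumes the previous result.

    Every instruction must wait out the full latency of its predecessor,
    so the chain is latency-bound.
    """
    return latency * count


def independent_stream_cycles(latency, throughput, count):
    """Estimate cycles when no instruction depends on another.

    Assumption: instructions start at the listed throughput rate, and the
    final result is ready one full latency after the last one starts.
    """
    if count == 0:
        return 0
    issue_interval = 1 / throughput  # cycles between successive starts
    return (count - 1) * issue_interval + latency


if __name__ == "__main__":
    for syntax, (latency, throughput) in TABLE_19.items():
        chained = dependent_chain_cycles(latency, 4)
        streamed = independent_stream_cycles(latency, throughput, 4)
        print(f"{syntax}: 4 dependent = {chained} cycles, "
              f"4 independent = {streamed} cycles")
```

Under this simplified model, four dependent SUBSD instructions cost about 16 cycles while four independent ones cost about 7, which is why breaking dependency chains pays off for well-pipelined instructions. For SQRTSD the gap is much smaller (108 versus 99 cycles), because its throughput interval is nearly as long as its latency.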