340 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 19. SSE2 Instructions (Continued)

Syntax                         Encoding                    Decode      FPU               Latency  Throughput  Note
                               Prefix  First  2nd   ModRM  type        pipe(s)
                               byte    byte   byte  byte
------------------------------------------------------------------------------------------------------------------
PUNPCKHQDQ xmmreg1, xmmreg2    66h     0Fh    6Dh          Double      FADD/FMUL         2        1/1
PUNPCKHQDQ xmmreg, mem128      66h     0Fh    6Dh          Double      FADD/FMUL         4        1/1
PUNPCKHWD xmmreg1, xmmreg2     66h     0Fh    69h          Double      FADD/FMUL         2        1/1
PUNPCKHWD xmmreg, mem128       66h     0Fh    69h          Double      FADD/FMUL         4        1/1
PUNPCKLBW xmmreg1, xmmreg2     66h     0Fh    60h          Double      FADD/FMUL         2        1/1
PUNPCKLBW xmmreg, mem128       66h     0Fh    60h          Double      FADD/FMUL         4        1/1
PUNPCKLDQ xmmreg1, xmmreg2     66h     0Fh    62h          Double      FADD/FMUL         2        1/1
PUNPCKLDQ xmmreg, mem128       66h     0Fh    62h          Double      FADD/FMUL         4        1/1
PUNPCKLQDQ xmmreg1, xmmreg2    66h     0Fh    6Ch          DirectPath  FADD/FMUL         2        2/1
PUNPCKLQDQ xmmreg, mem128      66h     0Fh    6Ch          DirectPath  FADD/FMUL/FSTORE  4        2/1
PUNPCKLWD xmmreg1, xmmreg2     66h     0Fh    61h          Double      FADD/FMUL         2        1/1
PUNPCKLWD xmmreg, mem128       66h     0Fh    61h          Double      FADD/FMUL         4        1/1
PXOR xmmreg1, xmmreg2          66h     0Fh    EFh          Double      FADD/FMUL         2        1/1
PXOR xmmreg, mem128            66h     0Fh    EFh          Double      FADD/FMUL         4        1/1
SHUFPD xmmreg1, xmmreg2, imm8  66h     0Fh    C6h          VectorPath  ~                 4
SHUFPD xmmreg, mem128, imm8    66h     0Fh    C6h          VectorPath  ~                 6
SQRTPD xmmreg1, xmmreg2        66h     0Fh    51h          Double      FMUL              51       1/48

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
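
As a reading aid for the Latency and Throughput columns, the following Python sketch estimates cycle counts for a run of identical instructions. It is a rough illustration and not part of the guide. It assumes that the Throughput column gives instructions per cycle (so 1/48 means one instruction every 48 cycles), and it ignores decode bandwidth, scheduling, and memory effects. The function names and the two-case model (fully dependent chain versus fully independent stream) are hypothetical simplifications introduced here.

```python
from fractions import Fraction


def dependent_chain_cycles(count, latency):
    """Estimate cycles when each instruction consumes the previous result.

    Every instruction must wait for its predecessor, so latencies add up.
    """
    return count * latency


def independent_stream_cycles(count, latency, throughput):
    """Estimate cycles when no instruction depends on another.

    `throughput` is assumed to be instructions per cycle, for example
    Fraction(1, 48). Instructions issue at that rate, and only the last
    instruction's latency is exposed at the end.
    """
    if count == 0:
        return Fraction(0)
    issue_interval = 1 / throughput  # cycles between successive issues
    return latency + (count - 1) * issue_interval


# Latency and throughput values read from the table (register forms).
PXOR_REG = {"latency": 2, "throughput": Fraction(1, 1)}
SQRTPD_REG = {"latency": 51, "throughput": Fraction(1, 48)}
```

Under this model, 100 dependent PXOR register operations cost about 200 cycles while 100 independent ones cost about 101. For SQRTPD the throughput is so close to the latency that independence helps little: 10 dependent operations cost about 510 cycles and 10 independent ones about 483.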