Appendix C Instruction Latencies 337
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

Syntax                          Prefix  First  2nd   ModRM       Decode  FPU        Latency  Throughput  Note
                                byte    byte   byte  byte        type    pipe(s)
PSHUFHW xmmreg, mem128, imm8    F3h     0Fh    70h               Double  FADD/FMUL  4        1/1
PSHUFLW xmmreg1, xmmreg2, imm8  F2h     0Fh    70h               Double  FADD/FMUL  2        1/1
PSHUFLW xmmreg, mem128, imm8    F2h     0Fh    70h               Double  FADD/FMUL  4        1/1
PSLLD xmmreg1, xmmreg2          66h     0Fh    F2h               Double  FADD/FMUL  2        1/1
PSLLD xmmreg, mem128            66h     0Fh    F2h               Double  FADD/FMUL  4        1/1
PSLLD xmmreg, imm8              66h     0Fh    72h               Double  FADD/FMUL  2        1/1
PSLLDQ xmmreg, imm8             66h     0Fh    73h   11-111-xxx  Double  FADD/FMUL  2        1/1
PSLLQ xmmreg1, xmmreg2          66h     0Fh    F3h               Double  FADD/FMUL  2        1/1
PSLLQ xmmreg, mem128            66h     0Fh    F3h               Double  FADD/FMUL  4        1/1
PSLLQ xmmreg, imm8              66h     0Fh    73h   11-110-xxx  Double  FADD/FMUL  2        1/1
PSLLW xmmreg1, xmmreg2          66h     0Fh    F1h               Double  FADD/FMUL  2        1/1
PSLLW xmmreg, mem128            66h     0Fh    F1h               Double  FADD/FMUL  4        1/1
PSLLW xmmreg, imm8              66h     0Fh    71h   11-110-xxx  Double  FADD/FMUL  2        1/1
PSRAD xmmreg1, xmmreg2          66h     0Fh    E2h               Double  FADD/FMUL  2        1/1
PSRAD xmmreg, mem128            66h     0Fh    E2h               Double  FADD/FMUL  4        1/1
PSRAD xmmreg, imm8              66h     0Fh    72h   11-100-xxx  Double  FADD/FMUL  2        1/1
PSRAW xmmreg1, xmmreg2          66h     0Fh    E1h               Double  FADD/FMUL  2        1/1

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
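The ModRM column gives the ModRM byte as its three bit fields, mod-reg-rm. For the imm8 shift forms, mod is 11 (register-direct), the reg field is an opcode extension that selects the operation (110 for the left shifts, 111 for PSLLDQ, 100 for PSRAD), and xxx is the XMM register number. The following is a minimal sketch of how those fields combine into the encoded bytes, assuming only XMM0 through XMM7 (no REX prefix); the function names `modrm` and `encode_shift_imm` are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a ModRM byte from mod (2 bits), reg (3 bits) and rm (3 bits).
   The table's "11-110-xxx" notation lists the fields in this order. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Encode an immediate-count SSE2 shift, e.g. PSLLQ xmmN, imm8
   (66h 0Fh 73h 11-110-xxx). 'opcode2' is the 2nd opcode byte, 'ext' the
   reg-field opcode extension from the ModRM column, 'xmm' the register
   number (0-7 only; XMM8-XMM15 would need a REX prefix, not handled here).
   Writes the instruction bytes to 'out' and returns how many were written. */
static size_t encode_shift_imm(uint8_t opcode2, unsigned ext, unsigned xmm,
                               uint8_t count, uint8_t out[5])
{
    out[0] = 0x66;               /* prefix byte: selects the XMM (not MMX) form */
    out[1] = 0x0F;               /* first opcode byte: two-byte opcode escape */
    out[2] = opcode2;            /* 2nd opcode byte, e.g. 73h */
    out[3] = modrm(3, ext, xmm); /* mod = 11b: register operand in rm */
    out[4] = count;              /* imm8 shift count */
    return 5;
}
```

For example, `encode_shift_imm(0x73, 6, 3, 5, buf)` yields the bytes 66 0F 73 F3 05, which is PSLLQ xmm3, 5: the ModRM byte is 0xC0 (mod 11) | 0x30 (reg 110) | 0x03 (rm = xmm3) = 0xF3.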
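From C, these instructions are usually reached through the SSE2 intrinsics in `<emmintrin.h>`. The sketch below assumes an x86 or x86-64 compiler with SSE2 enabled (the default on x86-64). The wrapper names are hypothetical, and each comment gives the usual instruction mapping rather than a guarantee: the compiler may fold a constant count into the imm8 form or a load into the mem128 form, which changes which latency row applies (2 versus 4 cycles in the table above).

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Immediate count: typically PSLLD xmmreg, imm8 (66h 0Fh 72h). */
static __m128i shl32_by3(__m128i v) { return _mm_slli_epi32(v, 3); }

/* Register count: typically PSRAD xmmreg1, xmmreg2 (66h 0Fh E2h). The count
   is read from the low 64 bits of the second operand; counts above 31 fill
   each lane with its sign bit. */
static __m128i sar32(__m128i v, int n)
{
    assert(n >= 0); /* a negative int would be read as a huge unsigned count */
    return _mm_sra_epi32(v, _mm_cvtsi32_si128(n));
}

/* Whole-register byte shift: PSLLDQ xmmreg, imm8 (66h 0Fh 73h 11-111-xxx). */
static __m128i shl_bytes4(__m128i v) { return _mm_slli_si128(v, 4); }

/* PSHUFHW (F3h 0Fh 70h): reorders words 4-7 only; words 0-3 pass through.
   _MM_SHUFFLE(0, 1, 2, 3) reverses the high four words. */
static __m128i reverse_high_words(__m128i v)
{
    return _mm_shufflehi_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Helpers: copy lanes out for inspection (unaligned stores). */
static void lanes32(__m128i v, int32_t out[4]) { _mm_storeu_si128((__m128i *)out, v); }
static void lanes16(__m128i v, int16_t out[8]) { _mm_storeu_si128((__m128i *)out, v); }
```

As a usage example, `sar32(_mm_setr_epi32(-16, 16, 7, -1), 2)` produces the lanes -4, 4, 1, -1, since the arithmetic shift preserves each lane's sign.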