338 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 19. SSE2 Instructions (Continued)

                          Encoding
Syntax                    Prefix  First  2nd   ModRM       Decode      FPU        Latency  Throughput  Note
                          byte    byte   byte  byte        type        pipe(s)
------------------------  ------  -----  ----  ----------  ----------  ---------  -------  ----------  ----
PSRAW xmmreg, mem128      66h     0Fh    E1h               Double      FADD/FMUL  4        1/1
PSRAW xmmreg, imm8        66h     0Fh    71h   11-100-xxx  Double      FADD/FMUL  2        1/1
PSRLD xmmreg1, xmmreg2    66h     0Fh    D2h               Double      FADD/FMUL  2        1/1
PSRLD xmmreg, mem128      66h     0Fh    D2h               Double      FADD/FMUL  4        1/1
PSRLD xmmreg, imm8        66h     0Fh    72h   11-010-xxx  Double      FADD/FMUL  2        1/1
PSRLDQ xmmreg, imm8       66h     0Fh    73h   11-011-xxx  Double      FADD/FMUL  2        1/1
PSRLQ xmmreg1, xmmreg2    66h     0Fh    D3h               Double      FADD/FMUL  2        1/1
PSRLQ xmmreg, mem128      66h     0Fh    D3h               Double      FADD/FMUL  4        1/1
PSRLQ xmmreg, imm8        66h     0Fh    73h   11-010-xxx  Double      FADD/FMUL  2        1/1
PSRLW xmmreg1, xmmreg2    66h     0Fh    D1h               Double      FADD/FMUL  2        1/1
PSRLW xmmreg, mem128      66h     0Fh    D1h               Double      FADD/FMUL  4        1/1
PSRLW xmmreg, imm8        66h     0Fh    71h   11-010-xxx  Double      FADD/FMUL  2        1/1
PSUBB xmmreg1, xmmreg2    66h     0Fh    F8h               Double      FADD/FMUL  2        1/1
PSUBB xmmreg, mem128      66h     0Fh    F8h               Double      FADD/FMUL  4        1/1
PSUBD xmmreg1, xmmreg2    66h     0Fh    FAh               Double      FADD/FMUL  2        1/1
PSUBD xmmreg, mem128      66h     0Fh    FAh               Double      FADD/FMUL  4        1/1
PSUBQ mmreg1, mmreg2              0Fh    FBh               DirectPath  FADD/FMUL  2        1/1

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
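
The ModRM byte entries for the shift-by-immediate forms in Table 19 (for example, 11-100-xxx for PSRAW xmmreg, imm8) list, from high bits to low, the mod field, the opcode-extension field, and the field that selects the register operand. As a minimal sketch of how those fields combine with the prefix and opcode bytes, the following Python helper assembles the xmmreg, imm8 forms listed on this page. The names encode_shift_imm and SHIFT_IMM_FORMS are hypothetical and introduced only for illustration, and the sketch assumes registers xmm0 through xmm7 (no REX prefix).

```python
# Illustrative sketch only: names below are hypothetical, not part of any AMD tool.
# (second opcode byte, 3-bit opcode extension from the ModRM column of Table 19)
SHIFT_IMM_FORMS = {
    "PSRAW": (0x71, 0b100),
    "PSRLD": (0x72, 0b010),
    "PSRLDQ": (0x73, 0b011),
    "PSRLQ": (0x73, 0b010),
    "PSRLW": (0x71, 0b010),
}


def encode_shift_imm(mnemonic: str, xmm: int, imm8: int) -> bytes:
    """Encode '<mnemonic> xmmreg, imm8' as prefix, opcode, ModRM, immediate."""
    if mnemonic not in SHIFT_IMM_FORMS:
        raise ValueError(f"no xmmreg, imm8 form listed for {mnemonic}")
    if not 0 <= xmm <= 7:
        # Assumption: xmm8-xmm15 would need a REX prefix, which is out of scope here.
        raise ValueError("this sketch covers xmm0-xmm7 only")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")

    opcode2, ext = SHIFT_IMM_FORMS[mnemonic]
    # ModRM layout: mod (2 bits) = 11 for register-direct,
    # then the opcode extension (3 bits), then the register number (3 bits).
    modrm = (0b11 << 6) | (ext << 3) | xmm
    return bytes([0x66, 0x0F, opcode2, modrm, imm8])
```

For example, encode_shift_imm("PSRAW", 1, 4) produces the bytes 66h 0Fh 71h E1h 04h, where E1h is 11-100-001 in the notation of the ModRM byte column.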