AMD 250 Computer Hardware User Manual


 
Appendix C Instruction Latencies 339
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

                            Encoding
Syntax                      Prefix  First  2nd   ModRM  Decode      FPU        Latency  Throughput  Note
                            byte    byte   byte  byte   type        pipe(s)
PSUBQ mmreg, mem64                  0Fh    FBh          DirectPath  FADD/FMUL  5        1/1
PSUBQ xmmreg1, xmmreg2      66h     0Fh    FBh          Double      FADD/FMUL  2        1/1
PSUBQ xmmreg, mem128        66h     0Fh    FBh          Double      FADD/FMUL  4        1/1
PSUBSB xmmreg1, xmmreg2     66h     0Fh    E8h          Double      FADD/FMUL  2        1/1
PSUBSB xmmreg, mem128       66h     0Fh    E8h          Double      FADD/FMUL  4        1/1
PSUBSW xmmreg1, xmmreg2     66h     0Fh    E9h          Double      FADD/FMUL  2        1/1
PSUBSW xmmreg, mem128       66h     0Fh    E9h          Double      FADD/FMUL  4        1/1
PSUBUSB xmmreg1, xmmreg2    66h     0Fh    D8h          Double      FADD/FMUL  2        1/1
PSUBUSB xmmreg, mem128      66h     0Fh    D8h          Double      FADD/FMUL  4        1/1
PSUBUSW xmmreg1, xmmreg2    66h     0Fh    D9h          Double      FADD/FMUL  2        1/1
PSUBUSW xmmreg, mem128      66h     0Fh    D9h          Double      FADD/FMUL  4        1/1
PSUBW xmmreg1, xmmreg2      66h     0Fh    F9h          Double      FADD/FMUL  2        1/1
PSUBW xmmreg, mem128        66h     0Fh    F9h          Double      FADD/FMUL  4        1/1
PUNPCKHBW xmmreg1, xmmreg2  66h     0Fh    68h          Double      FADD/FMUL  2        1/1
PUNPCKHBW xmmreg, mem128    66h     0Fh    68h          Double      FADD/FMUL  4        1/1
PUNPCKHDQ xmmreg1, xmmreg2  66h     0Fh    6Ah          Double      FADD/FMUL  2        1/1
PUNPCKHDQ xmmreg, mem128    66h     0Fh    6Ah          Double      FADD/FMUL  4        1/1

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
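
As a rough aid to reading the Latency and Throughput columns, the sketch below estimates cycle counts for two extreme cases: a chain in which every instruction depends on the previous result, and a stream of fully independent instructions. It assumes a throughput entry "X/Y" means X instructions can issue every Y cycles; the function names are hypothetical, and the model ignores decode limits, memory effects, and the early low-half result described in note 1.

```python
from fractions import Fraction


def parse_throughput(text):
    """Parse a throughput entry 'X/Y' as X instructions per Y cycles (assumed reading)."""
    instructions, cycles = text.split("/")
    return Fraction(int(instructions), int(cycles))


def dependent_chain_cycles(count, latency):
    """Estimate cycles when each instruction must wait for the previous result."""
    return count * latency


def independent_stream_cycles(count, latency, throughput):
    """Estimate cycles for `count` (>= 1) independent instructions.

    Issue is limited by throughput; the last instruction still needs its
    full latency before its result is available.
    """
    issue_cycles = Fraction(count - 1) / throughput
    return issue_cycles + latency


# Example with the register form of PSUBW from the table: latency 2, throughput 1/1.
chain = dependent_chain_cycles(8, 2)
stream = independent_stream_cycles(8, 2, parse_throughput("1/1"))
print(chain, stream)
```

Under these assumptions, eight dependent PSUBW operations take about 16 cycles, while eight independent ones take about 9, which is why interleaving independent work between dependent instructions tends to pay off.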