Appendix C Instruction Latencies 333
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

                          Encoding
Syntax                    Prefix  First  2nd   ModRM  Decode      FPU        Latency  Throughput  Note
                          byte    byte   byte  byte   type        pipe(s)
PADDB xmmreg, mem128      66h     0Fh    FCh          Double      FADD/FMUL  4        1/1
PADDD xmmreg1, xmmreg2    66h     0Fh    FEh          Double      FADD/FMUL  2        1/1
PADDD xmmreg, mem128      66h     0Fh    FEh          Double      FADD/FMUL  4        1/1
PADDQ mmreg1, mmreg2              0Fh    D4h          DirectPath  FADD/FMUL  2        1/1
PADDQ mmreg, mem64                0Fh    D4h          DirectPath  FADD/FMUL  4        1/1
PADDQ xmmreg1, xmmreg2    66h     0Fh    D4h          Double      FADD/FMUL  2        1/1
PADDQ xmmreg, mem128      66h     0Fh    D4h          Double      FADD/FMUL  4        1/1
PADDSB xmmreg1, xmmreg2   66h     0Fh    ECh          Double      FADD/FMUL  2        1/1
PADDSB xmmreg, mem128     66h     0Fh    ECh          Double      FADD/FMUL  4        1/1
PADDSW xmmreg1, xmmreg2   66h     0Fh    EDh          Double      FADD/FMUL  2        1/1
PADDSW xmmreg, mem128     66h     0Fh    EDh          Double      FADD/FMUL  4        1/1
PADDUSB xmmreg1, xmmreg2  66h     0Fh    DCh          Double      FADD/FMUL  2        1/1
PADDUSB xmmreg, mem128    66h     0Fh    DCh          Double      FADD/FMUL  4        1/1
PADDUSW xmmreg1, xmmreg2  66h     0Fh    DDh          Double      FADD/FMUL  2        1/1
PADDUSW xmmreg, mem128    66h     0Fh    DDh          Double      FADD/FMUL  4        1/1
PADDW xmmreg1, xmmreg2    66h     0Fh    FDh          Double      FADD/FMUL  2        1/1
PADDW xmmreg, mem128      66h     0Fh    FDh          Double      FADD/FMUL  4        1/1

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
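
As a rough illustration of how the Latency and Throughput columns might be used, the following Python sketch estimates cycle counts for a run of identical instructions. It is not part of the guide. The function names and the dictionary are hypothetical, the rows are transcribed from Table 19, and the sketch assumes that a throughput of "a/b" means a instructions per b cycles. It ignores decode bandwidth, the early low-half result described in Note 1, and all memory effects, so the numbers are simple estimates only.

```python
import math
from fractions import Fraction

# A few rows transcribed from Table 19: syntax -> (latency in cycles, throughput).
# The dictionary and function names are illustrative, not from the guide.
ROWS = {
    "PADDD xmmreg1, xmmreg2": (2, "1/1"),
    "PADDD xmmreg, mem128": (4, "1/1"),
    "PADDQ mmreg1, mmreg2": (2, "1/1"),
    "PADDQ mmreg, mem64": (4, "1/1"),
}


def dependent_chain_cycles(syntax, n):
    """Estimate cycles for n instructions where each consumes the previous result.

    Every instruction must wait the full listed latency of its predecessor,
    so the chain costs n * latency. Intended for the register-register forms.
    """
    latency, _ = ROWS[syntax]
    return n * latency


def independent_stream_cycles(syntax, n):
    """Estimate cycles for n mutually independent instructions.

    ASSUMPTION: throughput "a/b" means a instructions per b cycles. The last
    instruction starts after the first n - 1 have issued, then needs its
    own latency to complete.
    """
    latency, throughput = ROWS[syntax]
    per_cycle = Fraction(throughput)
    issue_cycles = math.ceil(Fraction(n - 1) / per_cycle)
    return issue_cycles + latency
```

Under these assumptions, eight dependent PADDD xmmreg1, xmmreg2 instructions are estimated at 16 cycles, while eight independent ones are estimated at 9 cycles, which is why breaking long dependency chains tends to matter more than the raw latency figure.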