AMD 250 Computer Hardware User Manual


 
Appendix C: Instruction Latencies
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors

Table 19. SSE2 Instructions (Continued)

                            Encoding
                            Prefix  First  2nd    ModRM  Decode      FPU
Syntax                      byte    byte   byte   byte   type        pipe(s)           Latency  Throughput  Note
MOVUPD xmmreg1, xmmreg2     66h     0Fh    10h           Double      FADD/FMUL         2
MOVUPD xmmreg, mem128       66h     0Fh    10h           VectorPath  FADD/FMUL/FSTORE  7
MOVUPD xmmreg1, xmmreg2     66h     0Fh    11h           Double      FADD/FMUL         2
MOVUPD mem128, xmmreg       66h     0Fh    11h           VectorPath  FSTORE            4
MULPD xmmreg1, xmmreg2      66h     0Fh    59h           Double      FMUL              5        1/2
MULPD xmmreg, mem128        66h     0Fh    59h           Double      FMUL              7        1/2
MULSD xmmreg1, xmmreg2      F2h     0Fh    59h           DirectPath  FMUL              4        1/1
MULSD xmmreg, mem64         F2h     0Fh    59h           DirectPath  FMUL              6        1/1
ORPD xmmreg1, xmmreg2       66h     0Fh    56h           Double      FMUL              3        1/2
ORPD xmmreg, mem128         66h     0Fh    56h           Double      FMUL              5        1/2
PACKSSDW xmmreg1, xmmreg2   66h     0Fh    6Bh           VectorPath  ~                 4
PACKSSDW xmmreg, mem128     66h     0Fh    6Bh           VectorPath  ~                 6
PACKSSWB xmmreg1, xmmreg2   66h     0Fh    63h           VectorPath  ~                 4
PACKSSWB xmmreg, mem128     66h     0Fh    63h           VectorPath  ~                 6
PACKUSWB xmmreg1, xmmreg2   66h     0Fh    67h           VectorPath  ~                 4
PACKUSWB xmmreg, mem128     66h     0Fh    67h           VectorPath  ~                 6
PADDB xmmreg1, xmmreg2      66h     0Fh    FCh           Double      FADD/FMUL         2        1/1

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
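
The Latency and Throughput columns answer different questions, and a rough calculation can make the difference concrete. The sketch below is illustrative only and is not part of the guide. It assumes that a throughput of n/m means n instructions can start every m cycles, and it ignores decode limits, contention from other instructions for the same pipe, memory effects, and the early low-half result described in note 1. The names TABLE_19, dependent_chain_cycles, and independent_stream_cycles are hypothetical helpers introduced for this example.

```python
from fractions import Fraction

# (latency in cycles, throughput in instructions per cycle) copied from a few
# register-to-register rows of Table 19. Reading "1/2" as one instruction
# every two cycles is an assumption of this sketch.
TABLE_19 = {
    "MULPD xmmreg1, xmmreg2": (5, Fraction(1, 2)),
    "MULSD xmmreg1, xmmreg2": (4, Fraction(1, 1)),
    "ORPD xmmreg1, xmmreg2": (3, Fraction(1, 2)),
}


def dependent_chain_cycles(latency, count):
    """Estimate cycles when each instruction consumes the previous result.

    Every instruction must wait for the full latency of its predecessor,
    so the latencies simply add up.
    """
    return latency * count


def independent_stream_cycles(latency, throughput, count):
    """Estimate cycles for instructions with no dependencies on each other.

    Instructions start at the throughput rate, and the last one still needs
    its full latency before its result is available.
    """
    if count == 0:
        return Fraction(0)
    issue_interval = 1 / throughput  # cycles between successive starts
    return (count - 1) * issue_interval + latency


latency, throughput = TABLE_19["MULPD xmmreg1, xmmreg2"]
chain = dependent_chain_cycles(latency, 8)
stream = independent_stream_cycles(latency, throughput, 8)
```

Under these assumptions, eight MULPD instructions in a dependent chain take about 8 x 5 = 40 cycles, while eight independent MULPD instructions take about 7 x 2 + 5 = 19 cycles. This first-order model is the reason loops are often unrolled with several independent accumulators: the work becomes limited by throughput rather than by latency.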