Appendix C Instruction Latencies 331
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

                            Encoding
Syntax                      Prefix  First  2nd   ModRM       Decode      FPU               Latency  Throughput  Note
                            byte    byte   byte  byte        type        pipe(s)
--------------------------------------------------------------------------------------------------------------------
MOVHPD mem64, xmmreg        66h     0Fh    17h   mm-xxx-xxx  DirectPath  FSTORE            2
MOVLPD xmmreg, mem64        66h     0Fh    12h   mm-xxx-xxx  DirectPath  FADD/FMUL/FSTORE  2
MOVLPD mem64, xmmreg        66h     0Fh    13h   mm-xxx-xxx  DirectPath  FSTORE            2
MOVMSKPD reg32/64, xmmreg   66h     0Fh    50h   11-xxx-xxx  VectorPath  FADD              3        1/1
MOVNTDQ mem128, xmmreg      66h     0Fh    E7h   mm-xxx-xxx  Double      FSTORE            3                    2
MOVNTI mem32/64, reg32/64           0Fh    C3h   mm-xxx-xxx  DirectPath  FSTORE            ~
MOVNTPD mem128, xmmreg      66h     0Fh    2Bh   mm-xxx-xxx  Double      FSTORE            3                    2
MOVQ xmmreg1, xmmreg2       F3h     0Fh    7Eh   11-xxx-xxx  Double      FADD/FMUL         2
MOVQ xmmreg, mem64          F3h     0Fh    7Eh   mm-xxx-xxx  Double      FADD/FMUL/FSTORE  4
MOVQ xmmreg1, xmmreg2       66h     0Fh    D6h   11-xxx-xxx  Double      FADD/FMUL         2
MOVQ mem64, xmmreg          66h     0Fh    D6h   mm-xxx-xxx  DirectPath  FSTORE            4
MOVQ2DQ xmmreg, mmreg       F3h     0Fh    D6h   11-xxx-xxx  Double      FADD/FMUL         2
MOVSD xmmreg1, xmmreg2      F2h     0Fh    10h   11-xxx-xxx  DirectPath  FADD/FMUL         2
MOVSD xmmreg, mem64         F2h     0Fh    10h   mm-xxx-xxx  Double      FADD/FMUL/FSTORE  2
MOVSD xmmreg1, xmmreg2      F2h     0Fh    11h   11-xxx-xxx  DirectPath  FADD/FMUL         2
MOVSD mem64, xmmreg         F2h     0Fh    11h   mm-xxx-xxx  DirectPath  FSTORE            2

Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
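
The four encoding columns of Table 19 can be read left to right as the instruction's byte sequence. The following Python sketch illustrates this for the register-to-register rows, where the ModRM pattern 11-xxx-xxx means a mod field of 11b followed by two 3-bit register numbers. The helper names modrm and encode_reg_reg are hypothetical and not part of this guide. The sketch assumes registers 0 through 7 only (no REX prefix) and does not cover the memory forms (mm-xxx-xxx), which may require additional SIB and displacement bytes.

```python
from typing import Optional


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM fields (2-bit mod, 3-bit reg, 3-bit r/m) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_reg_reg(prefix: Optional[int], first: int, second: int,
                   reg: int, rm: int) -> bytes:
    """Hypothetical helper: bytes for a row whose ModRM column is 11-xxx-xxx."""
    head = [] if prefix is None else [prefix]
    return bytes(head + [first, second, modrm(0b11, reg, rm)])


# MOVSD xmm1, xmm2 (row F2h 0Fh 10h 11-xxx-xxx): reg = destination, r/m = source.
movsd = encode_reg_reg(0xF2, 0x0F, 0x10, reg=1, rm=2)
print(movsd.hex(" "))  # f2 0f 10 ca

# MOVMSKPD eax, xmm3 (row 66h 0Fh 50h 11-xxx-xxx): reg = eax (0), r/m = xmm3.
movmskpd = encode_reg_reg(0x66, 0x0F, 0x50, reg=0, rm=3)
print(movmskpd.hex(" "))  # 66 0f 50 c3
```

Passing prefix=None corresponds to a row with an empty prefix column, such as MOVNTI.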