318 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 18. SSE Instructions (Continued)

                            Encoding
Syntax                      Prefix  First  2nd   ModRM byte  Decode type  FPU pipe(s)  Latency  Note
                            byte    byte   byte
COMISS xmmreg, mem32        -       0Fh    2Fh   mm-xxx-xxx  VectorPath   -            6        -
CVTPI2PS xmmreg, mmreg      -       0Fh    2Ah   11-xxx-xxx  DirectPath   -            4        -
CVTPI2PS xmmreg, mem64      -       0Fh    2Ah   mm-xxx-xxx  DirectPath   -            6        -
CVTPS2PI mmreg, xmmreg      -       0Fh    2Dh   11-xxx-xxx  DirectPath   -            4        -
CVTPS2PI mmreg, mem128      -       0Fh    2Dh   mm-xxx-xxx  DirectPath   -            6        -
CVTSI2SS xmmreg, reg32/64   F3h     0Fh    2Ah   11-xxx-xxx  VectorPath   -            14       -
CVTSI2SS xmmreg, mem32/64   F3h     0Fh    2Ah   mm-xxx-xxx  Double       -            9        -
CVTSS2SI reg32, xmmreg      F3h     0Fh    2Dh   11-xxx-xxx  Double       -            9        -
CVTSS2SI reg32, mem32       F3h     0Fh    2Dh   mm-xxx-xxx  VectorPath   -            10       -
CVTTPS2PI mmreg, xmmreg     -       0Fh    2Ch   11-xxx-xxx  DirectPath   -            4        -
CVTTPS2PI mmreg, mem128     -       0Fh    2Ch   mm-xxx-xxx  DirectPath   -            6        -
CVTTSS2SI reg32, xmmreg     F3h     0Fh    2Ch   11-xxx-xxx  Double       -            9        -
CVTTSS2SI reg32, mem32      F3h     0Fh    2Ch   mm-xxx-xxx  VectorPath   -            10       -
DIVPS xmmreg1, xmmreg2      -       0Fh    5Eh   11-xxx-xxx  Double       FMUL         33       -
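The listed latencies can serve as a rough estimate for a chain of instructions in which each one consumes the result of the previous one. The C sketch below is illustrative only and is not part of the guide: the names `sse_op`, `sse_row`, `listed_latency`, and `chain_latency` are hypothetical, and only a few rows of Table 18 are copied in. It assumes that summing the listed values is a reasonable first approximation, which ignores decode bandwidth, scheduling, and memory effects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Identifies one table row: mnemonic plus operand form (hypothetical type). */
struct sse_op {
    const char *mnemonic;   /* mnemonic as listed in Table 18 */
    int         mem_source; /* 1 for the mm-xxx-xxx (memory) form, 0 for 11-xxx-xxx */
};

/* One table row with its listed latency in clocks (hypothetical type). */
struct sse_row {
    struct sse_op op;
    int           latency;
};

/* A few rows copied from Table 18; the excerpt is deliberately incomplete. */
static const struct sse_row sse_rows[] = {
    { { "CVTPI2PS", 0 },  4 },
    { { "CVTPI2PS", 1 },  6 },
    { { "CVTPS2PI", 0 },  4 },
    { { "CVTPS2PI", 1 },  6 },
    { { "DIVPS",    0 }, 33 },
};

/* Return the listed latency, or -1 if the row is not in the excerpt. */
int listed_latency(const char *mnemonic, int mem_source)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof sse_rows / sizeof sse_rows[0]; i++) {
        if (strcmp(sse_rows[i].op.mnemonic, mnemonic) == 0 &&
            sse_rows[i].op.mem_source == mem_source) {
            return sse_rows[i].latency;
        }
    }
    return -1;
}

/* Sum the listed latencies of a serially dependent chain.
 * Returns -1 if any instruction is missing from the excerpt. */
int chain_latency(const struct sse_op *chain, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int lat = listed_latency(chain[i].mnemonic, chain[i].mem_source);
        if (lat < 0) {
            return -1;
        }
        total += lat;
    }
    return total;
}
```

For example, a chain of CVTPI2PS xmmreg, mem64 followed by DIVPS xmmreg1, xmmreg2 and CVTPS2PI mmreg, xmmreg sums to 6 + 33 + 4 = 43 clocks under these assumptions.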
Notes:
1. The low half of the result is available one cycle earlier than listed.
2. The second latency value indicates when the low half of the result becomes available.
3. The high half of the result is available one cycle earlier than listed.
4. The latency listed is the absolute minimum, while average latencies may be higher and are a function of internal
pipeline conditions.
5. For the PREFETCHNTA/T0/T1/T2 instructions, the mem8 value refers to an address in the 64-byte line to be
prefetched.
6. The 8-clock latency is only visible to younger stores that need to do an external write. The 2-clock latency is
visible to the other stores and instructions.
7. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
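
Note 5 implies that any address inside a given 64-byte line selects the same line for prefetching. The C sketch below illustrates that idea and is not taken from the guide: the helper names `prefetch_line_base` and `same_prefetch_line` are hypothetical, and the code assumes that lines are naturally aligned on 64-byte boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Line size from note 5; natural 64-byte alignment is an assumption here. */
#define PREFETCH_LINE_BYTES 64u

static_assert((PREFETCH_LINE_BYTES & (PREFETCH_LINE_BYTES - 1u)) == 0u,
              "line size must be a power of two");

/* Round an address down to the start of the 64-byte line that contains it. */
uintptr_t prefetch_line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(PREFETCH_LINE_BYTES - 1u);
}

/* True when both addresses fall in the same 64-byte line, so a prefetch of
 * either address would target the same line. */
bool same_prefetch_line(uintptr_t a, uintptr_t b)
{
    return prefetch_line_base(a) == prefetch_line_base(b);
}
```

Under these assumptions, addresses 0x1200 and 0x123F share a line, while 0x123F and 0x1240 do not, so a loop that prefetches ahead only needs one prefetch per 64 bytes of data.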