IA-32 Intel® Architecture Optimization
Table C-4 Streaming SIMD Extension Single-precision Floating-point Instructions
Instruction           Latency¹          Throughput        Execution Unit²
CPUID                 0F3n  0F2n  0x69n 0F3n  0F2n  0x69n 0F2n
ADDPS xmm, xmm        5     4     4     2     2     2     FP_ADD
ADDSS xmm, xmm        5     4     3     2     2     1     FP_ADD
ANDNPS³ xmm, xmm      4     4     2     2     2     2     MMX_ALU
ANDPS³ xmm, xmm       4     4     2     2     2     2     MMX_ALU
CMPPS xmm, xmm        5     4     4     2     2     2     FP_ADD
CMPSS xmm, xmm        5     4     3     2     2     1     FP_ADD
COMISS xmm, xmm       7     6     1     2     2     1     FP_ADD, FP_MISC
CVTPI2PS xmm, mm      12    11    3     2     4     1     MMX_ALU, FP_ADD, MMX_SHFT
CVTPS2PI mm, xmm      8     7     3     2     2     1     FP_ADD, MMX_ALU
CVTSI2SS³ xmm, r32    12    11    4     2     2     2     FP_ADD, MMX_SHFT, MMX_MISC
CVTSS2SI r32, xmm     9     8     4     2     2     1     FP_ADD, FP_MISC
CVTTPS2PI mm, xmm     8     7     3     2     2     1     FP_ADD, MMX_ALU
CVTTSS2SI r32, xmm    9     8     4     2     2     1     FP_ADD, FP_MISC
DIVPS xmm, xmm        40    39    18+17 40    39    36    FP_DIV
DIVSS xmm, xmm        32    23          32    23          FP_DIV
MAXPS xmm, xmm        5     4           2     2           FP_ADD
MAXSS xmm, xmm        5     4           2     2           FP_ADD
MINPS xmm, xmm        5     4           2     2           FP_ADD
MINSS xmm, xmm        5     4           2     2           FP_ADD
MOVAPS xmm, xmm       6     6           1     1           FP_MOVE
MOVHLPS³ xmm, xmm     6     6           2     2           MMX_SHFT
continued
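The latency and throughput columns can be read together. If each result feeds the next instance of the same instruction, that instruction can issue only once per latency period. Keeping the unit busy at its throughput rate therefore takes about ceil(latency / throughput) independent dependency chains. For ADDPS on CPUID 0F2n that is ceil(4 / 2) = 2, and on 0F3n it is ceil(5 / 2) = 3. The C sketch below is illustrative only. It assumes the usual meanings of the columns: latency is the number of cycles until the result is available, and throughput is the number of cycles before the issue port can accept the same instruction again. The table's own footnotes are not part of this excerpt. The names `sse_timing`, `timings_0f2n`, `chains_needed`, and `sum_two_chains` are hypothetical. The timing rows are copied from the 0F2n columns of Table C-4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one Table C-4 row for a single CPUID column. */
struct sse_timing {
    const char *mnemonic;
    unsigned latency;     /* assumed: cycles until the result is available */
    unsigned throughput;  /* assumed: cycles before the port accepts the same instruction again */
};

/* Values copied from the CPUID 0F2n columns of Table C-4. */
static const struct sse_timing timings_0f2n[] = {
    { "ADDPS",    4,  2 },
    { "ADDSS",    4,  2 },
    { "CVTPI2PS", 11, 4 },
    { "DIVPS",    39, 39 },
    { "MOVAPS",   6,  1 },
};

/* Hypothetical helper: independent chains needed to issue the instruction
 * every `throughput` cycles, computed as ceil(latency / throughput). */
static unsigned chains_needed(const struct sse_timing *t)
{
    assert(t->throughput > 0);  /* guard against division by zero */
    return (t->latency + t->throughput - 1) / t->throughput;
}

/* Hypothetical example: scalar sum with two independent accumulators,
 * sized for ADDSS on 0F2n (latency 4, throughput 2, so 2 chains).
 * Whether each += compiles to ADDSS depends on the compiler and target. */
static float sum_two_chains(const float *x, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += x[i];      /* chain 0 */
        acc1 += x[i + 1];  /* chain 1, independent of chain 0 */
    }
    if (i < n)
        acc0 += x[i];      /* leftover element when n is odd */
    return acc0 + acc1;    /* combine the chains once at the end */
}
```

Splitting a sum across accumulators changes the order of the floating-point additions. Results can therefore differ in the last bits from a strictly sequential loop. Long-latency, low-throughput rows such as DIVPS (39/39 on 0F2n) need only one chain, so the technique pays off mainly for pipelined operations like ADDPS and ADDSS.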