IA-32 Intel® Architecture Optimization
Packed SSE2 Integer versus MMX Instructions
In general, 128-bit SIMD integer instructions should be favored over
64-bit MMX instructions on Intel Core Solo and Intel Core Duo
processors. This is because:
• 128-bit SIMD integer instructions benefit from improved decoder
bandwidth and more efficient uop flows relative to the Pentium M
processor.
• The wider XMM registers can benefit code that is limited by
either decoder bandwidth or execution latency. XMM registers
provide twice the space to store data for in-flight execution.
Wider XMM registers can also facilitate loop unrolling or reduce
loop overhead by halving the number of loop iterations.
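The halving of loop iterations can be illustrated with a minimal sketch using compiler intrinsics. The function names add_words_mmx and add_words_sse2 are hypothetical and do not appear in this manual. The sketch assumes GCC or Clang targeting x86 with SSE2 enabled, and element counts that are multiples of 4 and 8 respectively. Each function returns its iteration count only to make the comparison visible; this is not a performance measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <mmintrin.h>   /* MMX intrinsics, 64-bit __m64 */
#include <emmintrin.h>  /* SSE2 intrinsics, 128-bit __m128i */

/* Hypothetical helper: dst[i] = a[i] + b[i] with 64-bit MMX adds.
   Assumes n is a multiple of 4. Returns the number of loop iterations. */
size_t add_words_mmx(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t iterations = 0;
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {      /* 4 words per iteration */
        __m64 va, vb, sum;
        memcpy(&va, a + i, sizeof va);       /* alignment-safe 64-bit load */
        memcpy(&vb, b + i, sizeof vb);
        sum = _mm_add_pi16(va, vb);          /* PADDW on 64-bit operands */
        memcpy(dst + i, &sum, sizeof sum);   /* alignment-safe 64-bit store */
        iterations++;
    }
    _mm_empty();                             /* EMMS: leave MMX state */
    return iterations;
}

/* Hypothetical helper: same operation with 128-bit SSE2 adds.
   Assumes n is a multiple of 8. Returns the number of loop iterations. */
size_t add_words_sse2(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t iterations = 0;
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {      /* 8 words per iteration */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PADDW on 128-bit operands, unaligned store */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
        iterations++;
    }
    return iterations;
}
```

For the same array length, the SSE2 loop runs half as many iterations as the MMX loop, so the loop-control overhead (increment, compare, branch) is paid half as often. Whether this yields a measurable gain depends on the surrounding code.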
Execution throughput of 128-bit SIMD integer operations is
essentially the same as that of 64-bit MMX operations. Some
shuffle/unpack/shift operations do not benefit from the front-end
improvements. The net effect of using 128-bit SIMD integer instructions
on Intel Core Solo and Intel Core Duo processors is likely to be
slightly positive overall, but in a few situations they may have an
unfavorable performance impact.