Optimizing for SIMD Floating-point Applications 5
• Is the data arranged for efficient utilization of the SIMD
floating-point registers?
• Is this application targeted for processors without SIMD
floating-point instructions?
For more details, see the section on “Considerations for Code
Conversion to SIMD Programming” in Chapter 3.
Using SIMD Floating-point with x87 Floating-point
Because the XMM registers used for SIMD floating-point computations
are separate registers and are not mapped onto the existing x87
floating-point stack, SIMD floating-point code can be mixed with either
x87 floating-point or 64-bit SIMD integer code.
Scalar Floating-point Code
There are SIMD floating-point instructions that operate only on the
least-significant operand in the SIMD register. These instructions are
known as scalar instructions. They allow the XMM registers to be used
for general-purpose floating-point computations.
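As a minimal sketch of the scalar behavior described above, the following C fragment uses the SSE intrinsics from xmmintrin.h. The function names scalar_add_demo and scalar_mul_add are hypothetical and chosen only for illustration. The first function shows that a scalar add changes only the least-significant element and passes the upper three elements through from the first operand. The second uses XMM registers for an ordinary single-value computation.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ss, ... */

/* Hypothetical helper: scalar add touches only the lowest element.
   The upper three elements of the result are copied from a. */
static void scalar_add_demo(float out[4])
{
    /* _mm_set_ps takes elements from most- to least-significant. */
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);      /* low element = 1 */
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);  /* low element = 10 */
    __m128 r = _mm_add_ss(a, b);                        /* scalar add (addss) */
    _mm_storeu_ps(out, r);                              /* out[0] is the low element */
}

/* Hypothetical helper: general-purpose x * y + z computed with
   scalar SIMD instructions instead of the x87 stack. */
static float scalar_mul_add(float x, float y, float z)
{
    __m128 prod = _mm_mul_ss(_mm_set_ss(x), _mm_set_ss(y));
    __m128 sum  = _mm_add_ss(prod, _mm_set_ss(z));
    return _mm_cvtss_f32(sum);                          /* extract low element */
}
```

In practice a compiler targeting SSE usually generates such scalar instructions from plain C float arithmetic; the intrinsics are spelled out here only to make the register usage visible.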
In terms of performance, scalar floating-point code can match or exceed
x87 floating-point code, and it has the following advantages:
• SIMD floating-point code uses a flat register model, whereas x87
floating-point code uses a stack model. Using scalar floating-point
code eliminates the need for fxch instructions, which can limit
performance on the Intel Pentium 4 processor.
• Scalar floating-point code can be mixed with MMX technology code
without penalty.
• Flush-to-zero mode is available.
• Latencies are shorter than those of x87 floating-point instructions.
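The flush-to-zero advantage listed above can be illustrated with a short sketch, assuming the standard _MM_SET_FLUSH_ZERO_MODE and _MM_GET_FLUSH_ZERO_MODE macros from xmmintrin.h and the default masked underflow exception. The function name halve_smallest_normal is hypothetical. It halves the smallest normalized single-precision value, which underflows: with flush-to-zero off the result is a denormal, and with flush-to-zero on the result is zero. The volatile qualifiers are an assumption intended to keep the compiler from folding the multiply at compile time.

```c
#include <float.h>      /* FLT_MIN */
#include <xmmintrin.h>  /* _mm_mul_ss, _MM_SET_FLUSH_ZERO_MODE, ... */

/* Hypothetical helper: returns FLT_MIN * 0.5 computed with a scalar
   SIMD multiply, with flush-to-zero mode switched on or off. */
static float halve_smallest_normal(int flush_to_zero)
{
    /* Remember the caller's setting so it can be restored. */
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();

    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    /* volatile forces the operands to be read, and the result to be
       written, at run time while the chosen mode is in effect. */
    volatile float tiny = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result =
        _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(half)));

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return result;
}
```

Flush-to-zero trades strict IEEE denormal results for speed, so a sketch like this is mainly useful for confirming which mode a given code path runs under.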