Chapter 5: Optimizing for SIMD Floating-point Applications
This chapter discusses general rules for optimizing code that uses the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). It
also provides examples that illustrate the optimization
techniques for single-precision and double-precision SIMD
floating-point applications.
General Rules for SIMD Floating-point Code
The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:
• Follow all guidelines in Chapter 2 and Chapter 3.
• Mask floating-point exceptions to achieve higher performance. Software
runs slower when exceptions are unmasked.
• Use the flush-to-zero and denormals-are-zero modes to avoid the
performance penalty of handling denormals and underflows.
• Incorporate the prefetch instruction where appropriate (for details,
refer to Chapter 6, “Optimizing Cache Usage”).
• Use MMX technology instructions and registers when the computation,
such as shuffling data, can be done in SIMD integer.
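
The exception-masking and flush-to-zero/denormals-are-zero rules above all come down to bits in the MXCSR control register. The following is a minimal sketch of one way to set them from C, assuming a compiler that provides the standard intrinsic headers xmmintrin.h and pmmintrin.h and a processor that supports the denormals-are-zero mode. The helper names simd_fp_fast_mode_enter and simd_fp_fast_mode_leave are hypothetical and chosen only for illustration.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, exception-mask and FTZ macros */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE (DAZ macros) */

/*
 * Hypothetical helper (not from the manual): mask all SIMD floating-point
 * exceptions and enable flush-to-zero and denormals-are-zero.
 * Returns the previous MXCSR value so the caller can restore it.
 * Assumption: the processor supports DAZ; setting the DAZ bit on a
 * processor without it raises a general-protection fault.
 */
static unsigned int simd_fp_fast_mode_enter(void)
{
    unsigned int saved = _mm_getcsr();                   /* save current MXCSR */

    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);               /* mask all six exceptions */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* denormal results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* denormal inputs read as 0 */

    return saved;
}

/* Hypothetical helper: restore the MXCSR value saved by the function above. */
static void simd_fp_fast_mode_leave(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

A caller would typically bracket a compute kernel with simd_fp_fast_mode_enter() and simd_fp_fast_mode_leave(). Both modes trade IEEE 754 compliance for speed, so restoring the saved MXCSR value keeps the change from leaking into code that depends on denormal results.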