IA-32 Intel® Architecture Optimization
• Use MMX technology instructions and registers for copying data
that is not used later in SIMD floating-point computations.
• Use the reciprocal instructions followed by iteration for increased
accuracy. These instructions yield reduced accuracy but execute
much faster than divide and square root. Note the following:
— If reduced accuracy is acceptable, use them with no iteration.
— If near full accuracy is needed, use a Newton-Raphson iteration.
— If full accuracy is needed, use divide and square root, which
provide more accuracy but slow down performance.
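The reciprocal guideline above can be sketched with the SSE compiler intrinsics from <xmmintrin.h>. This is a minimal sketch, not a tuned implementation: the helper names recip_nr and rsqrt_nr are hypothetical, and it assumes that a single Newton-Raphson step gives enough accuracy for the application.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_rsqrt_ps, ... */

/* Hypothetical helper: approximate 1/a for four packed floats, then
   refine with one Newton-Raphson step, x1 = x0 * (2 - a * x0). */
static inline __m128 recip_nr(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);       /* fast, reduced-accuracy estimate */
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

/* Hypothetical helper: approximate 1/sqrt(a), then refine with one
   Newton-Raphson step, x1 = 0.5 * x0 * (3 - a * x0 * x0). */
static inline __m128 rsqrt_nr(__m128 a)
{
    __m128 x0    = _mm_rsqrt_ps(a);   /* fast, reduced-accuracy estimate */
    __m128 half  = _mm_set1_ps(0.5f);
    __m128 three = _mm_set1_ps(3.0f);
    __m128 ax0x0 = _mm_mul_ps(_mm_mul_ps(a, x0), x0);
    return _mm_mul_ps(_mm_mul_ps(half, x0), _mm_sub_ps(three, ax0x0));
}
```

If reduced accuracy is acceptable, call _mm_rcp_ps or _mm_rsqrt_ps alone and skip the refinement. If full accuracy is needed, use _mm_div_ps and _mm_sqrt_ps instead.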
Planning Considerations
Whether adapting an existing application or creating a new one, using
SIMD floating-point instructions to achieve optimum performance gain
requires programmers to consider several issues. In general, when
choosing candidates for optimization, look for code segments that are
computationally intensive and floating-point intensive. Also consider
efficient use of the cache architecture.
The sections that follow answer the questions that should be raised
before implementation:
• Can data layout be arranged to increase control parallelism or cache
utilization?
• Which part of the code benefits from SIMD floating-point
instructions?
• Is the current algorithm the most appropriate for SIMD
floating-point instructions?
• Is the code floating-point intensive?
• Does either single-precision floating-point or double-precision
floating-point computation provide enough range and precision?
• Is the result of computation affected by enabling flush-to-zero or
denormals-to-zero modes?
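One way to explore the last question is to run a representative computation with the modes switched on and off and compare the results. The following is a minimal sketch under stated assumptions: the helper name mul_ss_with_modes is hypothetical, the MXCSR exception masks are left at their defaults, and the processor is assumed to support the denormals-to-zero mode. The mode macros come from <xmmintrin.h> and <pmmintrin.h>.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */
#include <float.h>      /* FLT_MIN, the smallest normalized float */

/* Hypothetical helper: multiply x by y with an SSE scalar multiply while
   flush-to-zero (ftz) and denormals-to-zero (daz) are set as requested,
   then restore the caller's MXCSR. */
static float mul_ss_with_modes(float x, float y, int ftz, int daz)
{
    unsigned int saved = _mm_getcsr();

    _MM_SET_FLUSH_ZERO_MODE(ftz ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(daz ? _MM_DENORMALS_ZERO_ON
                                    : _MM_DENORMALS_ZERO_OFF);

    /* volatile keeps the compiler from folding the multiply at compile
       time or moving it outside the region where the modes are active. */
    volatile float vx = x, vy = y;
    volatile float vr =
        _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(vx), _mm_set_ss(vy)));

    _mm_setcsr(saved);
    return vr;
}
```

For example, mul_ss_with_modes(FLT_MIN, 0.3f, 0, 0) underflows to a small nonzero denormal, while the same call with ftz set to 1 returns zero. Similarly, a denormal input such as 1e-39f multiplied by 1e10f gives a nonzero result with daz set to 0 and zero with daz set to 1. If results like these change the outcome of the algorithm, the modes affect the computation.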