IA-32 Intel® Architecture Optimization
Use MMX technology instructions and registers for copying data
that is not used later in SIMD floating-point computations.
Use the reciprocal instructions, followed by iteration when increased
accuracy is needed. These instructions yield reduced accuracy but
execute much faster than divide and square root. Note the following:
If reduced accuracy is acceptable, use them with no iteration.
If near full accuracy is needed, use a Newton-Raphson iteration.
If full accuracy is needed, use the divide and square root
instructions, which provide more accuracy but are slower.
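As an illustration of the reciprocal-plus-iteration approach, the following
is a minimal sketch in C using the SSE intrinsics _mm_rcp_ps and
_mm_rsqrt_ps, each refined by one Newton-Raphson step. The helper names
rcp_nr and rsqrt_nr are hypothetical and do not come from this manual. The
sketch assumes a compiler that provides <xmmintrin.h>, and it does not handle
special inputs such as zero or infinity.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_rsqrt_ps, ... */

/* Hypothetical helper: approximate 1/a for four packed floats.
   The hardware estimate x0 is refined by one Newton-Raphson step:
       x1 = x0 * (2 - a * x0)                                      */
static __m128 rcp_nr(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);      /* fast, reduced-accuracy estimate */
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

/* Hypothetical helper: approximate 1/sqrt(a) for four packed floats.
   One Newton-Raphson step:
       y1 = 0.5 * y0 * (3 - a * y0 * y0)                           */
static __m128 rsqrt_nr(__m128 a)
{
    __m128 y0    = _mm_rsqrt_ps(a);  /* fast, reduced-accuracy estimate */
    __m128 half  = _mm_set1_ps(0.5f);
    __m128 three = _mm_set1_ps(3.0f);
    __m128 ayy   = _mm_mul_ps(a, _mm_mul_ps(y0, y0));
    return _mm_mul_ps(_mm_mul_ps(half, y0), _mm_sub_ps(three, ayy));
}
```

Calling _mm_rcp_ps or _mm_rsqrt_ps alone corresponds to the reduced-accuracy
case. The refined versions correspond to the near-full-accuracy case, and
_mm_div_ps and _mm_sqrt_ps correspond to the full-accuracy case.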
Planning Considerations
Whether adapting an existing application or creating a new one, using
SIMD floating-point instructions to achieve optimum performance gain
requires programmers to consider several issues. In general, when
choosing candidates for optimization, look for code segments that are
computationally intensive and floating-point intensive. Also consider
efficient use of the cache architecture.
The sections that follow answer the questions that should be raised
before implementation:
Can data layout be arranged to increase control parallelism or cache
utilization?
Which part of the code benefits from SIMD floating-point
instructions?
Is the current algorithm the most appropriate for SIMD
floating-point instructions?
Is the code floating-point intensive?
Does either single-precision floating-point or double-precision
floating-point computation provide enough range and precision?
Is the result of computation affected by enabling flush-to-zero or
denormals-to-zero modes?
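One way to explore the last question is to run the same computation with the
modes on and off and compare the results. The sketch below is a hedged
example, not a method prescribed by this manual: mul_ss_mode is a
hypothetical helper that performs a single scalar SSE multiply under the
requested modes and then restores the previous MXCSR value. It assumes a
compiler that provides <xmmintrin.h> and <pmmintrin.h> and a processor that
supports the denormals-to-zero mode.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: multiply a by b with one scalar SSE instruction,
   with flush-to-zero (ftz) and denormals-to-zero (daz) enabled or not. */
static float mul_ss_mode(float a, float b, int ftz, int daz)
{
    unsigned int saved = _mm_getcsr();   /* remember the caller's MXCSR */

    _MM_SET_FLUSH_ZERO_MODE(ftz ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(daz ? _MM_DENORMALS_ZERO_ON
                                    : _MM_DENORMALS_ZERO_OFF);

    /* volatile keeps the compiler from folding the multiply at compile
       time or moving it outside the region where the modes are set. */
    volatile float va = a, vb = b, out;
    out = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));

    _mm_setcsr(saved);                   /* restore the caller's MXCSR */
    return out;
}
```

For example, mul_ss_mode(2e-38f, 0.3f, 0, 0) produces a small denormal
result, while the same call with ftz set to 1 returns zero. If the two runs
of an application differ in ways that matter, the modes affect its results.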