IA-32 Intel® Architecture Optimization

• Use MMX technology instructions and registers for copying data that is not used later in SIMD floating-point computations.

• Use the reciprocal instructions followed by iteration for increased accuracy. These instructions yield reduced accuracy but execute much faster. Note the following:

— If reduced accuracy is acceptable, use them with no iteration.

— If near full accuracy is needed, use a Newton-Raphson iteration.

— If full accuracy is needed, use the divide and square root instructions, which provide more accuracy but execute more slowly.
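
The three accuracy levels above can be sketched with the SSE C intrinsics from <xmmintrin.h>. This is a minimal illustration, not code from this manual: the function names recip_fast, recip_nr and recip_full are hypothetical, and the sketch covers only the reciprocal case (the reciprocal square root follows the same pattern with a different refinement formula). It assumes an x86 compiler with SSE support.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_div_ps, ... */

/* Reduced accuracy, no iteration: the RCPPS approximation of 1/a
   (roughly 12 bits of precision) for four packed floats. */
__m128 recip_fast(__m128 a)
{
    return _mm_rcp_ps(a);
}

/* Near full accuracy: refine the RCPPS estimate x0 with one
   Newton-Raphson step, x1 = x0 * (2 - a * x0). */
__m128 recip_nr(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

/* Full accuracy: a true divide (DIVPS), which is slower. */
__m128 recip_full(__m128 a)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), a);
}
```

A caller would load four values with _mm_loadu_ps, call one of the functions, and store the result with _mm_storeu_ps. Which variant is appropriate depends on how much error the surrounding computation can tolerate, so it is worth measuring both accuracy and speed on the target processor.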

Planning Considerations

Whether adapting an existing application or creating a new one, using SIMD floating-point instructions to achieve optimum performance gain requires programmers to consider several issues. In general, when choosing candidates for optimization, look for code segments that are computationally intensive and floating-point intensive. Also consider efficient use of the cache architecture.

The sections that follow answer the questions that should be raised before implementation:

• Can data layout be arranged to increase control parallelism or cache utilization?

• Which part of the code benefits from SIMD floating-point instructions?

• Is the current algorithm the most appropriate for SIMD floating-point instructions?

• Is the code floating-point intensive?

• Does either single-precision or double-precision floating-point computation provide enough range and precision?

• Is the result of the computation affected by enabling flush-to-zero or denormals-to-zero modes?
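
One way to answer the last question is to run the same computation with flush-to-zero on and off and compare the results. The following is a minimal sketch under stated assumptions: it uses the _MM_SET_FLUSH_ZERO_MODE and _MM_GET_FLUSH_ZERO_MODE macros from <xmmintrin.h>, the helper names halve_sse and halve_with_ftz are hypothetical, and the default state with the underflow exception masked is assumed.

```c
#include <float.h>      /* FLT_MIN: smallest normalized float */
#include <xmmintrin.h>  /* SSE intrinsics and MXCSR control macros */

/* Hypothetical helper: halve x with one SSE scalar multiply (MULSS).
   The volatile variables keep the compiler from folding the multiply
   at compile time or moving it across the MXCSR updates. */
float halve_sse(float x)
{
    volatile float in = x;
    volatile float out;
    out = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(in), _mm_set_ss(0.5f)));
    return out;
}

/* Hypothetical helper: compute x / 2 with flush-to-zero enabled or
   disabled, then restore the caller's flush-to-zero setting. */
float halve_with_ftz(float x, int ftz_on)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(ftz_on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    float r = halve_sse(x);
    _MM_SET_FLUSH_ZERO_MODE(saved);
    return r;
}
```

With this sketch, halve_with_ftz(FLT_MIN, 0) returns a small nonzero denormal value, while halve_with_ftz(FLT_MIN, 1) returns zero, because the underflowing result is flushed. If an application produces the same output either way, enabling the mode is likely safe for that code path. The denormals-to-zero mode for inputs can be tested in the same manner with _MM_SET_DENORMALS_ZERO_MODE from <pmmintrin.h>, on processors that support it.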
