Intel IA-32 Computer Accessories User Manual


 
General Optimization Guidelines 2
FPU control word (FCW), such as when performing conversions to
integers. On Pentium M, Intel Core Solo, and Intel Core Duo processors,
FLDCW is improved over previous generations.
Specifically, the optimization for FLDCW allows programmers to
alternate between two constant values efficiently. For the FLDCW
optimization to be effective, the two constant FCW values are only
allowed to differ on the following 5 bits in the FCW:
FCW[8-9] precision control
FCW[10-11] rounding control
FCW[12] infinity control
If programmers need to modify other bits (for example: mask bits) in the
FCW, the FLDCW instruction is still an expensive operation.
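The bit restriction above can be checked mechanically. The following is a
minimal sketch, not part of the manual: the macro names and the helper
fcw_pair_is_fast are hypothetical, and the code only compares two candidate
FCW constants; it does not execute FLDCW.

```c
#include <stdbool.h>
#include <stdint.h>

/* The five FCW bits that the two constants are allowed to differ in. */
#define FCW_PC_MASK   0x0300u  /* FCW[8-9]   precision control */
#define FCW_RC_MASK   0x0C00u  /* FCW[10-11] rounding control  */
#define FCW_IC_MASK   0x1000u  /* FCW[12]    infinity control  */
#define FCW_FAST_MASK (FCW_PC_MASK | FCW_RC_MASK | FCW_IC_MASK)

/* Hypothetical helper: returns true when a and b differ only in the
 * precision, rounding, or infinity control bits, so that alternating
 * between them can qualify for the FLDCW optimization. */
bool fcw_pair_is_fast(uint16_t a, uint16_t b)
{
    unsigned diff = (unsigned)(a ^ b);      /* bits that change */
    return (diff & ~FCW_FAST_MASK) == 0;    /* none outside the 5 bits */
}
```

For example, 0x037F (the FCW value after FINIT) and 0x0F7F (the same value
with rounding control set to truncate) differ only in FCW[10-11], so the
pair qualifies. A pair that differs in an exception mask bit does not.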
In situations where an application cycles between three (or more)
constant values, the FLDCW optimization does not apply, and the
performance degradation occurs for each FLDCW instruction.
One solution to this problem is to choose two constant FCW values,
take advantage of the optimization of the FLDCW instruction to alternate
between only these two constant FCW values, and devise some means
to accomplish the task that requires the third FCW value without actually
changing the FCW to a third constant value. An alternative solution is to
structure the code so that, for periods of time, the application alternates
between only two constant FCW values. When the application later
alternates between a different pair of FCW values, the performance
degradation occurs only during the transition.
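As an illustration of the first solution, suppose the two constant FCW
values select round-to-nearest and truncation, and the code occasionally
needs a round-down (floor) conversion. The sketch below is one possible
approach, not a technique the manual prescribes: the helper name
floor_via_truncation is hypothetical, and it assumes the input is finite
and fits in a long. It derives the round-down result from a truncating
conversion, so no third rounding mode is needed.

```c
/* Hypothetical helper: round toward minus infinity using only a
 * truncating conversion. Assumes x is finite and within the range
 * of long. */
long floor_via_truncation(double x)
{
    long t = (long)x;   /* a C cast truncates toward zero */

    /* Truncation rounds negative non-integers up; step down by one
     * in exactly that case. */
    return ((double)t > x) ? t - 1 : t;
}
```

Whether the compiler emits an FLDCW pair for the cast depends on the target
and code generation options; the point of the sketch is only that the third
rounding behavior is obtained arithmetically.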
It is expected that SIMD applications are unlikely to alternate between
FTZ and DAZ mode values. Consequently, the SIMD control word does not
have the short latencies that the floating-point control register does.
A read of the MXCSR register has a fairly long latency, and a write to
the register is a serializing instruction.
There is no separate control word for single and double precision; both
use the same modes. Notably, this applies to both FTZ and DAZ modes.
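Because MXCSR accesses are slow, a reasonable pattern is to set FTZ and DAZ
once, for example at thread start, and leave them unchanged. The following
is a minimal sketch under stated assumptions: the names mxcsr_with_ftz_daz
and enable_ftz_daz_once are hypothetical, the _mm_getcsr and _mm_setcsr
intrinsics are used only when the compiler defines __SSE__, and the
processor is assumed to support DAZ (setting an unsupported MXCSR bit
faults).

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

#define MXCSR_DAZ 0x0040u   /* bit 6:  denormals-are-zero */
#define MXCSR_FTZ 0x8000u   /* bit 15: flush-to-zero      */

/* Hypothetical helper: compute the MXCSR value with both modes on,
 * leaving all other bits unchanged. */
uint32_t mxcsr_with_ftz_daz(uint32_t mxcsr)
{
    return mxcsr | MXCSR_FTZ | MXCSR_DAZ;
}

/* Hypothetical helper: one read and one write of MXCSR, intended to
 * run once rather than around each SIMD code region. Assumes the
 * processor supports DAZ. */
void enable_ftz_daz_once(void)
{
#if defined(__SSE__)
    _mm_setcsr(mxcsr_with_ftz_daz(_mm_getcsr()));
#endif
}
```

Since single and double precision share the same modes, one call covers
both kinds of SIMD floating-point code.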