IA-32 Intel® Architecture Optimization
Assembly/Compiler Coding Rule 31. (H impact, M generality) Minimize
changes to bits 8-12 of the floating-point control word. Switching among
more than two values (each value being a combination of the precision,
rounding, and infinity control bits and the rest of the bits in the FCW)
leads to delays on the order of the pipeline depth.
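The following C sketch makes the fields in Rule 31 concrete. The macro, enum, and function names are hypothetical. The bit positions follow the x87 FCW layout: precision control in bits 8-9, rounding control in bits 10-11, and infinity control in bit 12. The value 0x037F is the control word that FINIT loads.

```c
#include <stdint.h>

/* x87 FCW fields covered by Rule 31 (bits 8-12). Names are illustrative. */
#define FCW_PC_MASK 0x0300u /* bits 8-9:   precision control */
#define FCW_RC_MASK 0x0C00u /* bits 10-11: rounding control  */
#define FCW_X_MASK  0x1000u /* bit 12:     infinity control  */

/* Encodings of the rounding-control field (bits 10-11). */
enum fcw_rounding {
    FCW_RC_NEAREST = 0x0, /* round to nearest (even) */
    FCW_RC_DOWN    = 0x1, /* round toward -infinity  */
    FCW_RC_UP      = 0x2, /* round toward +infinity  */
    FCW_RC_CHOP    = 0x3  /* round toward zero (truncate) */
};

/* Return a copy of fcw with only the rounding-control field replaced;
   the precision, infinity, and exception-mask bits are left unchanged. */
static uint16_t fcw_with_rounding(uint16_t fcw, enum fcw_rounding rc)
{
    return (uint16_t)((fcw & ~FCW_RC_MASK) | ((unsigned)rc << 10));
}
```

With this helper, fcw_with_rounding(0x037F, FCW_RC_CHOP) yields 0x0F7F. A conversion routine that only alternates between these two control words loads just two distinct FCW values, which stays within the limit the rule describes.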
Rounding Mode
Many libraries provide float-to-integer routines that convert
floating-point values to integers. Many of these libraries conform to
the ANSI C standard, which specifies that such conversions truncate
(round toward zero). With the Pentium 4 processor, the cvttsd2si and
cvttss2si instructions convert operands with truncation, with no need
to change the rounding mode. The savings from using these instructions
instead of the methods described below are large enough to justify
using Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2
(SSE2) wherever possible when truncation is involved.
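The following minimal C sketch shows the truncating conversions. It assumes an x86 compiler with SSE2 enabled (the default on x86-64) and the standard intrinsics headers. The wrapper names are hypothetical. The intrinsics _mm_cvttss_si32 and _mm_cvttsd_si32 map to cvttss2si and cvttsd2si, which always truncate regardless of the MXCSR rounding field, so no control word is touched.

```c
#include <xmmintrin.h> /* SSE:  _mm_set_ss, _mm_cvttss_si32 */
#include <emmintrin.h> /* SSE2: _mm_set_sd, _mm_cvttsd_si32 */

/* float -> int32 with truncation via CVTTSS2SI (SSE).
   Out-of-range inputs produce the integer indefinite value 0x80000000. */
static int trunc_f32_to_i32(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* double -> int32 with truncation via CVTTSD2SI (SSE2). */
static int trunc_f64_to_i32(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

When compilers generate SSE2 code, they typically emit cvttsd2si for a plain (int)x cast as well. The intrinsics mainly make that instruction choice explicit.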
For x87 floating point, the fist instruction uses the rounding mode
specified in the floating-point control word (FCW). Because the rounding
mode is normally round to nearest, many compiler writers change the
processor's rounding mode in order to conform to the C and FORTRAN
standards. This requires changing the control word with the fldcw
instruction. To change the rounding, precision, or infinity bits, first
use the fstcw instruction to store the current floating-point control
word, then use the fldcw instruction to load a control word whose
rounding mode is set to truncation.
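The fstcw/fldcw sequence above could be written in C with GCC/Clang inline assembly (AT&T syntax), as in the following sketch. It assumes an x86 target, and the function name is hypothetical. It uses fistpl, the popping form of fist that stores a 32-bit integer, so the x87 register stack is left as it was found.

```c
#include <stdint.h>

/* Convert x to int32 with truncation using x87 FIST, by temporarily
   switching the FCW rounding-control field (bits 10-11) to 11b (chop).
   Assumes GCC/Clang inline asm on x86. */
static int32_t x87_trunc_to_int32(double x)
{
    uint16_t saved_cw; /* 16-bit slot for the original FCW */
    uint16_t trunc_cw; /* 16-bit slot for the truncating FCW */
    int32_t result;

    __asm__ volatile ("fstcw %0" : "=m"(saved_cw)); /* store current FCW */
    trunc_cw = (uint16_t)(saved_cw | 0x0C00u);       /* RC = 11b: truncate */

    __asm__ volatile (
        "fldcw  %2\n\t" /* load truncating control word      */
        "fldl   %1\n\t" /* push x onto the x87 stack         */
        "fistpl %0\n\t" /* store as int32 using RC, then pop */
        "fldcw  %3"     /* restore the original control word */
        : "=m"(result)
        : "m"(x), "m"(trunc_cw), "m"(saved_cw));

    return result;
}
```

Each call loads only two distinct FCW values (the original and its truncating variant). It still executes two fldcw instructions per conversion, which is the overhead the SSE/SSE2 conversions avoid.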
In a typical code sequence that changes the rounding mode in the FCW,
an fstcw instruction is followed by a load of the stored value. That
load from memory should use a 16-bit operand to avoid a store-forwarding
problem. If the load of the previously stored FCW uses either an 8-bit
or a 32-bit operand, it causes a store-forwarding problem because the
data size of the load does not match that of the store.
Make sure that the write to and the read from the stored FCW are both
16-bit operations to avoid store-forwarding problems.
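The following sketch shows a size-matched reload, again assuming GCC/Clang inline assembly on x86. The function name is hypothetical. fnstcw writes 16 bits, and movzwl reads exactly those 16 bits back, zero-extending them into a 32-bit register. The load therefore matches the store in size. Replacing movzwl with a 32-bit movl of the same slot would create the size mismatch described above.

```c
#include <stdint.h>

/* Read the current FCW with a 16-bit store followed by a 16-bit load,
   and return it with RC forced to 11b (truncate). The returned value is
   not loaded into the FPU; a caller would pass it to fldcw. */
static uint16_t fcw_truncating_variant(void)
{
    uint16_t stored;  /* 16-bit memory slot written by fnstcw */
    uint32_t widened; /* register copy, zero-extended from 16 bits */

    __asm__ volatile (
        "fnstcw %1\n\t" /* 16-bit store of the FCW                  */
        "movzwl %1, %0" /* 16-bit load of the same bytes: sizes match */
        : "=r"(widened), "=m"(stored));

    return (uint16_t)(widened | 0x0C00u); /* set RC = 11b */
}
```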