Intel IA-32 Computer Accessories User Manual


 
General Optimization Guidelines 2
This in turn allows instructions to be reordered so that more of them
are available to execute in parallel. Out-of-order execution removes
the need to use the fxch instruction to move instructions over very
short distances.
x87 vs. Scalar SIMD Floating-point Trade-offs
There are a number of differences between x87 floating-point code and
scalar floating-point code (using SSE and SSE2). The following
differences drive decisions about which registers and instructions to use:
• When an input operand for a SIMD floating-point instruction is a
  denormal (a nonzero value smaller in magnitude than the smallest
  normalized value of the data type), a denormal exception occurs.
  This causes a significant performance penalty. SIMD floating-point
  operations have a flush-to-zero mode in which results that would
  underflow are set to zero instead. Subsequent computations therefore
  do not incur the penalty of handling denormal input operands. For
  example, in 3D applications with low lighting levels, flush-to-zero
  mode can improve performance by as much as 50% for applications with
  large numbers of underflows.
• Scalar floating-point SIMD instructions have lower latencies than
  their x87 counterparts. This generally does not matter much as long
  as resource utilization is low.
• Only x87 supports transcendental instructions.
• x87 supports 80-bit double-extended precision floating point.
  Streaming SIMD Extensions (SSE) support a maximum of 32-bit
  precision, and Streaming SIMD Extensions 2 (SSE2) supports a maximum
  of 64-bit precision.
• On the Pentium 4 processor, floating-point adds are pipelined for x87
  but not for scalar floating-point code. Floating-point multiplies are
  not pipelined in either case. For applications with a large number of
  floating-point adds relative to multiplies, x87 may be a better
  choice.