
IA-32 Intel® Architecture Optimization


Floating-Point Stalls

Floating-point instructions have a latency of at least two cycles. Because the Pentium II and subsequent processors execute instructions out of order, however, stalls do not necessarily occur on a per-instruction or per-µop basis. When an instruction has a very long latency, such as fdiv, scheduling other work around it can still improve the throughput of the overall application.
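A minimal C sketch of the idea, assuming a hypothetical `scaled_sum` routine: the divide is started as early as possible, and work that does not depend on the quotient is placed between the divide and the first use of its result, so the divide's latency can overlap other execution. An optimizing compiler and the out-of-order core may reorder these operations anyway; the source ordering only expresses the intent.

```c
#include <stddef.h>

/* Hypothetical example: writes out[i] = num[i] / den[i] and returns the
 * sum of num[]. The running sum does not depend on any quotient. */
double scaled_sum(const double *num, const double *den, double *out, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double q = num[i] / den[i]; /* start the long-latency divide first */
        sum += num[i];              /* independent work that can overlap it */
        out[i] = q;                 /* use the quotient only at the end */
    }
    return sum;
}
```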

x87 Floating-point Operations with Integer Operands

For the Pentium 4 processor, it is more efficient to split floating-point operations that take 16-bit integer operands (fiadd, fisub, fimul, and fidiv) into two instructions: an fild followed by the corresponding floating-point operation. For operations with 32-bit integer operands, however, using fiadd, fisub, fimul, and fidiv directly is as efficient as using separate instructions.
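At the C source level, one way to apply this advice is to widen a 16-bit integer to 32 bits before it meets a floating-point operand. This is a sketch under an assumption about typical code generation, not a guarantee: a compiler targeting x87 may then work with a 32-bit integer operand (for example a 32-bit fild) instead of a 16-bit one. The function name `add_i16` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: adds a 16-bit integer to a double. */
double add_i16(double x, int16_t v)
{
    int32_t wide = v;        /* sign-extend to 32 bits first */
    return x + (double)wide; /* convert from the 32-bit value, then add */
}
```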

Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to use 32-bit operands rather than 16-bit operands for fild. However, do not do so at the expense of introducing a store-forwarding problem by writing the two halves of the 32-bit memory operand separately.
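The store-forwarding caveat in Rule 36 can be illustrated in C. Assuming a value arrives as two hypothetical 16-bit halves `lo` and `hi`, the sketch assembles the 32-bit integer in a register rather than storing each half to memory and then reading the full 32-bit operand back, which is the pattern the rule warns against. Whether the compiler actually keeps the value in a register is not guaranteed; the helper name `halves_to_double` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds a signed 32-bit integer from two 16-bit
 * halves and converts it to double.
 *
 * Pattern to avoid: two separate 16-bit stores into a 4-byte buffer,
 * followed by one 32-bit load of that buffer (for example by fild).
 * Because the load is wider than either store, it typically cannot be
 * satisfied by store forwarding. */
double halves_to_double(uint16_t lo, uint16_t hi)
{
    uint32_t bits = ((uint32_t)hi << 16) | lo; /* combine in a register */
    int32_t value;
    memcpy(&value, &bits, sizeof value);       /* reinterpret as signed */
    return (double)value;                      /* one 32-bit integer conversion */
}
```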

x87 Floating-point Comparison Instructions

On the Pentium II and subsequent processors, use the fcomi and fcmov instructions when performing floating-point comparisons. Using fcom, fcomp, or fcompp typically requires additional instructions such as fstsw, because these instructions set the FPU condition codes rather than EFLAGS. That alternative causes more µops to be decoded and should be avoided.
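The advantage of fcomi is that a conditional branch or fcmov can consume its result directly, whereas the result of fcom must first be copied out of the FPU status word (typically with fstsw ax followed by sahf or a test of AX). The C model below is an illustration only, with hypothetical names: it reproduces the ZF/PF/CF settings that fcomi produces (greater: 0/0/0, less: CF = 1, equal: ZF = 1, unordered: all three set) and uses them the way fcmovnbe ("not below or equal", that is CF = 0 and ZF = 0) would.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the EFLAGS bits that fcomi sets when comparing
 * ST(0) with a source operand. */
typedef struct {
    bool zf, pf, cf;
} FcomiFlags;

FcomiFlags model_fcomi(double st0, double src)
{
    FcomiFlags f = {false, false, false};
    if (isnan(st0) || isnan(src)) { /* unordered: ZF = PF = CF = 1 */
        f.zf = f.pf = f.cf = true;
    } else if (st0 < src) {         /* less than: CF = 1 */
        f.cf = true;
    } else if (st0 == src) {        /* equal: ZF = 1 */
        f.zf = true;
    }                               /* greater than: all three stay 0 */
    return f;
}

/* Condition tested by fcmovnbe (and ja): CF = 0 and ZF = 0. */
bool above(FcomiFlags f)
{
    return !f.cf && !f.zf;
}

/* Models "result = b; if (a > b) result = a" as a flag-driven conditional
 * move. An unordered comparison (NaN) leaves result = b. */
double model_max(double a, double b)
{
    double result = b;
    if (above(model_fcomi(a, b))) {
        result = a;
    }
    return result;
}
```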

Transcendental Functions

If an application needs to emulate math functions in software for performance or other reasons (see the “Guidelines for Optimizing Floating-point Code” section), it may be worthwhile to inline the math library calls, because the call and the prologue/epilogue involved with such calls can significantly affect the latency of operations.
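As a sketch of what an inlined software-emulated function can look like, assume a hypothetical `exp_small` routine that approximates e^x with a degree-7 Taylor polynomial and is only intended for small |x| (roughly |x| ≤ 0.5). Declaring it `static inline` in a header lets the compiler expand it at each call site, avoiding the call and its prologue/epilogue; `inline` is only a hint, and the compiler decides whether to honor it. This is not a production-quality exp implementation.

```c
/* Hypothetical software approximation of exp(x) for small |x|.
 * Degree-7 Taylor polynomial evaluated with Horner's rule; accuracy
 * degrades quickly as |x| grows, so this is only a sketch. */
static inline double exp_small(double x)
{
    /* Coefficients 1/k! for k = 7 down to 0. */
    static const double c[8] = {
        1.0 / 5040.0, 1.0 / 720.0, 1.0 / 120.0, 1.0 / 24.0,
        1.0 / 6.0,    1.0 / 2.0,   1.0,         1.0
    };
    double r = c[0];
    for (int k = 1; k < 8; ++k) {
        r = r * x + c[k]; /* one multiply-add per coefficient */
    }
    return r;
}
```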
