Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Chapter 10 x87 Floating-Point Optimizations
AMD Athlon™ 64 and AMD Opteron™ processors support multiple methods of performing
floating-point operations. They support the older x87 instructions in addition to the more
recent SIMD instructions (SSE, SSE2, and 3DNow!™ technologies). Many of the suggestions in this
chapter also apply to earlier 32-bit AMD Athlon processors, with the exception of the SSE2
optimizations and the expanded register usage.
AMD Athlon 64 and AMD Opteron processors are 64-bit processors that are fully backward
compatible with 32-bit code. In general, 64-bit operating systems support x87 and 3DNow!
instructions in 32-bit threads; however, they may not support these instructions in 64-bit
threads. To simplify a later migration from 32-bit to 64-bit code, consider avoiding x87 and
3DNow! instructions altogether and using only SSE and SSE2 instructions when writing new
32-bit code.
This chapter details methods for optimizing floating-point code that targets the pipelined x87
floating-point unit.
This chapter covers the following topics:
Topic Page
Using Multiplication Rather Than Division 238
Achieving Two Floating-Point Operations per Clock Cycle 239
Floating-Point Compare Instructions 244
Using the FXCH Instruction Rather Than FST/FLD Pairs 245
Floating-Point Subexpression Elimination 246
Accumulating Precision-Sensitive Quantities in x87 Registers 247
Avoiding Extended-Precision Data 248