258 Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors Appendix A
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
fld [somefloat] ;Load a floating point
;value from memory into ST(0)
fadd st(0),st(1) ;The data from the load will be
;forwarded directly to this instruction,
;no need to read the register file
A.14 Floating-Point Execution Unit
The floating-point execution unit (FPU) is implemented as a coprocessor having its own out-of-order
control in addition to the data path. The FPU handles all register operations for x87 instructions, all
3DNow! technology operations, all MMX operations, and all SSE and SSE2 operations. The FPU
consists of a stack renaming unit, a register renaming unit, a scheduler, a register file, and three
parallel execution units. Figure 8 shows a block diagram of the dataflow through the FPU.
Figure 8. Floating-Point Unit
As shown in Figure 8, the floating-point logic uses three separate execution positions or pipes. The
first of the three pipes is generally known as the adder pipe (FADD), and it contains an MMX
ALU/shifter and floating-point add execution units. The second pipe is known as the multiplier pipe
(FMUL). It contains the floating-point multiplier/divider/square root unit and also an MMX ALU.
The third pipe is known as the floating-point load/store pipe (FSTORE). It handles floating-point
stores and many of the micro-op primitives used in VectorPath sequences.
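Because additions and multiplications are handled by different pipes (FADD and FMUL), independent add and multiply operations can, in principle, be in flight at the same time. The C sketch below illustrates one way to expose that independence. It is not code from this guide: the function name dot2 and the two-accumulator layout are assumptions made for the example, and whether the generated code keeps the two dependency chains separate depends on the compiler and its options.

```c
#include <stddef.h>

/* Hypothetical helper (not from the guide): dot product with two
   independent accumulators. Each loop iteration contains two
   multiply/add pairs that do not depend on each other, so the
   scheduler is free to overlap work in the FMUL and FADD pipes. */
static float dot2(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f;  /* accumulates even-indexed products */
    float sum1 = 0.0f;  /* accumulates odd-indexed products  */
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        sum0 += a[i]     * b[i];      /* multiply, then add            */
        sum1 += a[i + 1] * b[i + 1];  /* independent of the line above */
    }
    if (i < n)                        /* odd-length tail element */
        sum0 += a[i] * b[i];

    return sum0 + sum1;               /* combine the two chains */
}
```

With a single accumulator, every add must wait for the previous add to complete. Splitting the sum into two chains shortens that dependency, at the cost of a result that can differ in the last bits from strictly sequential summation.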
A.15 Load-Store Unit
The load-store unit (LSU) is shown in Figure 9. It manages data load and store accesses to the L1 data
[Figure 8 block labels: Instruction Control Unit; Stack Map; Register Rename; Scheduler (36-entry); FPU Register File (120-entry).
FADD pipe: SSE, SSE2, SSE3 ALU; 3DNow!™ technology ALU; MMX™ ALU; x87 adder.
FMUL pipe: SSE, SSE2, SSE3 ALU and multiplier; 3DNow!™ technology ALU and multiplier; MMX™ ALU and multiplier; x87 multiplier.
FSTORE pipe.]