Appendix C: Instruction Latencies (page 310)
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 15. x87 Floating-Point Instructions (Continued)

                    Encoding
Syntax              First  Second  ModRM       Decode      FPU               Latency  Note
                    byte   byte    byte        type        pipe(s)
------------------  -----  ------  ----------  ----------  ----------------  -------  ----
FISTP [mem64int]    DFh            mm-111-xxx  DirectPath  FSTORE            4
FISTTP [mem16int]   DFh            mm-010-xxx  DirectPath  FSTORE            4
FISTTP [mem32int]   DBh            mm-010-xxx  DirectPath  FSTORE            4
FISTTP [mem64int]   DDh            mm-010-xxx  DirectPath  FSTORE            4
FISUB [mem32int]    DAh            mm-100-xxx  Double      -                 11
FISUB [mem16int]    DEh            mm-100-xxx  Double      -                 11
FISUBR [mem32int]   DAh            mm-101-xxx  Double      -                 11
FISUBR [mem16int]   DEh            mm-101-xxx  Double      -                 11
FLD ST(i)           D9h            11-000-xxx  DirectPath  FADD/FMUL         2        1
FLD [mem32real]     D9h            mm-000-xxx  DirectPath  FADD/FMUL/FSTORE  4
FLD [mem64real]     DDh            mm-000-xxx  DirectPath  FADD/FMUL/FSTORE  4
FLD [mem80real]     DBh            mm-101-xxx  VectorPath  -                 13
FLD1                D9h            11-101-000  DirectPath  FSTORE            4
FLDCW [mem16]       D9h            mm-101-xxx  VectorPath  -                 11
FLDENV [mem14byte]  D9h            mm-100-xxx  VectorPath  -                 129
FLDENV [mem28byte]  D9h            mm-100-xxx  VectorPath  -                 129
FLDL2E              D9h            11-101-010  DirectPath  FSTORE            4
FLDL2T              D9h            11-101-001  DirectPath  FSTORE            4
FLDLG2              D9h            11-101-100  DirectPath  FSTORE            4
FLDLN2              D9h            11-101-101  DirectPath  FSTORE            4
FLDPI               D9h            11-101-011  DirectPath  FSTORE            4
FLDZ                D9h            11-101-110  DirectPath  FSTORE            4
FMUL ST, ST(i)      D8h            11-001-xxx  DirectPath  FMUL              4        1
FMUL ST(i), ST      DCh            11-001-xxx  DirectPath  FMUL              4        1

Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP
with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. “e” represents the difference between the exponents
of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then
n = (s+1)/2 where (0 <= n <= 32).
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double
precision, and extended precision, respectively.
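
Note 1 and the ModRM patterns in Table 15 can be turned into concrete bytes mechanically. The following is a minimal sketch in Python and is not part of the original guide. The helper name modrm_from_pattern and its parameters are hypothetical, chosen only for illustration. It assumes the standard x86 ModRM layout (2-bit mod field, 3-bit opcode-extension field, 3-bit r/m field), that "mm" in a pattern stands for the mod field of a memory form (0, 1, or 2), and that "xxx" stands for the r/m field, which per Note 1 is the stack index i in register forms.

```python
def modrm_from_pattern(pattern: str, mod: int = 0, rm: int = 0) -> int:
    """Return the ModRM byte for a table pattern such as '11-001-xxx'.

    Hypothetical helper, for illustration only.
    'mm'  is replaced by the 2-bit mod field (memory forms: 0, 1, or 2).
    'xxx' is replaced by the 3-bit r/m field (stack index i in register forms).
    """
    mod_field, reg_field, rm_field = pattern.split("-")
    if not 0 <= rm <= 7:
        raise ValueError("r/m (or stack index) must be in 0..7")
    if mod_field == "mm":
        if not 0 <= mod <= 2:
            raise ValueError("memory forms use mod 0, 1, or 2")
        mod_bits = mod
    else:
        mod_bits = int(mod_field, 2)   # fixed bits, e.g. '11' for register forms
    reg_bits = int(reg_field, 2)       # opcode-extension bits from the table
    rm_bits = rm if rm_field == "xxx" else int(rm_field, 2)
    return (mod_bits << 6) | (reg_bits << 3) | rm_bits


# FMUL ST, ST(3): first byte D8h, ModRM pattern 11-001-xxx with i = 3
print(f"D8 {modrm_from_pattern('11-001-xxx', rm=3):02X}")  # D8 CB
# FLD ST(1): first byte D9h, ModRM pattern 11-000-xxx with i = 1
print(f"D9 {modrm_from_pattern('11-000-xxx', rm=1):02X}")  # D9 C1
# FLDPI: fixed ModRM pattern 11-101-011, no variable bits
print(f"D9 {modrm_from_pattern('11-101-011'):02X}")        # D9 EB
```

The sketch covers only the opcode and ModRM bytes. Memory forms may also need SIB and displacement bytes, which are outside the scope of this table.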