AMD 250 Computer Hardware User Manual

Appendix C Instruction Latencies 311

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 15. x87 Floating-Point Instructions (Continued)

                     Encoding
Syntax               First byte  Second byte  ModRM byte  Decode type  FPU pipe(s)       Latency  Note
------------------------------------------------------------------------------------------------------
FMUL [mem32real]     D8h                      mm-001-xxx  DirectPath   FMUL              6
FMUL [mem64real]     DCh                      mm-001-xxx  DirectPath   FMUL              6
FMULP ST(i), ST      DEh                      11-001-xxx  DirectPath   FMUL              4        1
FNCLEX               DBh         E2h                      VectorPath                     16
FNINIT               DBh         E3h                      VectorPath                     89
FNOP                 D9h                      11-010-000  DirectPath   FADD/FMUL/FSTORE  2        2
FPATAN               D9h                      11-110-011  VectorPath   -                 136
FPREM                D9h                      11-111-000  DirectPath   FMUL              9+e+n    4
FPREM1               D9h                      11-110-101  DirectPath   FMUL              9+e+n    4
FPTAN                D9h                      11-110-010  VectorPath   -                 107
FRNDINT              D9h                      11-111-100  VectorPath   -                 10
FRSTOR [mem94byte]   DDh                      mm-100-xxx  VectorPath   -                 138
FRSTOR [mem108byte]  DDh                      mm-100-xxx  VectorPath   -                 138
FSAVE [mem94byte]    DDh                      mm-110-xxx  VectorPath   -                 159
FSAVE [mem108byte]   DDh                      mm-110-xxx  VectorPath   -                 159
FSCALE               D9h                      11-111-101  VectorPath   -                 9
FSIN                 D9h                      11-111-110  VectorPath   -                 93
FSINCOS              D9h                      11-111-011  VectorPath   -                 104
FSQRT                D9h                      11-111-010  DirectPath   FMUL              35
FST [mem32real]      D9h                      mm-010-xxx  DirectPath   FSTORE            2
FST [mem64real]      DDh                      mm-010-xxx  DirectPath   FSTORE            2
FST ST(i)            DDh                      11-010-xxx  DirectPath   FADD/FMUL         2
FSTCW [mem16]        D9h                      mm-111-xxx  VectorPath   -                 4
FSTENV [mem14byte]   D9h                      mm-110-xxx  VectorPath   -                 89

Notes:

1. The last three bits of the ModRM byte select the stack entry ST(i).

2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.

3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).

4. There is additional latency associated with this instruction. “e” represents the difference between the exponents of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then n = (s+1)/2 where (0 <= n <= 32).

5. The latency provided for this operation is the best-case latency.

6. The three latency numbers represent the latency values for precision control settings of single precision, double precision, and extended precision, respectively.
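
Note 1 can be read together with the ModRM byte column: a register-form pattern such as 11-001-xxx fixes the upper five bits and leaves the last three bits for the stack index i. The Python sketch below illustrates that bit packing. The helper names modrm_for_st and st_index are hypothetical and do not come from the guide.

```python
def modrm_for_st(pattern: str, i: int) -> int:
    """Fill the xxx field of a register-form ModRM pattern with stack index i.

    pattern is written the way the table writes it, e.g. "11-001-xxx".
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= i <= 7:
        raise ValueError("ST(i) index must be in 0..7")
    mod, reg, rm = pattern.split("-")
    if mod != "11" or rm != "xxx":
        raise ValueError("expected a register-form pattern like 11-001-xxx")
    # mod occupies bits 7..6, the fixed middle field bits 5..3, i bits 2..0.
    return (int(mod, 2) << 6) | (int(reg, 2) << 3) | i


def st_index(modrm: int) -> int:
    """Return the stack index held in the last three bits of a ModRM byte."""
    return modrm & 0b111


# FMULP ST(3), ST: first byte DEh, ModRM byte 11-001-011
fmulp_st3 = bytes([0xDE, modrm_for_st("11-001-xxx", 3)])
print(fmulp_st3.hex())  # decb
```

The same helper applied to the FST ST(i) pattern 11-010-xxx with i = 0 gives D0h.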

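The 9+e+n latency given for FPREM and FPREM1 can be worked through numerically using the definitions in note 4. The following Python sketch is an illustration only. It assumes that (s+1)/2 is an integer division rounding down and that e is supplied as a non-negative value; the note states neither explicitly. The function name fprem_latency is hypothetical.

```python
def fprem_latency(e: int, s: int) -> int:
    """Estimate FPREM/FPREM1 latency in cycles as 9 + e + n (note 4).

    e: difference between the exponents of the divisor and the dividend
       (assumed non-negative here).
    s: number of normalization shifts performed on the result.
    """
    if e < 0 or s < 0:
        raise ValueError("e and s are assumed to be non-negative")
    n = (s + 1) // 2  # assumption: the division in note 4 truncates
    if n > 32:
        raise ValueError("note 4 bounds n to the range 0..32")
    return 9 + e + n


# With e = 5 and s = 7: n = (7 + 1) // 2 = 4, so 9 + 5 + 4 = 18 cycles.
print(fprem_latency(5, 7))  # 18
```

With e = 0 and s = 0 the function returns the base value of 9 cycles.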