Appendix C Instruction Latencies 299
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 13. Integer Instructions (Continued)

Syntax                         First  Second  ModRM       Decode      Latency  Note
                               byte   byte    byte        type
SLDT mreg16/32/64              0Fh    00h     11-000-xxx  VectorPath  5
SLDT mem16/32/64               0Fh    00h     mm-000-xxx  VectorPath  5
SMSW mreg16/32/64              0Fh    01h     11-100-xxx  VectorPath  4
SMSW mem16                     0Fh    01h     mm-100-xxx  VectorPath  3
STC                            F9h                        DirectPath  1
STD                            FDh                        Double      2
STI                            FBh                        VectorPath  4
STOSB/STOS mem8                AAh                        VectorPath  4        6
STOSW/STOS mem16               ABh                        VectorPath  4        6
STOSD/STOS mem32               ABh                        VectorPath  4        6
STOSQ/STOS mem64               ABh                        VectorPath  4        6
STR mreg16/32/64               0Fh    00h     11-001-xxx  VectorPath  5
STR mem16                      0Fh    00h     mm-001-xxx  VectorPath  5
SUB mreg8, reg8                28h            11-xxx-xxx  DirectPath  1
SUB mem8, reg8                 28h            mm-xxx-xxx  DirectPath  4
SUB mreg16/32/64, reg16/32/64  29h            11-xxx-xxx  DirectPath  1
SUB mem16/32/64, reg16/32/64   29h            mm-xxx-xxx  DirectPath  4
SUB reg8, mreg8                2Ah            11-xxx-xxx  DirectPath  1
SUB reg8, mem8                 2Ah            mm-xxx-xxx  DirectPath  4
SUB reg16/32/64, mreg16/32/64  2Bh            11-xxx-xxx  DirectPath  1
SUB reg16/32/64, mem16/32/64   2Bh            mm-xxx-xxx  DirectPath  4
SUB AL, imm8                   2Ch                        DirectPath  1
SUB AX, imm16                  2Dh                        DirectPath  1
SUB EAX, imm32                 2Dh                        DirectPath  1

Notes:
1. Static timing assumes a predicted branch.
2. Store operation also updates ESP—the new register value is available one clock earlier than the specified
latency.
3. The clock count applies regardless of the number of shifts or rotates specified by CL or imm8.
4. LEA instructions have a latency of 1 when there are two source operands (as in the case of the base + index
form LEA EAX, [EDX+EDI]). Forms with a scale or more than two source operands will have a latency of 2 (LEA
EAX, [EBX+EBX*8]).
5. These instructions have an effective latency as shown. They map to internal NOPs that can be issued at a rate of
three per cycle but do not occupy execution resources.
6. The latency of repeated string instructions can be found in “Latency of Repeated String Instructions” on
page 167.
7. The first latency value is for 32-bit mode. The second is for 64-bit mode.
8. This opcode is used as a REX prefix in 64-bit mode.
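
The LEA rule in note 4 can be sketched as a small helper. This is an illustrative sketch only, not part of the guide: the function name lea_latency and its parameters are hypothetical, and it assumes that a nonzero displacement counts as a source operand, which the note does not state. Forms with a single source operand are not covered by the note; the sketch assumes a latency of 1 for them.

```python
def lea_latency(base=None, index=None, scale=1, disp=0):
    """Estimate LEA latency in clocks, following note 4 of Table 13.

    Assumptions (not stated in the guide): a nonzero displacement counts
    as one source operand, and single-operand forms take 1 clock.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    if index is None and scale != 1:
        raise ValueError("a scale factor requires an index register")

    # Count the address components that act as source operands.
    sources = sum(reg is not None for reg in (base, index))
    if disp != 0:
        sources += 1
    if sources == 0:
        raise ValueError("address needs at least one component")

    # Note 4: a scaled form, or more than two source operands, takes 2 clocks.
    if scale > 1 or sources > 2:
        return 2
    # Note 4: two source operands (base + index) take 1 clock.
    return 1


if __name__ == "__main__":
    print(lea_latency(base="EDX", index="EDI"))           # LEA EAX, [EDX+EDI]
    print(lea_latency(base="EBX", index="EBX", scale=8))  # LEA EAX, [EBX+EBX*8]
```

The two calls in the main block correspond to the two examples given in note 4, so they should report 1 and 2 clocks respectively.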