304 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 14. MMX™ Technology Instructions (Continued)

                        Encoding
                        Prefix  First  ModRM       Decode      FPU
Syntax                  byte    byte   byte        type        pipe(s)    Latency  Note
PADDSW mmreg1, mmreg2   0Fh     EDh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PADDSW mmreg, mem64     0Fh     EDh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PADDUSB mmreg1, mmreg2  0Fh     DCh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PADDUSB mmreg, mem64    0Fh     DCh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PADDUSW mmreg1, mmreg2  0Fh     DDh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PADDUSW mmreg, mem64    0Fh     DDh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PADDW mmreg1, mmreg2    0Fh     FDh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PADDW mmreg, mem64      0Fh     FDh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PAND mmreg1, mmreg2     0Fh     DBh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PAND mmreg, mem64       0Fh     DBh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PANDN mmreg1, mmreg2    0Fh     DFh    11-xxx-xxx  DirectPath  FADD/FMUL  2
PANDN mmreg, mem64      0Fh     DFh    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPEQB mmreg1, mmreg2  0Fh     74h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPEQB mmreg, mem64    0Fh     74h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPEQD mmreg1, mmreg2  0Fh     76h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPEQD mmreg, mem64    0Fh     76h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPEQW mmreg1, mmreg2  0Fh     75h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPEQW mmreg, mem64    0Fh     75h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPGTB mmreg1, mmreg2  0Fh     64h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPGTB mmreg, mem64    0Fh     64h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPGTD mmreg1, mmreg2  0Fh     66h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPGTD mmreg, mem64    0Fh     66h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PCMPGTW mmreg1, mmreg2  0Fh     65h    11-xxx-xxx  DirectPath  FADD/FMUL  2
PCMPGTW mmreg, mem64    0Fh     65h    mm-xxx-xxx  DirectPath  FADD/FMUL  4
PMADDWD mmreg1, mmreg2  0Fh     F5h    11-xxx-xxx  DirectPath  FMUL       3
PMADDWD mmreg, mem64    0Fh     F5h    mm-xxx-xxx  DirectPath  FMUL       5
PMULHW mmreg1, mmreg2   0Fh     E5h    11-xxx-xxx  DirectPath  FMUL       3
PMULHW mmreg, mem64     0Fh     E5h    mm-xxx-xxx  DirectPath  FMUL       5
PMULLW mmreg1, mmreg2   0Fh     D5h    11-xxx-xxx  DirectPath  FMUL       3
PMULLW mmreg, mem64     0Fh     D5h    mm-xxx-xxx  DirectPath  FMUL       5

Notes:
1. Bits 2, 1, and 0 of the ModRM byte select the integer register.
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP
with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.
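
As an illustration of how the encoding and latency columns relate, the following C fragment is a minimal sketch, not part of the guide. It assumes the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0), treats mod = 11 as the register form (11-xxx-xxx) and any other mod value as the memory form (mm-xxx-xxx), and copies only a handful of rows from the table. The names mmx_entry, mmx_table, modrm_mod, modrm_reg, modrm_rm, and mmx_latency are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One instruction from the table: latency of the register form and of the
   memory form. Illustrative structure, not defined by the guide. */
struct mmx_entry {
    uint8_t opcode;        /* "First byte" column; the prefix byte is 0Fh */
    const char *mnemonic;
    int reg_latency;       /* ModRM = 11-xxx-xxx */
    int mem_latency;       /* ModRM = mm-xxx-xxx, mm != 11 */
};

/* A few rows copied from the table; deliberately incomplete. */
static const struct mmx_entry mmx_table[] = {
    { 0xFD, "PADDW",   2, 4 },
    { 0xDB, "PAND",    2, 4 },
    { 0x74, "PCMPEQB", 2, 4 },
    { 0xF5, "PMADDWD", 3, 5 },
    { 0xD5, "PMULLW",  3, 5 },
};

/* Standard ModRM field extraction. */
unsigned modrm_mod(uint8_t m) { return (m >> 6) & 0x3u; }  /* bits 7-6 */
unsigned modrm_reg(uint8_t m) { return (m >> 3) & 0x7u; }  /* bits 5-3 */
unsigned modrm_rm(uint8_t m)  { return m & 0x7u; }         /* bits 2-0 */

/* Hypothetical helper: given the three bytes 0Fh, opcode, ModRM, return the
   latency listed in the table, or -1 if the opcode is not in mmx_table. */
int mmx_latency(const uint8_t bytes[3])
{
    assert(bytes != NULL);
    if (bytes[0] != 0x0F)
        return -1;
    for (size_t i = 0; i < sizeof mmx_table / sizeof mmx_table[0]; i++) {
        if (mmx_table[i].opcode == bytes[1]) {
            /* mod == 11b selects the register-to-register form. */
            return modrm_mod(bytes[2]) == 3u ? mmx_table[i].reg_latency
                                             : mmx_table[i].mem_latency;
        }
    }
    return -1;
}
```

For example, the byte sequence 0Fh FDh CAh (PADDW mm1, mm2; ModRM = 11-001-010) maps to the register-form latency of 2, while 0Fh F5h 08h (PMADDWD with mod = 00, a memory operand) maps to the memory-form latency of 5. The lookup reports only the per-instruction latency from the table; it does not model scheduling or contention for the FADD and FMUL pipes.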