

IA-32 Intel® Architecture Optimization

C-4

Definitions

The IA-32 instruction performance data are listed in several tables. The tables contain the following information:

Instruction Name: The assembly mnemonic of each instruction.

Latency: The number of clock cycles required for the execution core to complete the execution of all of the μops that form an IA-32 instruction.

Throughput: The number of clock cycles that must elapse before the issue ports are free to accept the same instruction again. For many IA-32 instructions, the throughput can be significantly less than the latency.

Execution units: The names of the execution units in the execution core that execute the μops for each instruction. This information is provided only for IA-32 instructions that decode into no more than 4 μops; the μops for instructions that decode into more than 4 μops are supplied by the microcode ROM. Note that several execution units may share the same port, such as FP_ADD, FP_MUL, or MMX_SHFT in the FP_EXECUTE cluster (see Figure 1-4; Figure 1-4 applies to Pentium 4 and Intel Xeon processors with a CPUID signature of family 15, model encoding 0, 1, or 2).
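As a minimal sketch, assuming one wanted to load rows of these tables into a program, the record below mirrors the four columns defined above. The class `InstrPerf`, its field names, the helper `share_port`, and the use of the cluster name as a stand-in for the shared port are hypothetical illustrations, not part of this manual; only the 4-μop threshold and the FP_ADD/FP_MUL/MMX_SHFT grouping come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# From the text: execution units are listed only for instructions that
# decode into no more than 4 uops; longer uop sequences come from the
# microcode ROM.
MAX_UOPS_WITH_UNITS = 4

# From the text: FP_ADD, FP_MUL and MMX_SHFT share one port in the
# FP_EXECUTE cluster. Keying the port by the cluster name is a
# simplifying assumption for this sketch.
SHARED_PORT = {"FP_ADD": "FP_EXECUTE", "FP_MUL": "FP_EXECUTE", "MMX_SHFT": "FP_EXECUTE"}


@dataclass(frozen=True)
class InstrPerf:
    """Hypothetical record for one table row."""
    name: str          # assembly mnemonic
    latency: float     # cycles to complete all uops of the instruction
    throughput: float  # cycles before the issue ports accept it again
    uops: int          # number of uops the instruction decodes into
    units: Optional[Tuple[str, ...]] = None  # listed execution units, if any

    def __post_init__(self) -> None:
        # Reject rows that list units for a microcoded instruction.
        if self.uops > MAX_UOPS_WITH_UNITS and self.units is not None:
            raise ValueError(f"{self.name}: units are not listed for microcoded instructions")

    @property
    def from_microcode_rom(self) -> bool:
        return self.uops > MAX_UOPS_WITH_UNITS


def share_port(a: str, b: str) -> bool:
    """True if both units map to the same port in SHARED_PORT."""
    pa, pb = SHARED_PORT.get(a), SHARED_PORT.get(b)
    return pa is not None and pa == pb
```

Rejecting units on rows with more than 4 μops keeps the record consistent with the rule stated above; a row such as `InstrPerf("OP_B", 20, 10, 6)` (placeholder name and values) is simply flagged as coming from the microcode ROM.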

Latency and Throughput

This section presents the latency and throughput information for the IA-32 instruction set, including Streaming SIMD Extensions 2, Streaming SIMD Extensions, MMX technology, and most of the frequently used general-purpose integer and x87 floating-point instructions.
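The difference between the two metrics can be illustrated with a simple first-order model, assumed here for illustration only: a chain of n dependent copies of one instruction costs about n × latency cycles, while n independent copies can issue every `throughput` cycles and finish about (n - 1) × throughput + latency cycles after the first one issues. The function names and the numbers below are hypothetical, not values from the tables, and the model ignores port sharing, memory accesses, and out-of-order overlap with surrounding code.

```python
def dependent_chain_cycles(n: int, latency: float) -> float:
    """Each instruction waits for the previous result: about n * latency."""
    return n * latency


def independent_stream_cycles(n: int, latency: float, throughput: float) -> float:
    """Independent copies issue every `throughput` cycles; the last one
    still needs `latency` cycles to complete."""
    if n <= 0:
        return 0
    return (n - 1) * throughput + latency
```

For example, with a hypothetical latency of 6 cycles and throughput of 2 cycles, `dependent_chain_cycles(10, 6)` returns 60 while `independent_stream_cycles(10, 6, 2)` returns 24, which is why a throughput much lower than latency pays off mainly when independent work is available.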

Due to the complexity of dynamic execution and the out-of-order nature of the execution core, the instruction latency data may not be sufficient to
