Intel IA-32 Computer Accessories User Manual

Appendix C. IA-32 Instruction Latency and Throughput

This appendix contains tables of the latency, throughput, and execution units associated with the more commonly used IA-32 instructions [1].

Instruction timing data varies within the IA-32 family of processors. Only data specific to the Intel Pentium 4, Intel Xeon, and Intel Pentium M processors is provided. The relevance of instruction throughput and latency information for code tuning is discussed in Chapter 1 and Chapter 2; see “Execution Core Detail” in Chapter 1 and “Floating Point/SIMD Operands” in Chapter 2.

This appendix contains the following sections:

• “Overview” – an overview of issues related to instruction selection and scheduling.

• “Definitions” – the definitions for the primary information presented in the tables in section “Latency and Throughput.”

• “Latency and Throughput of Pentium 4 and Intel Xeon processors” – the listings of IA-32 instruction throughput, latency, and execution units associated with commonly used instructions.

1. Although instruction latency may be useful in some limited situations (e.g., a tight loop with a dependency chain that exposes instruction latency), software optimization on a superscalar, out-of-order microarchitecture will, in general, benefit much more from increasing the effective throughput of the larger-scale code path. Coding techniques that rely on instruction latency alone to influence instruction scheduling are likely to be sub-optimal, because they tend to interfere with the out-of-order machine or restrict the amount of instruction-level parallelism.
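
The distinction the footnote draws between a latency-exposing dependency chain and a throughput-oriented code path can be illustrated with a minimal C sketch. It is not taken from the manual: the function names sum_single_chain and sum_four_chains, and the choice of four accumulators, are assumptions made only for illustration. The first loop forms one long dependency chain, so each addition must wait for the previous result. The second keeps four independent chains that an out-of-order core may overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: one accumulator. Every addition depends on the
   result of the previous one, so the loop exposes the add latency. */
double sum_single_chain(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical example: four independent accumulators. The four chains
   do not depend on each other, so the hardware is free to overlap them.
   The count of four is an arbitrary illustrative choice. */
double sum_four_chains(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

Whether the second form actually runs faster depends on the processor, the compiler, and the surrounding code, so it should be measured rather than assumed. Because floating-point addition is not associative, the two functions can also round differently for general inputs; they agree exactly only when every partial sum is exactly representable.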
