Intel IA-32 Computer Accessories User Manual

IA-32 Instruction Latency and Throughput C

For the sake of simplicity, all requested data is assumed to reside in the first-level data cache (cache hit). In general, IA-32 instructions with load operations that execute in the integer ALU units require two more clock cycles of latency than the corresponding register-to-register flavor of the same instruction. The throughput of these instructions with a load operation remains the same as that of the register-to-register flavor.

Floating-point, MMX technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 instructions with load operations require 6 more clocks of latency than the register-only version of the instructions, but throughput remains the same.
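The two load-operand rules above can be written down as a small calculation. The following is a minimal sketch in C, assuming a first-level data cache hit as the text does. The enum, the function names, and the grouping into two instruction classes are hypothetical and chosen only for illustration; they are not part of any Intel tool or API.

```c
#include <assert.h>

/* Hypothetical instruction classes, named for this sketch only. */
enum insn_class {
    INSN_INTEGER_ALU,  /* IA-32 instructions that execute in the integer ALU units */
    INSN_FP_MMX_SSE    /* floating-point, MMX, SSE and SSE2 instructions */
};

/* Extra latency, in clocks, added by a load operand (L1 data cache hit assumed). */
int load_penalty_clocks(enum insn_class cls)
{
    return cls == INSN_INTEGER_ALU ? 2 : 6;
}

/* Estimated latency of the load form, given the register-only latency. */
int load_form_latency(enum insn_class cls, int reg_latency_clocks)
{
    assert(reg_latency_clocks >= 0);
    return reg_latency_clocks + load_penalty_clocks(cls);
}

/* Throughput of the load form is the same as the register-only form. */
int load_form_throughput(int reg_throughput_clocks)
{
    return reg_throughput_clocks;
}
```

For example, a register-only latency of 4 clocks in the floating-point/SIMD class would give an estimate of 10 clocks for the load form. The value 4 is a placeholder for this illustration, not a figure taken from the latency tables, and the "in general" qualifier in the text means individual instructions may deviate.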

When store operations are on the critical path, their results can generally be forwarded to a dependent load in as few as zero cycles. Thus, the latency for the store to complete is not relevant here.