Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Processor Family Overview
Some parts of the core may speculate that a common condition holds to
allow faster execution. If it does not, the machine may stall. An example
of this pertains to store-to-load forwarding (see “Store Forwarding” in
this chapter). If a load is predicted to be dependent on a store, it gets its
data from that store and tentatively proceeds. If the load turns out not
to depend on the store, it is delayed until the real data has been
loaded from memory, and then it proceeds.
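As a rough illustration of the store/load pairs involved, the C sketch below contrasts a load that reads exactly what one earlier store wrote with a load that spans two narrower stores. The function names are hypothetical, and the use of volatile is only an assumption intended to keep the compiler from removing the memory accesses. Whether forwarding actually succeeds depends on the processor and on the generated code, so this is a sketch of the access patterns, not a measurement.

```c
#include <stdint.h>

/* One 32-bit store followed by a 32-bit load from the same address:
   the load needs exactly the bytes that a single store produced. */
uint32_t same_size_roundtrip(uint32_t value)
{
    volatile uint32_t slot = value;   /* 32-bit store */
    return slot;                      /* 32-bit load, same address and size */
}

/* Two 16-bit stores followed by one 32-bit load that spans both:
   no single store holds all the bytes the load needs, so the load
   generally cannot take its data from one store. */
uint32_t split_store_roundtrip(uint16_t low, uint16_t high)
{
    volatile union {
        uint16_t halves[2];
        uint32_t whole;
    } slot;

    slot.halves[0] = low;             /* first 16-bit store  */
    slot.halves[1] = high;            /* second 16-bit store */
    return slot.whole;                /* 32-bit load covering both stores */
}
```

Both functions return the stored data; only the way the load relates to the preceding stores differs. The byte order of the second result depends on the endianness of the machine.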
Instruction Latency and Throughput
The superscalar out-of-order core contains hardware resources that can
execute multiple μops in parallel. The core’s ability to make use of
available parallelism of execution units can be enhanced by software’s
ability to:
• select IA-32 instructions that can be decoded into fewer than 4 μops
  and/or have short latencies
• order IA-32 instructions to preserve available parallelism by
  minimizing long dependence chains and covering long instruction
  latencies
• order instructions so that their operands are ready and their
  corresponding issue ports and execution units are free when they
  reach the scheduler
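The point about minimizing long dependence chains can be sketched in C. The two functions below are hypothetical examples, not taken from this manual: the first accumulates into a single variable, so each addition depends on the previous one; the second splits the work across four accumulators, giving the core four shorter, independent chains. An optimizing compiler may transform either loop, so the generated code should be inspected before drawing conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the result of the
   previous one, forming a single dependence chain of length n. */
uint32_t sum_single_chain(const uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* Four accumulators: four independent chains, each about n/4 long,
   that can overlap in execution. */
uint32_t sum_four_chains(const uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += values[i];
        s1 += values[i + 1];
        s2 += values[i + 2];
        s3 += values[i + 3];
    }
    for (; i < n; i++)        /* leftover elements when n is not a multiple of 4 */
        s0 += values[i];

    return s0 + s1 + s2 + s3; /* combine the partial sums once at the end */
}
```

Because unsigned arithmetic wraps, both functions return the same value for any input. With floating-point data, the reordering could change rounding, so the same transformation would need more care.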
This subsection describes port restrictions, result latencies, and issue
latencies (also referred to as throughput). These concepts form the basis
for ordering instructions in software to increase parallelism. The
order in which μops are presented to the core of the processor is further
affected by the machine’s scheduling resources.
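The difference between result latency and issue latency can be made concrete with a deliberately simplified cycle model, sketched below in C. The function names and the model itself are assumptions made for illustration: it considers a single issue port, ignores all other resource limits, and treats issue latency as the number of cycles before the port can accept the next μop of the same kind.

```c
/* Dependent chain: each μop must wait for the previous result,
   so the total time is governed by result latency. */
unsigned long dependent_chain_cycles(unsigned long count, unsigned latency)
{
    return count * latency;
}

/* Independent μops on one port: a new μop can issue every
   issue_interval cycles, and the last one completes one result
   latency after it issues. */
unsigned long independent_cycles(unsigned long count, unsigned latency,
                                 unsigned issue_interval)
{
    if (count == 0)
        return 0;
    return (count - 1) * issue_interval + latency;
}
```

For example, with a hypothetical latency of 4 cycles and an issue latency of 1 cycle, eight dependent μops take 32 cycles in this model, while eight independent μops take 11. The gap is the parallelism that careful instruction ordering tries to expose.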
It is the execution core that reacts to an ever-changing machine state,
reordering μops for faster execution or delaying them because of
dependence and resource constraints. The ordering of instructions in
software is more of a suggestion to the hardware.
Appendix C, “IA-32 Instruction Latency and Throughput,” lists some of
the more commonly used IA-32 instructions with their latency, their
issue throughput, and associated execution units (where relevant). Some