IA-32 Intel® Architecture Optimization
execution units are not pipelined (meaning that µops cannot be
dispatched in consecutive cycles and the throughput is less than one per
cycle). The number of µops associated with each instruction provides a
basis for selecting instructions to generate. All µops executed out of the
microcode ROM involve extra overhead.
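A simplified cycle count shows what a non-pipelined unit does to throughput. The sketch below makes three assumptions. The 10-cycle latency is hypothetical. A non-pipelined unit is assumed to stay busy for an operation's full latency. The µops are assumed to be independent. The function names are illustrative only, and real units may behave differently.

```python
def pipelined_cycles(n_uops: int, latency: int) -> int:
    """Cycles to finish n independent µops on a fully pipelined unit.

    A new µop can enter every cycle, so the last one starts at cycle
    n - 1 and completes `latency` cycles later.
    """
    if n_uops == 0:
        return 0
    return (n_uops - 1) + latency


def non_pipelined_cycles(n_uops: int, latency: int) -> int:
    """Cycles to finish n independent µops on a unit that is not pipelined.

    Assumption: each µop occupies the unit for its whole latency, so the
    next µop cannot be dispatched until the previous one has finished.
    """
    return n_uops * latency


# Hypothetical 10-cycle operation, 8 independent µops.
n, lat = 8, 10
print(pipelined_cycles(n, lat))          # 17
print(non_pipelined_cycles(n, lat))      # 80
print(n / non_pipelined_cycles(n, lat))  # 0.1 µops per cycle
```

Under these assumptions, the non-pipelined unit sustains only one µop every `latency` cycles. This is why instruction selection favors operations that avoid such units in throughput-critical code.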
Execution Units and Issue Ports
At each cycle, the core may dispatch µops to one or more of four issue
ports. At the microarchitecture level, store operations are further divided
into two parts: store data and store address operations. The four ports
through which µops are dispatched to the execution units and to the load
and store units are shown in Figure 1-4. Some ports can dispatch two
µops per clock; the execution units served by those ports are marked Double Speed.
Port 0. In the first half of the cycle, port 0 can dispatch either one
floating-point move µop (a floating-point stack move, floating-point
exchange or floating-point store data), or one arithmetic logical unit
(ALU) µop (arithmetic, logic, branch or store data). In the second half
of the cycle, it can dispatch one similar ALU µop.
Port 1. In the first half of the cycle, port 1 can dispatch either one
floating-point execution µop (all floating-point operations except moves,
and all SIMD operations), one normal-speed integer µop (multiply, shift
and rotate), or one ALU µop (arithmetic). In the second half of the cycle,
it can dispatch one similar ALU µop.
Port 2. This port supports the dispatch of one load operation per cycle.
Port 3. This port supports the dispatch of one store address operation
per cycle.
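The port assignments above can be summarized as a lookup from µop class to the ports that may dispatch it. This is a simplified sketch. The class names (`alu_arith`, `fp_exec`, and so on) are hypothetical labels, not Intel terminology. The table captures only the port choices listed above and does not model the half-cycle timing.

```python
# Ports that may dispatch each µop class, per the port descriptions.
# Class names are illustrative labels, not Intel terminology.
PORT_ELIGIBILITY = {
    "fp_move": frozenset({0}),           # FP stack move, FP exchange, FP store data
    "alu_arith": frozenset({0, 1}),      # arithmetic ALU µops: either port
    "alu_logic": frozenset({0}),         # logic ALU µops
    "branch": frozenset({0}),
    "int_store_data": frozenset({0}),    # ALU store data
    "fp_exec": frozenset({1}),           # FP ops other than moves, all SIMD ops
    "int_normal_speed": frozenset({1}),  # multiply, shift, rotate
    "load": frozenset({2}),
    "store_address": frozenset({3}),
}


def eligible_ports(uop_class: str) -> frozenset:
    """Return the issue ports that can dispatch µops of the given class."""
    return PORT_ELIGIBILITY[uop_class]


def competing_classes(port: int) -> list[str]:
    """List (alphabetically) the µop classes that compete for a given port."""
    return sorted(c for c, ports in PORT_ELIGIBILITY.items() if port in ports)


print(sorted(eligible_ports("alu_arith")))  # [0, 1]
print(competing_classes(1))  # ['alu_arith', 'fp_exec', 'int_normal_speed']
```

In this description, logic and branch µops have only port 0 available, while simple arithmetic µops can be spread across ports 0 and 1.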
The total issue bandwidth can therefore range from zero to six µops per cycle: up to two µops each on ports 0 and 1, plus one each on ports 2 and 3.
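A hypothetical per-cycle model makes the zero-to-six bound easy to check. It clamps the number of ready µops aimed at each port to that port's limit, then sums the result. The model assumes that any second µop on port 0 or 1 is an ALU µop, because that is the only kind the second half-cycle accepts. It ignores dependencies and every other scheduling constraint.

```python
# Per-port dispatch limits per cycle: ports 0 and 1 are double speed
# (one µop in each half-cycle); port 2 dispatches one load and port 3
# one store address per cycle.
PORT_LIMITS = {0: 2, 1: 2, 2: 1, 3: 1}


def uops_issued(ready: dict[int, int]) -> int:
    """Return how many µops dispatch in one cycle.

    `ready` maps a port number to the number of ready µops targeting it.
    Simplification: a second µop on port 0 or 1 is assumed to be an ALU
    µop, the only kind the second half-cycle accepts.
    """
    return sum(min(ready.get(port, 0), limit) for port, limit in PORT_LIMITS.items())


print(uops_issued({}))                        # 0: nothing ready
print(uops_issued({2: 4}))                    # 1: port 2 takes one load
print(uops_issued({0: 3, 1: 2, 2: 1, 3: 1}))  # 6: every port saturated
```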
Each pipeline contains several execution units, and a µop is dispatched
to the pipeline that handles its type of operation. For
example, an integer arithmetic logic unit and the floating-point
execution units (adder, multiplier, and divider) can share a pipeline.