IBM 750GL Computer Accessories User Manual


 
User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
Instruction Timing
The instruction pipeline stages are described as follows:
The instruction fetch stage includes the clock cycles necessary to request instructions from the memory
system and the time the memory system takes to respond to the request. Instruction fetch timing
depends on many variables, such as whether the instruction is in the branch target instruction cache, the
L1 instruction cache, or the L2 cache. If instructions must be fetched from system memory, other factors
affect instruction fetch timing including the processor-to-bus clock ratio, the amount of bus traffic, and
whether any cache-coherency operations are required.
Because there are so many variables, unless otherwise specified, the instruction timing examples below
assume optimal performance and assume instructions are available in the instruction queue in the same
clock cycle that they are requested. The fetch stage ends when instructions are loaded into the
instruction queue.
The decode/dispatch stage consists of the time it takes to decode the instruction and dispatch it from the
instruction queue to the appropriate execution unit. Instruction dispatch requires the following:
• Instructions can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1.
• A maximum of two instructions can be dispatched per clock cycle, and one additional branch
  instruction can be handled by the BPU.
• Only one instruction can be dispatched to each execution unit per clock cycle.
• There must be a vacancy in the specified execution-unit reservation station.
• A Rename Register must be available for each destination operand specified by the instruction.
• For an instruction to dispatch, the appropriate execution-unit reservation station must be available,
  and there must be an open position in the completion queue. If no entry is available, the instruction
  remains in the instruction queue (IQ).
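The dispatch requirements above can be read as a per-cycle check. The following Python sketch is illustrative only and is not taken from the manual. The names `Instr`, `Machine`, and `dispatch_cycle` and the unit labels are hypothetical. It assumes that dispatch is in order (IQ1 cannot dispatch unless IQ0 does), and it leaves out the extra branch handled by the BPU.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    unit: str               # hypothetical label for the target unit, e.g. "IU1", "LSU"
    dest_operands: int = 1  # destination operands that each need a rename register


@dataclass
class Machine:
    free_stations: dict     # unit label -> free reservation-station entries
    free_renames: int       # free rename registers
    free_cq_slots: int      # open positions in the completion queue


def dispatch_cycle(iq, m):
    """Dispatch at most two instructions from the front of iq (IQ0, then IQ1)."""
    dispatched = []
    units_used = set()
    while iq and len(dispatched) < 2:
        ins = iq[0]
        can_dispatch = (
            ins.unit not in units_used                 # one instruction per unit per cycle
            and m.free_stations.get(ins.unit, 0) > 0   # reservation-station vacancy
            and m.free_renames >= ins.dest_operands    # rename register per destination
            and m.free_cq_slots > 0                    # open completion-queue position
        )
        if not can_dispatch:
            break  # assumption: in-order dispatch, so a stalled IQ0 also blocks IQ1
        iq.pop(0)
        units_used.add(ins.unit)
        m.free_stations[ins.unit] -= 1
        m.free_renames -= ins.dest_operands
        m.free_cq_slots -= 1
        dispatched.append(ins)
    return dispatched


iq = [Instr("add", "IU1"), Instr("lwz", "LSU"), Instr("fadd", "FPU")]
m = Machine({"IU1": 1, "LSU": 1, "FPU": 1}, free_renames=6, free_cq_slots=6)
first = dispatch_cycle(iq, m)
print([i.name for i in first])  # ['add', 'lwz']
```

The third instruction stays in the queue because only two instructions can dispatch per cycle. The resource counts passed to `Machine` are arbitrary example values, not the processor's actual capacities.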
The execute stage consists of the time between dispatch to the execution unit (or reservation station) and
the point at which the instruction vacates the execution unit.
Most integer instructions have a 1-cycle latency; results of these instructions can be used in the clock
cycle after an instruction enters the execution unit. However, integer multiply and divide instructions take
multiple clock cycles to complete. IU1 can process all integer instructions; IU2 can process all integer
instructions except multiply and divide instructions.
The LSU and FPU are pipelined (as shown in Figure 6-2 on page 212).
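The integer-unit rule described above (IU1 handles all integer instructions, IU2 handles all except multiply and divide) can be sketched as a small lookup. The function name `eligible_integer_units` is hypothetical, and the mnemonic set is an illustrative, non-exhaustive list of 32-bit PowerPC multiply and divide instructions. The sketch assumes the caller passes only integer instructions.

```python
# Illustrative subset of multiply/divide mnemonics; not an exhaustive list.
MUL_DIV = frozenset({"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"})


def eligible_integer_units(mnemonic):
    """Return the integer units that could execute the given integer instruction."""
    base = mnemonic.rstrip(".")  # treat the record form (trailing dot) like the base form
    if base in MUL_DIV:
        return ("IU1",)          # multiply and divide run only on IU1
    return ("IU1", "IU2")        # every other integer instruction can use either unit


print(eligible_integer_units("add"))    # ('IU1', 'IU2')
print(eligible_integer_units("divw."))  # ('IU1',)
```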
The complete (complete/write-back) pipeline stage maintains the correct architectural machine state and
commits the rename register values to the architectural registers at the proper time. If the completion
logic detects an instruction containing an exception status, all subsequent instructions are cancelled; their
execution results in the Rename Registers are discarded; and the correct instruction stream is fetched.
The complete stage ends when the instruction is retired. Two instructions can be retired per cycle.
Instructions are retired only from the two lowest completion queue entries, CQ0 and CQ1.
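The retirement rules above can be sketched as a per-cycle function over the completion queue. This is a simplified model, not the processor's actual logic. `CQEntry` and `retire_cycle` are hypothetical names. The sketch assumes in-order retirement, with an unfinished CQ0 blocking CQ1. It also assumes that an instruction with exception status is reported separately rather than retired.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CQEntry:
    name: str
    finished: bool = True    # execution has finished and results sit in rename registers
    exception: bool = False  # instruction carries an exception status


def retire_cycle(cq):
    """Retire up to two entries from the head of cq (CQ0, then CQ1).

    Returns (retired, faulting, cancelled) as names, None or a name, and names.
    """
    retired, faulting, cancelled = [], None, []
    while cq and len(retired) < 2:
        head = cq[0]
        if not head.finished:
            break                    # assumption: an unfinished CQ0 blocks CQ1
        cq.popleft()
        if head.exception:
            faulting = head.name
            cancelled = [e.name for e in cq]  # all subsequent instructions are cancelled
            cq.clear()                        # their rename-register results are discarded
            break
        retired.append(head.name)
    return retired, faulting, cancelled


cq = deque([CQEntry("add"), CQEntry("lwz", finished=False), CQEntry("subf")])
print(retire_cycle(cq))  # (['add'], None, [])
```

In the example, only `add` retires in this cycle because `lwz` has not finished, and `subf` waits behind it even though it is finished.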