IBM 750GL Manual

User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

March 27, 2006

Instruction Timing

The instruction pipeline stages are described as follows:

• The instruction fetch stage includes the clock cycles necessary to request instructions from the memory system and the time the memory system takes to respond to the request. Instruction fetch timing depends on many variables, such as whether the instruction is in the branch target instruction cache, the L1 instruction cache, or the L2 cache. If instructions must be fetched from system memory, other factors affect instruction fetch timing including the processor-to-bus clock ratio, the amount of bus traffic, and whether any cache-coherency operations are required.

Because there are so many variables, unless otherwise specified, the instruction timing examples below assume optimal performance and assume instructions are available in the instruction queue in the same clock cycle that they are requested. The fetch stage ends when instructions are loaded into the instruction queue.

• The decode/dispatch stage consists of the time it takes to decode the instruction and dispatch it from the instruction queue to the appropriate execution unit. Instruction dispatch requires the following:

– Instructions can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1.

– A maximum of two instructions can be dispatched per clock cycle, and one additional branch instruction can be handled by the BPU.

– Only one instruction can be dispatched to each execution unit per clock cycle.

– There must be a vacancy in the specified execution-unit reservation station.

– A Rename Register must be available for each destination operand specified by the instruction.

– For an instruction to dispatch, the appropriate execution-unit reservation station must be available, and there must be an open position in the completion queue. If no entry is available, the instruction remains in the instruction queue (IQ).
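
The dispatch conditions listed above can be sketched as a small Python model. This is an illustrative simplification, not IBM's implementation: the names `Instr`, `MachineState`, and `dispatch_cycle` are hypothetical, rename registers are treated as one shared pool, branch handling by the BPU is left out, and the model assumes that a stall at IQ0 also blocks IQ1.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    unit: str           # target execution unit, e.g. "IU1", "LSU", "FPU"
    dest_regs: int = 1  # destination operands that each need a rename register


@dataclass
class MachineState:
    free_reservation: dict  # unit name -> free reservation-station entries
    free_renames: int       # assumption: one shared pool of rename registers
    free_cq_entries: int    # open positions in the completion queue


def dispatch_cycle(iq, state):
    """Dispatch from IQ0 and IQ1 for one clock cycle; return what was dispatched."""
    dispatched = []
    units_used = set()
    for instr in iq[:2]:                        # only IQ0 and IQ1 are candidates
        if instr.unit in units_used:            # one instruction per unit per cycle
            break
        if state.free_reservation.get(instr.unit, 0) < 1:
            break                               # no reservation-station vacancy
        if state.free_renames < instr.dest_regs:
            break                               # not enough rename registers
        if state.free_cq_entries < 1:
            break                               # completion queue is full
        state.free_reservation[instr.unit] -= 1
        state.free_renames -= instr.dest_regs
        state.free_cq_entries -= 1
        units_used.add(instr.unit)
        dispatched.append(instr)
    del iq[:len(dispatched)]                    # undispatched entries stay in the IQ
    return dispatched
```

For example, with two instructions at IQ0 and IQ1 that both target the same unit, this model dispatches only the first one in that cycle; the second remains in the instruction queue until a later cycle.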

• The execute stage consists of the time between dispatch to the execution unit (or reservation station) and the point at which the instruction vacates the execution unit.

Most integer instructions have a 1-cycle latency; results of these instructions can be used in the clock cycle after an instruction enters the execution unit. However, integer multiply and divide instructions take multiple clock cycles to complete. IU1 can process all integer instructions; IU2 can process all integer instructions except multiply and divide instructions.

The LSU and FPU are pipelined (as shown in Figure 6-2 on page 212).
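
The split of work between the two integer units described above can be expressed as a simple lookup. The sketch below is illustrative only: `eligible_integer_units` is a hypothetical helper, the mnemonic set covers common 32-bit PowerPC multiply and divide instructions but is not claimed to be exhaustive (record and overflow forms are omitted), and the function assumes its argument is already known to be an integer instruction.

```python
# Illustrative, non-exhaustive set of integer multiply and divide mnemonics.
MULTIPLY_DIVIDE = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}


def eligible_integer_units(mnemonic):
    """Return the integer units that can execute the given integer instruction."""
    if mnemonic in MULTIPLY_DIVIDE:
        return ["IU1"]          # only IU1 handles multiply and divide
    return ["IU1", "IU2"]       # all other integer instructions can use either unit
```

A scheduling consequence of this rule is that two multiply or divide instructions cannot execute in parallel in the integer units, whereas two simple integer instructions such as `add` can.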

• The complete (complete/write-back) pipeline stage maintains the correct architectural machine state and commits the rename register values to the architectural registers at the proper time. If the completion logic detects an instruction containing an exception status, all subsequent instructions are cancelled; their execution results in the Rename Registers are discarded; and the correct instruction stream is fetched.

The complete stage ends when the instruction is retired. Two instructions can be retired per cycle. Instructions are retired only from the two lowest completion queue entries, CQ0 and CQ1.
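
The retirement rules above can be illustrated with a minimal Python sketch of one completion cycle. The names `CQEntry` and `retire_cycle` are hypothetical, and two modelling choices are assumptions rather than statements from this manual: an instruction retires only after it has finished executing and only in queue order, and an instruction that carries an exception status is reported separately rather than counted as retired.

```python
from dataclasses import dataclass


@dataclass
class CQEntry:
    name: str
    finished: bool = False   # execution result is ready to commit
    exception: bool = False  # instruction carries an exception status


def retire_cycle(cq):
    """Model one cycle of completion.

    Returns (retired, faulting, cancelled):
      retired   - names retired this cycle (at most two, from CQ0 and CQ1)
      faulting  - name of the instruction with an exception status, or None
      cancelled - names of the subsequent instructions that were cancelled
    """
    retired = []
    while cq and len(retired) < 2:
        head = cq[0]
        if not head.finished:
            break                                  # head not ready; nothing behind it retires
        if head.exception:
            cancelled = [e.name for e in cq[1:]]   # subsequent instructions are cancelled
            cq.clear()                             # their rename results are discarded
            return retired, head.name, cancelled
        retired.append(cq.pop(0).name)             # commit and vacate the queue entry
    return retired, None, []
```

With three finished entries and no exceptions, one call retires the first two and leaves the third for the next cycle, which matches the limit of two retirements per cycle.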