User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
Instruction Timing
6.3 Timing Considerations
The 750GX is a superscalar processor; as many as three instructions can be issued to the execution units
during each clock cycle (one branch instruction to the branch processing unit (BPU), and two instructions
from the dispatch queue to the other execution units). Each execution unit can accept only one instruction
per clock cycle.
Although instructions appear to the programmer to execute in program order, the 750GX improves
performance by executing multiple instructions at a time, using hardware to manage dependencies. When an
instruction is dispatched, the register file or a Rename Register holding the result of a previous instruction
provides the source data to the execution unit. The register files and Rename Registers have sufficient
bandwidth to allow dispatch of two instructions per clock cycle under most conditions.
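
The dispatch limits described above can be illustrated with a minimal sketch. It is a simplified model, not a description of the actual 750GX dispatch logic: the function name, the unit labels, the binding of each instruction to a single named unit, and the in-order stall at a blocked queue head are all assumptions made for illustration. Operand availability and Rename Register allocation are not modeled.

```python
from collections import deque

# From the text: at most two instructions leave the dispatch queue per clock cycle.
MAX_DISPATCH_PER_CYCLE = 2


def dispatch_cycle(queue, busy_units=frozenset()):
    """Model one clock cycle of dispatch (hypothetical helper).

    queue      -- deque of (instruction_name, unit_name) tuples, oldest first
    busy_units -- units that cannot accept an instruction this cycle
    Returns the list of entries dispatched this cycle and removes them from queue.
    """
    dispatched = []
    claimed = set(busy_units)
    while queue and len(dispatched) < MAX_DISPATCH_PER_CYCLE:
        _name, unit = queue[0]
        if unit in claimed:
            # Assumption: dispatch is in order, so a blocked head stalls the rest.
            break
        claimed.add(unit)  # each unit accepts only one instruction per cycle
        dispatched.append(queue.popleft())
    return dispatched


if __name__ == "__main__":
    q = deque([("add", "IU1"), ("lwz", "LSU"), ("fadd", "FPU")])
    print(dispatch_cycle(q))  # two entries dispatched, one left in the queue
    print(list(q))
```

In this model, two instructions that target the same unit cannot be dispatched in the same cycle, which is the one-instruction-per-unit restriction stated in the text.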
The 750GX’s BPU decodes and executes branches immediately after they are fetched. When a conditional
branch cannot be resolved because of a CR data dependency (or any other dependency), the branch direction
is predicted and execution continues on the predicted path. If the prediction is incorrect, the following
steps are taken:
1. The instruction queue is purged and fetching continues from the correct path.
2. Any instructions in the completion queue that precede the predicted branch in program order are allowed
to complete.
3. Instructions fetched on the mispredicted path of the branch are purged.
4. Fetching resumes along the correct (other) path.
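
The recovery steps can be sketched as a small state transformation. This is an illustrative model only: the function name, the list-based queues, and the assumption that the branch itself occupies a completion queue entry are hypothetical. Step 2 is read here as applying to instructions that are older than the branch in program order.

```python
def recover_from_mispredict(instruction_queue, completion_queue, branch, correct_address):
    """Return the modeled machine state after a mispredicted branch is resolved.

    instruction_queue -- instructions fetched but not yet dispatched
    completion_queue  -- dispatched instructions in program order, oldest first;
                         assumed to contain `branch`
    correct_address   -- address of the first instruction on the correct path
    """
    position = completion_queue.index(branch)
    # Step 2: the branch and everything older than it are allowed to complete.
    allowed_to_complete = completion_queue[: position + 1]
    # Step 3: everything younger than the branch came from the mispredicted path.
    purged = completion_queue[position + 1:] + list(instruction_queue)
    return {
        "instruction_queue": [],                  # Step 1: the IQ is purged
        "completion_queue": allowed_to_complete,
        "purged": purged,
        "next_fetch_address": correct_address,    # Step 4: fetch the correct path
    }


if __name__ == "__main__":
    state = recover_from_mispredict(
        instruction_queue=["i5", "i6"],
        completion_queue=["i1", "i2", "bc", "i3", "i4"],
        branch="bc",
        correct_address=0x2000,
    )
    print(state)
```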
After an execution unit finishes executing an instruction, it places resulting data into the appropriate GPR or
FPR Rename Register. The results are then stored into the correct GPR or FPR during the write-back stage
(retirement). If a subsequent instruction needs the result as a source operand, the result is simultaneously
made available to the appropriate execution unit, which allows a data-dependent instruction to be decoded
and dispatched without waiting to read the data from the register file. Branch instructions that update either
the LR or CTR write back their results in a similar fashion.
Section 6.3.1 describes this process in greater detail.
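
The relationship between Rename Registers, result forwarding, and write-back can be sketched as follows. This is a simplified model under stated assumptions, not the 750GX implementation: the class and method names are hypothetical, only GPRs are modeled, and at most one pending result per destination register is tracked.

```python
class RegisterState:
    """Toy model of an architected GPR file plus rename storage (hypothetical)."""

    def __init__(self, num_gprs=32):
        self.gpr = [0] * num_gprs  # architected registers
        self.rename = {}           # destination register -> result awaiting write-back

    def finish(self, dest, value):
        """An execution unit finishes: the result goes to a Rename Register."""
        self.rename[dest] = value

    def read_operand(self, reg):
        """Source operand lookup at dispatch.

        A pending result in rename storage is forwarded directly, so a dependent
        instruction does not wait for the register file to be updated.
        """
        if reg in self.rename:
            return self.rename[reg]
        return self.gpr[reg]

    def write_back(self, dest):
        """Retirement: copy the result into the architected register."""
        self.gpr[dest] = self.rename.pop(dest)


if __name__ == "__main__":
    state = RegisterState()
    state.finish(dest=3, value=42)
    print(state.read_operand(3), state.gpr[3])  # forwarded value vs. architected value
    state.write_back(3)
    print(state.gpr[3])
```

Between `finish` and `write_back`, the architected register still holds its old value while a dependent instruction already sees the new one, which is the forwarding behavior the text describes.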
6.3.1 General Instruction Flow
As many as four instructions can be fetched into the instruction queue (IQ) in a single clock cycle. Instructions
enter the IQ and are issued to the various execution units from the dispatch queue. The 750GX tries to keep
the IQ full at all times, unless instruction-cache throttling is operating.
The number of instructions requested in a clock cycle is determined by the number of vacant spaces in the IQ
during the previous clock cycle. This is shown in the examples in this section. Although the instruction queue
can accept as many as four new instructions in a single clock cycle, if only one IQ entry is vacant, only one
instruction is fetched. Typically, instructions are fetched from the L1 instruction cache, but they might also be
fetched from the branch target instruction cache (BTIC) if a branch is taken. If the instruction request for a
taken branch hits in the BTIC, the BTIC can usually present the first two instructions of the new instruction
stream in the next clock cycle, giving enough time for the next pair of instructions to be fetched from the L1
instruction cache. This results in no idle cycles in the instruction stream (also known as a zero-cycle branch).
If instructions are not in the BTIC or the L1 instruction cache, they are fetched from the L2 cache or from
system memory.
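
The fetch-request rule described above (request as many instructions as there were vacant IQ entries, up to four) can be sketched as a short simulation. The function names and the six-entry IQ depth used as the default are assumptions for illustration; cache misses, the BTIC, and instruction-cache throttling are not modeled.

```python
# From the text: the IQ can accept at most four new instructions per clock cycle.
MAX_FETCH_PER_CYCLE = 4


def instructions_to_request(vacant_entries_last_cycle):
    """Number of instructions requested, given last cycle's vacant IQ entries."""
    return max(0, min(MAX_FETCH_PER_CYCLE, vacant_entries_last_cycle))


def simulate_iq(dispatch_counts, iq_depth=6):
    """Trace IQ occupancy over several cycles (hypothetical helper).

    dispatch_counts -- how many instructions the dispatcher would like to remove
                       in each cycle
    iq_depth        -- assumed IQ size; adjust to match the actual hardware
    Returns a list of (fetched, dispatched, occupancy_after) tuples, one per cycle.
    """
    occupancy = 0
    trace = []
    for wanted in dispatch_counts:
        vacant = iq_depth - occupancy         # vacancies left by the previous cycle
        fetched = instructions_to_request(vacant)
        dispatched = min(wanted, occupancy)   # only queued instructions can leave
        occupancy = occupancy - dispatched + fetched
        trace.append((fetched, dispatched, occupancy))
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(simulate_iq([0, 0, 2, 2]), start=1):
        print(cycle, row)
```

In this model, a full IQ requests nothing in the following cycle, and a single vacant entry results in a single-instruction fetch, matching the behavior stated in the text.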