User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
PowerPC 750GX Overview
March 27, 2006
are flushed from the processor, and instruction fetching resumes along the correct path. The 750GX allows a
second branch instruction to be predicted; instructions from the second predicted branch instruction stream
can be fetched but cannot be dispatched. These instructions are held in the instruction queue.
Dynamic prediction is implemented using a 512-entry BHT. The BHT is a cache that provides two bits per
entry that together indicate four levels of prediction for a branch instruction—not-taken, strongly not-taken,
taken, strongly taken. When dynamic branch prediction is disabled, the BPU uses a bit in the instruction
encoding to predict the direction of the conditional branch. In either case, when an unresolved conditional branch
instruction is encountered, the 750GX executes instructions from the predicted path, although the results are
not committed to architected registers until the conditional branch is resolved. This execution can continue
until a second unresolved branch instruction is encountered.
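The dynamic prediction scheme described above can be sketched as a table of 2-bit saturating counters. This is only an illustrative model: the numeric state encoding, the reset state, the update rule, and the index function are assumptions chosen for the sketch, not details given in this section.

```python
# Illustrative model only: the state encoding, reset state, update rule,
# and index function below are assumptions, not taken from the manual.
BHT_ENTRIES = 512

# Assumed encoding of the four prediction levels (2 bits per entry).
STRONGLY_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONGLY_TAKEN = range(4)


class BranchHistoryTable:
    def __init__(self):
        # Assumed reset state: every entry weakly predicts not-taken.
        self.counters = [NOT_TAKEN] * BHT_ENTRIES

    @staticmethod
    def index(branch_addr):
        # Assumed index: low-order bits of the word-aligned branch address.
        return (branch_addr >> 2) % BHT_ENTRIES

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.counters[self.index(branch_addr)] >= TAKEN

    def update(self, branch_addr, taken):
        """Move the counter one step toward the resolved outcome, saturating."""
        i = self.index(branch_addr)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONGLY_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONGLY_NOT_TAKEN)


# Usage: a loop branch that is taken three times, falls through once,
# and is then taken again.
bht = BranchHistoryTable()
addr = 0x1000
history = []
for outcome in [True, True, True, False, True]:
    history.append(bht.predict(addr))
    bht.update(addr, outcome)
```

With two bits per entry, a single not-taken outcome in a run of taken branches does not flip the prediction, which is the usual motivation for a four-level scheme over a single history bit.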
When a branch is taken (or predicted as taken), the instructions from the untaken path must be flushed, and
the target instruction stream must be fetched into the instruction queue (IQ). The BTIC is a 64-entry cache that
contains the most recently used branch target instructions, typically in pairs. When an instruction fetch hits in
the BTIC, the instructions arrive in the instruction queue in the next clock cycle, a clock cycle sooner than they
would arrive from the instruction cache. Additional instructions arrive from the instruction cache in the following clock cycle.
The BTIC reduces the number of missed opportunities to dispatch instructions and gives the processor a
1-cycle head start on processing the target stream. With the use of the BTIC, the 750GX achieves a zero-cycle
delay for taken branches. Coherency of the BTIC is maintained by resetting the table on an instruction-cache
flash invalidate, on execution of an Instruction Cache Block Invalidate (icbi) or Return from Interrupt (rfi)
instruction, or when an exception is taken.
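The behavior of the BTIC described above (a 64-entry cache of target instruction pairs that is reset as a whole to stay coherent) can be sketched as follows. The class and method names are hypothetical, and the fully associative organization with least-recently-used replacement is an assumption made to keep the sketch short; this section does not state how the real table is organized.

```python
from collections import OrderedDict

BTIC_ENTRIES = 64


class BranchTargetInstructionCache:
    """Toy BTIC: maps a branch target address to a pair of target instructions.

    Fully associative LRU replacement is an assumption made for simplicity.
    """

    def __init__(self):
        self.entries = OrderedDict()

    def lookup(self, target_addr):
        """Return the cached instruction pair on a hit, or None on a miss."""
        pair = self.entries.get(target_addr)
        if pair is not None:
            self.entries.move_to_end(target_addr)  # mark most recently used
        return pair

    def fill(self, target_addr, first_insn, second_insn):
        """Record the first two instructions at a taken branch's target."""
        self.entries[target_addr] = (first_insn, second_insn)
        self.entries.move_to_end(target_addr)
        if len(self.entries) > BTIC_ENTRIES:
            self.entries.popitem(last=False)  # evict least recently used

    def reset(self):
        """Clear the whole table (icbi, rfi, exception, or flash invalidate)."""
        self.entries.clear()


# Usage: a hit supplies the target pair directly; after a reset the same
# lookup misses and the instructions must come from the instruction cache.
btic = BranchTargetInstructionCache()
btic.fill(0x2000, "lwz r3,0(r4)", "addi r3,r3,1")
hit = btic.lookup(0x2000)
btic.reset()
miss = btic.lookup(0x2000)
```

Resetting the entire table rather than invalidating single entries matches the coherency rule in the text: any event that might change the instructions at a target address discards every cached pair.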
The BPU contains an adder to compute branch target addresses and three user-control registers: the Link
Register (LR), the Count Register (CTR), and the Condition Register (CR). The BPU calculates the return pointer
for subroutine calls and saves it into the LR for certain types of branch instructions. The LR also contains the
branch target address for the Branch Conditional to Link Register (bclrx) instruction. The CTR contains the branch
target address for the Branch Conditional to Count Register (bcctrx) instruction. Because the LR and CTR are
special-purpose registers (SPRs), their contents can be copied to or from any GPR. Since the BPU uses dedicated
registers rather than GPRs or FPRs, execution of branch instructions is largely independent from
execution of fixed-point and floating-point instructions.
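As a rough illustration of how the LR and CTR serve subroutine calls and indirect branches, the sketch below models only the address side of three instruction forms. The class and method names are hypothetical; condition evaluation, CTR decrementing, and the CR are left out.

```python
MASK32 = 0xFFFFFFFF


class BranchUnitRegisters:
    """Toy model of the LR and CTR as used by branch instructions."""

    def __init__(self):
        self.lr = 0
        self.ctr = 0

    def branch_and_link(self, current_addr, target_addr):
        """Model bl: save the return address in LR, then branch."""
        self.lr = (current_addr + 4) & MASK32  # instruction after the branch
        return target_addr & MASK32

    def branch_to_lr(self):
        """Model an unconditional bclr (blr): the target comes from LR."""
        return self.lr & ~0x3 & MASK32  # low-order two bits are ignored

    def branch_to_ctr(self):
        """Model an unconditional bcctr (bctr): the target comes from CTR."""
        return self.ctr & ~0x3 & MASK32


# Usage: call a subroutine at 0x2000 from 0x1000, then return.
regs = BranchUnitRegisters()
next_pc = regs.branch_and_link(0x1000, 0x2000)
return_pc = regs.branch_to_lr()

# Usage: an indirect jump through CTR; the assignment stands in for
# copying a GPR value into the CTR.
regs.ctr = 0x3000
indirect_pc = regs.branch_to_ctr()
```

Because both target registers live in the branch unit, neither the call nor the return in this sketch needs a value from the GPR file at branch time.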
1.2.1.3 Completion Unit
The completion unit operates closely with the dispatch unit. Instructions are fetched and dispatched in
program order. At the point of dispatch, the program order is maintained by assigning each dispatched
instruction a successive entry in the 6-entry completion queue. The completion unit tracks instructions from
dispatch through execution and retires them in program order from the two bottom entries in the completion
queue (CQ0 and CQ1).
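The in-order retirement described above can be sketched with a small queue model. The names are hypothetical, and the sketch assumes that at most two instructions retire per call to retire(), corresponding to the two bottom entries CQ0 and CQ1; instructions may finish execution in any order but leave the queue only in program order.

```python
from collections import deque

CQ_ENTRIES = 6
MAX_RETIRE_PER_CYCLE = 2  # assumed from "two bottom entries" (CQ0 and CQ1)


class CompletionQueue:
    def __init__(self):
        self.queue = deque()  # leftmost element is CQ0 (oldest instruction)

    def has_vacancy(self):
        return len(self.queue) < CQ_ENTRIES

    def dispatch(self, name):
        """Allocate the next entry in program order; False if the queue is full."""
        if not self.has_vacancy():
            return False
        self.queue.append({"name": name, "finished": False})
        return True

    def finish(self, name):
        """Mark an instruction as having finished execution (any order)."""
        for entry in self.queue:
            if entry["name"] == name:
                entry["finished"] = True

    def retire(self):
        """Retire up to two finished instructions from the bottom, in order."""
        retired = []
        while (self.queue and self.queue[0]["finished"]
               and len(retired) < MAX_RETIRE_PER_CYCLE):
            retired.append(self.queue.popleft()["name"])
        return retired


# Usage: a younger instruction finishes first but cannot retire early.
cq = CompletionQueue()
for name in ["add", "lwz", "mullw"]:
    cq.dispatch(name)
cq.finish("lwz")
first = cq.retire()    # nothing retires: "add" at CQ0 has not finished
cq.finish("add")
cq.finish("mullw")
second = cq.retire()   # the two oldest entries retire together
third = cq.retire()    # the remaining entry retires next
```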
Instructions cannot be dispatched to an execution unit unless there is a vacancy in the completion queue and
rename buffers are available. Branch instructions that do not update the CTR or LR are removed from the
instruction stream and do not occupy a space in the completion queue. Branch instructions that update the CTR or
LR follow the same dispatch and completion procedures as nonbranch instructions, except that they are not
issued to an execution unit.
An instruction is retired when it is removed from the completion queue and its results are written to architected
registers (GPRs, FPRs, LR, and CTR) from the rename buffers. At the same time, the rename buffers that the dispatch
unit assigned to the instruction are returned to the available rename buffer pool, where the dispatch unit reuses
them as subsequent instructions are dispatched. In-order completion ensures program integrity and the correct
architectural state when the 750GX must recover from a mispredicted branch or any exception.
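The interaction between dispatch, the completion queue, and the rename buffer pool can be sketched as below. All names are hypothetical, and the rename pool size is a parameter of the model rather than a figure from this section. The sketch also simplifies by letting only the oldest instruction finish and retire, one at a time.

```python
from collections import deque

CQ_ENTRIES = 6


class Core:
    def __init__(self, rename_buffers=6):
        # rename_buffers is an assumed pool size, not stated in this section.
        self.completion_queue = deque()          # oldest entry first
        self.free_renames = list(range(rename_buffers))
        self.architected = {}                    # register name -> value

    def dispatch(self, dest_reg):
        """Dispatch one instruction that writes dest_reg; False means stall."""
        if len(self.completion_queue) >= CQ_ENTRIES or not self.free_renames:
            return False
        rename = self.free_renames.pop()
        self.completion_queue.append(
            {"dest": dest_reg, "rename": rename, "value": None})
        return True

    def finish_oldest(self, value):
        """Write a result into the oldest entry's rename buffer."""
        self.completion_queue[0]["value"] = value

    def retire_oldest(self):
        """Commit the oldest result and return its rename buffer to the pool."""
        entry = self.completion_queue.popleft()
        self.architected[entry["dest"]] = entry["value"]
        self.free_renames.append(entry["rename"])


# Usage with a deliberately small pool so that rename buffers, not the
# completion queue, are the limiting resource.
core = Core(rename_buffers=2)
ok_first = core.dispatch("r3")
ok_second = core.dispatch("r4")
stalled = core.dispatch("r5")     # no free rename buffer: dispatch stalls
core.finish_oldest(42)
core.retire_oldest()              # r3 is committed, its buffer is freed
ok_after_retire = core.dispatch("r5")
```

Until retire_oldest() runs, the value 42 exists only in a rename buffer, so discarding the queue contents after a mispredicted branch or an exception would leave the architected registers untouched.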