User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
Instruction Timing
6.2 Instruction Timing Overview
The 750GX design minimizes average instruction execution latency, which is the number of clock cycles
needed to fetch, decode, dispatch, and execute an instruction and make its results available to a
subsequent instruction. Some instructions, such as loads and stores, access memory and require additional
clock cycles between the execute phase and the write-back phase. These latencies vary depending on whether
the access is to cacheable or noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache
access generates a write-back to memory, whether the access causes a snoop hit from another device that
generates additional activity, and other conditions that affect memory accesses.
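As a rough illustration of how load latency depends on where an access hits, the following sketch computes an expected latency from L1 and L2 hit rates. The cycle counts and the function name `average_load_latency` are assumptions chosen for illustration only; they are not 750GX specifications, and the model ignores write-backs, snoop activity, and noncacheable accesses.

```python
# Hypothetical cycle counts, chosen only for illustration.
# They are NOT taken from the 750GX documentation.
L1_HIT_CYCLES = 2
L2_HIT_CYCLES = 10
MEMORY_CYCLES = 60


def average_load_latency(l1_hit_rate, l2_hit_rate):
    """Return the expected number of cycles per load.

    l1_hit_rate: fraction of loads that hit in the L1 cache.
    l2_hit_rate: fraction of L1 misses that hit in the L2 cache.
    """
    if not (0.0 <= l1_hit_rate <= 1.0 and 0.0 <= l2_hit_rate <= 1.0):
        raise ValueError("hit rates must be between 0 and 1")
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * L1_HIT_CYCLES
            + l1_miss * l2_hit_rate * L2_HIT_CYCLES
            + l1_miss * l2_miss * MEMORY_CYCLES)
```

Under these assumed numbers, even a small L1 miss rate contributes noticeably to the average, because each miss costs many times more than a hit.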
The 750GX implements many features to improve throughput, such as pipelining, superscalar instruction
issue, branch folding, two-level speculative branch handling, two types of branch prediction, and multiple
execution units that operate independently and in parallel.
As instructions pass from stage to stage in a pipelined system, multiple instructions are in various stages
of execution at any given time. Also, with multiple execution units operating in parallel, more than one
instruction can be completed in a single cycle.
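The overlap described above can be sketched with an idealized pipeline model. This is a simplification, not a description of the actual 750GX pipeline: it assumes one instruction enters per cycle, every stage takes exactly one cycle, and nothing stalls. The stage names and the function `pipeline_occupancy` are illustrative.

```python
# Illustrative stage names; the model treats each as a single-cycle stage.
STAGES = ["fetch", "dispatch", "execute", "complete"]


def pipeline_occupancy(num_instructions, stages=STAGES):
    """Return one dict per cycle, mapping stage name -> instruction index.

    Idealized model: one instruction enters per cycle, each stage takes
    one cycle, and there are no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    timeline = []
    for cycle in range(total_cycles):
        occupied = {}
        for position, stage in enumerate(stages):
            instr = cycle - position  # instruction currently in this stage
            if 0 <= instr < num_instructions:
                occupied[stage] = instr
        timeline.append(occupied)
    return timeline


if __name__ == "__main__":
    for cycle, occupied in enumerate(pipeline_occupancy(3)):
        print(cycle, occupied)
```

With three instructions, cycle 2 shows instruction 2 in fetch, instruction 1 in dispatch, and instruction 0 in execute at the same time.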
The 750GX contains the following execution units that operate independently and in parallel:
Branch processing unit (BPU)
Integer unit 1 (IU1)—executes all integer instructions
Integer unit 2 (IU2)—executes all integer instructions except multiplies and divides
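The division of work between the two integer units can be expressed as a small lookup. This is a sketch only: the mnemonic set below is a partial, illustrative list of PowerPC integer multiply and divide instructions, the function name `eligible_integer_units` is hypothetical, and the caller is assumed to pass an integer instruction.

```python
# Partial, illustrative list of integer multiply/divide mnemonics.
MULTIPLY_DIVIDE = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}


def eligible_integer_units(mnemonic):
    """Return the integer units that may execute the given integer instruction.

    Assumes the mnemonic names an integer instruction. Multiplies and
    divides are restricted to IU1; everything else may use either unit.
    """
    if mnemonic in MULTIPLY_DIVIDE:
        return ["IU1"]
    return ["IU1", "IU2"]
```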
Stage: The processing of instructions in the 750GX is done in stages: fetch, decode/dispatch, execute,
complete, and retirement. The fetch unit brings instructions from the memory system into the
instruction queue. Once an instruction is in the instruction queue, the dispatch unit partially
decodes it to determine its type. If it is an integer instruction, it is passed to an integer
execution unit. If it is a floating-point instruction, it is passed to the floating-point execution
unit. If it is a branch, it is processed immediately by the branch folding and branch prediction
functions. Instructions spend one or more cycles in each stage as they are processed by the
750GX processor.
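The routing step in this definition can be sketched as a partial decode followed by a lookup. The table below covers only a handful of PowerPC mnemonics and exists purely for illustration; the names `InstrType`, `PARTIAL_DECODE`, and `route_instruction` are hypothetical and do not correspond to any hardware structure.

```python
from enum import Enum


class InstrType(Enum):
    INTEGER = "integer"
    FLOATING_POINT = "floating-point"
    BRANCH = "branch"


# Toy partial-decode table: a few mnemonics only, for illustration.
PARTIAL_DECODE = {
    "add": InstrType.INTEGER,
    "subf": InstrType.INTEGER,
    "fadd": InstrType.FLOATING_POINT,
    "fmul": InstrType.FLOATING_POINT,
    "b": InstrType.BRANCH,
    "bc": InstrType.BRANCH,
}

# Where each instruction type goes, following the definition above.
DESTINATION = {
    InstrType.INTEGER: "integer execution unit",
    InstrType.FLOATING_POINT: "floating-point execution unit",
    InstrType.BRANCH: "branch folding and prediction",
}


def route_instruction(mnemonic):
    """Partially decode a mnemonic and return where the instruction is sent.

    Raises KeyError for mnemonics outside the toy table.
    """
    instr_type = PARTIAL_DECODE[mnemonic]
    return DESTINATION[instr_type]
```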
Stall: A condition in which an instruction cannot proceed to the next stage. An instruction can
spend multiple cycles in one stage. An integer multiply, for example, takes multiple cycles in
the execute stage. When this occurs, subsequent instructions might stall.
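The effect of a multicycle instruction on the instructions behind it can be sketched with a single in-order execute stage. This is a deliberately simplified model, not the 750GX's actual behavior: it assumes one execute stage shared by all instructions, one instruction arriving per cycle, and the latencies passed in are hypothetical.

```python
def execute_schedule(latencies):
    """Return (start, finish, stall_cycles) for each instruction.

    Model: a single in-order execute stage. Instruction i is ready to
    enter the stage at cycle i and waits while the previous instruction
    still occupies it. 'finish' is the first cycle the stage is free again.
    """
    schedule = []
    stage_free_at = 0
    for i, latency in enumerate(latencies):
        ready = i
        start = max(ready, stage_free_at)
        finish = start + latency
        schedule.append((start, finish, start - ready))
        stage_free_at = finish
    return schedule


if __name__ == "__main__":
    # Hypothetical latencies: the second instruction stands in for a
    # multicycle multiply; the others take one cycle each.
    for entry in execute_schedule([1, 4, 1, 1]):
        print(entry)
```

In this example the two instructions that follow the 4-cycle instruction each wait three cycles before they can enter the execute stage.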
Superscalar: A superscalar processor is one that has multiple execution units. The 750GX processor
has one floating-point unit, two integer units, one load/store unit, and a system unit for
miscellaneous instructions. PowerPC instructions are processed in parallel by these execution units.
Throughput: A measure of the total number of instructions that are processed by all execution units
per unit of time.
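Throughput in this sense is commonly expressed as instructions per cycle or instructions per second. A minimal sketch of both follows; the function names are illustrative, and any clock frequency passed in is the caller's assumption rather than a 750GX specification.

```python
def instructions_per_cycle(instructions_completed, cycles):
    """Average number of instructions completed per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_completed / cycles


def instructions_per_second(ipc, clock_hz):
    """Convert instructions per cycle to instructions per second."""
    return ipc * clock_hz
```

For example, 150 instructions completed in 100 cycles gives 1.5 instructions per cycle, which is only possible when more than one execution unit completes work in the same cycle.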
Write-back: Write-back, in the context of instruction handling, occurs when a result is written into
the architectural registers (typically the GPRs and FPRs). For most instructions, results are
written back from the rename registers at retirement time. The instruction is also removed from
the completion queue at this time.
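The relationship between rename registers, the completion queue, and the architectural registers can be sketched as follows. This is a toy model under stated assumptions: the class `RetirementModel` and its methods are hypothetical, each queue entry carries its own rename value, and instructions are assumed to retire in program order from the head of the queue.

```python
class RetirementModel:
    """Toy model of write-back at retirement (not the actual hardware)."""

    def __init__(self):
        self.gprs = {}              # architectural registers
        self.completion_queue = []  # entries in program order

    def dispatch(self, dest):
        """Allocate a queue entry whose result will target register 'dest'."""
        entry = {"dest": dest, "rename_value": None, "done": False}
        self.completion_queue.append(entry)
        return entry

    def finish(self, entry, value):
        """Record an execution result in the entry's rename register."""
        entry["rename_value"] = value
        entry["done"] = True

    def retire(self):
        """Retire finished instructions from the head of the queue.

        Each retired result is copied from its rename register into the
        architectural register, and the entry leaves the queue.
        Assumes in-order retirement. Returns the number retired.
        """
        retired = 0
        while self.completion_queue and self.completion_queue[0]["done"]:
            entry = self.completion_queue.pop(0)
            self.gprs[entry["dest"]] = entry["rename_value"]
            retired += 1
        return retired
```

In this model a result that finishes early stays in its rename register until every older instruction has also finished, so the architectural registers only ever change in program order.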