User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
Instruction Timing
The instruction timing for this example is described cycle-by-cycle as follows:
1. In cycle 0, instructions 0–3 are fetched from the instruction cache. Instructions 0 and 1 are placed in the
two entries in the instruction queue from which they can be dispatched on the next clock cycle.
2. In cycle 1, instructions 0 and 1 are dispatched to the IU2 and FPU, respectively. Notice that, for instructions
to be dispatched, they must be assigned positions in the completion queue. In this case, since the
completion queue was empty, instructions 0 and 1 take the two lowest entries in the completion queue.
Instructions 2 and 3 drop into the two dispatch positions in the instruction queue. Because there were two
positions available in the instruction queue in clock cycle 0, two instructions (4 and 5) are fetched into the
instruction queue. Instruction 4 is an unconditional branch instruction (b), which resolves immediately as
taken. Because the branch is taken, it can be folded from the instruction queue.
3. In cycle 2, assume a BTIC hit occurs and target instructions 6 and 7 are fetched into the instruction
queue, replacing the folded b instruction (4) and instruction 5. Instruction 0 completes, writes back its
results, and vacates the completion queue by the end of the clock cycle. Instruction 1 enters the second
FPU execute stage; instruction 2 is dispatched to the IU2; and instruction 3 is dispatched into the first
FPU execute stage. Because the taken branch instruction (4) does not update either CTR or LR, it does
not require a position in the completion queue and can be folded.
4. In cycle 3, target instructions (6 and 7) are fetched, replacing instructions 4 and 5 in IQ0 and IQ1. This
replacement on taken branches is called branch folding. Instruction 1 proceeds through the last of the
three FPU execute stages. Instruction 2 has executed, but must remain in the completion queue until
instruction 1 completes. Instruction 3 replaces instruction 1 in the second stage of the FPU, and instruction 6
replaces instruction 3 in the first stage.
Because there were four vacancies in the instruction queue in the previous clock cycle, instructions 8–11
are fetched in this clock cycle.
5. In cycle 4, instruction 1 completes, allowing instruction 2 to complete. Instructions 3 and 6 continue
through the FPU pipeline. Because there were two openings in the completion queue in the previous
cycle, instructions 7 and 8 are dispatched to the FPU and IU2, respectively, filling the completion queue.
Similarly, because there was one opening in the instruction queue in clock cycle 3, one instruction (12) is
fetched.
6. In cycle 5, instruction 3 completes, and instructions 13 and 14 are fetched. Instructions 6 and 7 continue
through the FPU pipeline. No instructions are dispatched in this clock cycle because there were no
vacant CQ entries in cycle 4.
7. In cycle 6, instruction 6 completes and instruction 7 is in the third FPU execute stage. Although
instruction 8 has executed, it must wait for instruction 7 to complete. The two integer instructions, 9 and
10, are dispatched to the IU2 and IU1, respectively. No instructions are fetched because the instruction
queue was full on the previous cycle.
8. In cycle 7, instruction 7 completes, allowing instruction 8 to complete as well. Instructions 9 and 10
remain in the completion stage, since at most two instructions can complete in a cycle. Because there
was one opening in the completion queue in cycle 6, instruction 11 is dispatched to the IU2. Two more
instructions, 15 and 16 (which are shown only in the instruction queue), are fetched.
9. In cycle 8, instructions 9–11 have finished executing. Instructions 9 and 10 complete, write back, and
vacate the completion queue. Instruction 11 must wait to complete in the following cycle. Because the
completion queue had one opening in the previous cycle, instruction 12 can be dispatched to the FPU.
Similarly, the instruction queue had one opening in the previous cycle, so one additional instruction, 17,
can be fetched.
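
The in-order completion rule that recurs in cycles 3, 7, and 8 above can be illustrated with a short Python
sketch. This is a simplified illustration only, not a model of the processor hardware: the function name
retire, the set named finished, and the queue contents are assumptions introduced for the example. The only
rules taken from the text are that instructions complete in program order and that at most two instructions
can complete in a cycle.

```python
from collections import deque

# From the text: at most two instructions can complete in a cycle.
MAX_RETIRE_PER_CYCLE = 2


def retire(completion_queue, finished, limit=MAX_RETIRE_PER_CYCLE):
    """Remove up to `limit` finished instructions from the head of the queue, in program order."""
    retired = []
    while completion_queue and len(retired) < limit:
        if completion_queue[0] not in finished:
            break  # the oldest instruction is still executing, so younger ones must wait
        retired.append(completion_queue.popleft())
    return retired


# Cycle 3: instruction 2 has executed, but instruction 1 is still in the FPU.
cq = deque([1, 2, 3, 6])
cycle3 = retire(cq, finished={2})

# Cycle 7: instructions 7-10 have all finished, but only two can complete.
cq = deque([7, 8, 9, 10])
cycle7 = retire(cq, finished={7, 8, 9, 10})

# Cycle 8: instruction 11 (dispatched in cycle 7) sits behind 9 and 10 and has also executed.
cq.append(11)
cycle8 = retire(cq, finished={9, 10, 11})

print(cycle3, cycle7, cycle8, list(cq))  # prints: [] [7, 8] [9, 10] [11]
```

The empty result for cycle 3 corresponds to instruction 2 waiting behind instruction 1, and the instruction
left in the queue after cycle 8 corresponds to instruction 11 waiting to complete in the following cycle.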