IBM SA14-2339-04 Personal Computer User Manual


 
C-4 PPC405 Core User’s Manual
Instruction timings for branch instructions follow:
A branch known not taken (BKNT) executes in one clock cycle. By definition a BKNT does not have
address or condition dependencies.
A branch known taken (BKT) by definition has no condition dependencies, but can have address
dependencies. A BKT without address dependencies can execute in one clock cycle if it is first
decoded from the PFB0 stage, or in two clock cycles if it is first decoded in the DCD stage. A BKT
having address dependencies can execute in two clock cycles if there is one instruction between
the branch and the address dependency, or in three clock cycles if there are no instructions
between the branch and address dependency.
A branch predicted not taken (BPNT), which must have condition dependencies, executes in one
clock cycle if the prediction is correct. If the prediction is incorrect, the branch can take two or three
cycles. If there is one instruction between the branch and the instruction causing the condition
dependency, the branch executes in two cycles. If there are no instructions between the branch
and the instruction causing the condition dependency, the branch executes in three clock cycles.
A branch predicted taken (BPT), which must have condition dependencies, executes in one clock
cycle if the prediction is correct and the branch is first decoded from the PFB0 stage, or in two clock
cycles if it is first decoded in the DCD stage. If the prediction is incorrect, the branch can take two or three cycles. If
there is one instruction between the branch and the instruction causing the condition dependency,
the branch executes in two cycles. If there are no instructions between the branch and the
instruction causing the condition dependency, the branch executes in three clock cycles.
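
The branch timings above can be summarized as a small lookup function. This is only a sketch of the rules as stated in the text; the function name, the parameter names, and the string labels are hypothetical and not part of the PPC405 documentation. The text covers only gaps of zero or one instruction between the branch and its dependency, so the sketch rejects anything else rather than guess.

```python
def branch_cycles(kind, decoded_in="PFB0", prediction_correct=True, gap=None):
    """Return the clock cycles for a branch, following the rules listed above.

    kind: "BKNT", "BKT", "BPNT", or "BPT".
    decoded_in: "PFB0" or "DCD", the stage where the branch is first decoded.
    prediction_correct: only meaningful for BPNT and BPT.
    gap: number of instructions between the branch and the instruction it
         depends on (address dependency for BKT, condition dependency for
         a mispredicted BPNT or BPT). None means no dependency applies.
    """
    def dependency_cycles(g):
        # One intervening instruction: two cycles. None: three cycles.
        if g == 1:
            return 2
        if g == 0:
            return 3
        raise ValueError("the text only covers gaps of 0 or 1 instructions")

    def taken_cycles():
        # A taken branch without a stalling dependency depends on decode stage.
        if decoded_in == "PFB0":
            return 1
        if decoded_in == "DCD":
            return 2
        raise ValueError("decoded_in must be 'PFB0' or 'DCD'")

    if kind == "BKNT":
        return 1
    if kind == "BKT":
        return taken_cycles() if gap is None else dependency_cycles(gap)
    if kind == "BPNT":
        return 1 if prediction_correct else dependency_cycles(gap)
    if kind == "BPT":
        return taken_cycles() if prediction_correct else dependency_cycles(gap)
    raise ValueError("unknown branch kind: %r" % (kind,))
```

For example, `branch_cycles("BPNT", prediction_correct=False, gap=0)` gives the three-cycle misprediction case, and `branch_cycles("BKT", decoded_in="DCD")` gives the two-cycle taken case.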
C.2.3 Multiplies
For multiply instructions having two word operands, hardware internal to the core automatically
detects smaller operand sizes (by examining sign bit extension) to reduce the number of cycles
necessary to complete the multiplication.
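
As an illustration of what examining sign bit extension means, the following sketch computes how many bits a signed word operand actually needs. It is a software model only: the text does not state which operand sizes the hardware distinguishes, so the halfword threshold in `fits_in_halfword` is an assumption, and both function names are hypothetical.

```python
def significant_bits(value, width=32):
    """Smallest two's-complement width (in bits) that can hold a signed value.

    All bits above this width are copies of the sign bit, i.e. pure sign
    extension that carries no additional information.
    """
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in a %d-bit signed word" % width)
    # For negative values, ~value has the same number of non-redundant bits.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1  # +1 for the sign bit


def fits_in_halfword(value):
    """Assumed threshold: the operand is a sign-extended 16-bit value."""
    return significant_bits(value) <= 16
```

Under this model, 32767 and -32768 both need 16 bits and count as sign-extended halfwords, while 32768 needs 17 bits and does not.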
The PPC405 also supports multiply accumulate (MAC) instructions and multiply instructions having
halfword operands.
Word and halfword multiply instructions are pipelined in the execution unit and use the same
multiplication hardware. Because these instructions are pipelined in the execution stage they have
latency and reissue rate cycle numbers. Under conditions to be described, a second multiply or MAC
instruction can begin execution before the first multiply or MAC instruction completes. When these
conditions are met, the reissue rate cycle numbers should be used; otherwise, the latency cycle
numbers should be used. (A MAC or multiply instruction can follow another MAC or multiply and still
meet the conditions that support the use of the reissue rate cycle numbers.)
Use reissue rate cycle numbers for multiply or MAC instructions that are followed by another multiply
or MAC instruction, and do not have an operand dependency from a previous multiply or MAC
instruction. However, one operand dependency is allowed for reissue rate cycle numbers. Internal
forwarding logic allows the accumulate value of a first MAC instruction to be used as the accumulate
value of a second MAC instruction without affecting the reissue rate.
Use latency cycle numbers for multiply or MAC instructions that are not followed by another multiply
or MAC, or that have an operand dependency from a previous multiply or MAC instruction. However,
accumulate-only dependencies between adjacent MAC instructions use reissue rate cycle numbers.
An operand dependency exists when a second multiply or MAC instruction depends on the result of a
first multiply or MAC instruction.
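
The selection rule above can be sketched as a small decision function. This reflects one reading of the text, not a definitive model of the hardware: the function name and the dependency labels ("none", "accumulate", "operand") are hypothetical, and "accumulate" is assumed to mean an accumulate-only dependency between adjacent MAC instructions.

```python
def cycle_number_kind(followed_by_multiply_or_mac, dependency="none"):
    """Choose which cycle numbers apply to a multiply or MAC instruction.

    followed_by_multiply_or_mac: True if the next instruction is a multiply
        or MAC instruction.
    dependency: relationship to a previous multiply or MAC instruction:
        "none"       - no dependency on its result
        "accumulate" - accumulate-only dependency between adjacent MACs
                       (handled by internal forwarding, per the text)
        "operand"    - any other dependency on the previous result
    """
    if dependency not in ("none", "accumulate", "operand"):
        raise ValueError("unknown dependency: %r" % (dependency,))
    if not followed_by_multiply_or_mac:
        return "latency"
    if dependency == "operand":
        return "latency"
    # No dependency, or an accumulate-only dependency that forwarding covers.
    return "reissue rate"
```

For example, a MAC that takes only its accumulate value from the preceding MAC and is itself followed by another multiply yields `"reissue rate"`, while the last multiply in a run yields `"latency"`.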