IBM SA14-2339-04 Personal Computer User Manual


 
Code Optimization and Instruction Timings C-5
Table C-2 summarizes the timings of the multiply and MAC instructions. In the table, the syntax “[o]” indicates that the instruction has both an “o” form, which updates XER[SO,OV], and a “non-o” form. Likewise, the syntax “[.]” indicates that the instruction has both a “record” form, which updates CR[CR0], and a “non-record” form.
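As a small illustration of this notation, the following C sketch expands a mnemonic pattern such as mullw[o][.] into the concrete forms it denotes (mullw, mullwo, mullw., and mullwo.). The function name expand_forms and the FORM_LEN limit are hypothetical, chosen only for this example.

```c
#include <string.h>

#define FORM_LEN 32  /* hypothetical maximum mnemonic length, including '\0' */

/* Expand a Table C-2 style pattern such as "mullw[o][.]" into every
 * concrete mnemonic it denotes. Each "[x]" group is optional: a form
 * either omits it or includes x (without the brackets).
 * Returns the number of forms written to out, or -1 on a malformed
 * pattern or insufficient space. */
int expand_forms(const char *pattern, char out[][FORM_LEN], int max)
{
    if (max < 1)
        return 0;
    int count = 1;
    out[0][0] = '\0';                            /* start with one empty form */
    for (const char *p = pattern; *p != '\0'; ) {
        if (*p == '[') {
            const char *close = strchr(p, ']');
            if (close == NULL)
                return -1;                       /* unmatched '[' */
            size_t len = (size_t)(close - p - 1);
            int existing = count;
            for (int i = 0; i < existing; i++) {
                if (count >= max || strlen(out[i]) + len >= FORM_LEN)
                    return -1;                   /* out of space */
                strcpy(out[count], out[i]);      /* form i keeps the group omitted */
                strncat(out[count], p + 1, len); /* new form includes the group */
                count++;
            }
            p = close + 1;
        } else {
            for (int i = 0; i < count; i++) {    /* literal character: append to all forms */
                size_t l = strlen(out[i]);
                if (l + 1 >= FORM_LEN)
                    return -1;
                out[i][l] = *p;
                out[i][l + 1] = '\0';
            }
            p++;
        }
    }
    return count;
}
```

For example, expanding "mulhw[.]" yields the two forms mulhw and mulhw., in that order.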
C.2.4 Scalar Load Instructions
Generally, the PPC405 executes in one cycle cachable load instructions that hit in the data cache array or line fill buffer, and noncachable load instructions that hit in the line fill buffer (when enabled). However, because load instructions are pipelined, even loads that hit in the cache or line fill buffer can appear to take extra cycles under the conditions described below.
If a load is followed by an instruction that uses the load target as an operand, a load-use dependency exists. When the load target is returned, it is forwarded to the operand register of the “using” instruction. This forwarding adds one cycle of latency when a load is immediately followed by a “using” instruction, so the load appears to execute in two cycles.
To improve cache-to-core timing or data-side on-chip memory (OCM)-to-core timing, the system designer can disable operand forwarding from the data cache unit (DCU) or OCM to the core. When operand forwarding is disabled, the load data needed by the “using” instruction is first placed in an intermediate latch and only then forwarded to the operand register of the “using” instruction. This introduces two additional cycles of latency when a load is immediately followed by a “using” instruction, so the load appears to execute in three cycles.
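The two forwarding cases above can be summarized in a small C model. It is a sketch, not a cycle-accurate simulator: it assumes the load hits in the data cache or line fill buffer, treats the load-use penalty as 1 cycle with forwarding enabled and 2 cycles with forwarding disabled, and assumes each independent instruction scheduled between the load and its use hides one cycle of that penalty (the manual states the one-intervening-instruction result only as two cycles; the forwarding-enabled behavior with intervening instructions is extrapolated here). The function name apparent_load_cycles is hypothetical.

```c
#include <stdbool.h>

/* Simplified model: number of cycles a load that hits in the cache or
 * line fill buffer appears to take, given whether operand forwarding is
 * enabled and how many independent instructions separate the load from
 * its first "using" instruction. Ignores all other pipeline effects. */
int apparent_load_cycles(bool forwarding_enabled, int intervening)
{
    int penalty = forwarding_enabled ? 1 : 2;  /* extra cycles with no separation */
    int hidden = intervening < penalty ? intervening : penalty;
    if (hidden < 0)
        hidden = 0;                            /* treat negative counts as zero */
    return 1 + penalty - hidden;               /* base cycle plus exposed penalty */
}
```

With forwarding disabled, the model gives 3 cycles with no intervening instructions, 2 with one, and 1 with two.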
Because the PPC405 can continue executing instructions that follow a load miss as long as they have no load-use dependency on the load, the load and its “using” instruction should be separated by two “non-using” instructions when possible. If only one instruction can be placed between the load and the “using” instruction, the load still appears to execute in two cycles when operand forwarding is disabled, because one of the two extra cycles remains exposed.
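At the source level, the same idea can be sketched in C by loading the next array element before using the current one. This is an illustration under stated assumptions: the function names are hypothetical, and an optimizing compiler schedules machine instructions itself, so the C statement order does not guarantee the final instruction order.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: each element is loaded and then used right
 * away, so at the machine level the add may directly follow its load. */
uint32_t sum_naive(const uint32_t *a, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-scheduled version: the next element is loaded before the
 * previously loaded value is used, so independent work (the add of the
 * previous value and the loop overhead) sits between each load and its use. */
uint32_t sum_scheduled(const uint32_t *a, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t sum = 0;
    uint32_t cur = a[0];              /* load element 0 early */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = a[i];         /* load of the next element... */
        sum += cur;                   /* ...is not used until the next iteration */
        cur = next;
    }
    return sum + cur;                 /* use the last loaded element */
}
```

Both functions return the same sum; only the intended ordering of loads and uses differs.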
Table C-2. Multiply and MAC Instruction Timing

| Operation | Instructions | Reissue Rate (cycles) | Latency (cycles) |
|---|---|---|---|
| MAC | MAC and negative MAC instructions | 1 | 2 |
| Halfword × Halfword | mullhw[.], mullhwu[.], mulhhw[.], mulhhwu[.], mulchw[.], mulchwu[.] | 1 | 2 |
| Halfword × Halfword | mulli[.], mullw[o][.], mulhw[.], mulhwu[.] | 2 | 3 |
| Halfword × Word | mulli[.], mullw[o][.], mulhw[.], mulhwu[.] | 2 | 3 |
| Word × Word | mullw[o][.], mulhw[.], mulhwu[.] | 4 | 5 |
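To see how the reissue rate and latency in Table C-2 interact, the following C sketch estimates the cycles spent by a run of multiplies. It is a simplified model that assumes the multiplier is the only constraint: independent multiplies issue once per reissue interval, while in a chain where each multiply uses the previous result, each one must wait the full latency of its predecessor. The struct, constant, and function names are hypothetical; the numbers come from the table.

```c
/* Reissue rate and latency, in cycles, for one row of Table C-2. */
struct mul_timing {
    int reissue;  /* cycles before the next multiply/MAC can issue */
    int latency;  /* cycles until the result is available */
};

/* Rows of Table C-2 (names are illustrative). */
static const struct mul_timing MAC_TIMING  = {1, 2};
static const struct mul_timing HW_X_HW     = {1, 2};  /* mullhw, mulhhw, mulchw, ... */
static const struct mul_timing HW_X_WORD   = {2, 3};
static const struct mul_timing WORD_X_WORD = {4, 5};

/* n independent multiplies issued back to back: one issues every
 * `reissue` cycles, and the last result is ready `latency` cycles
 * after the last issue. */
int independent_cycles(struct mul_timing t, int n)
{
    return n <= 0 ? 0 : (n - 1) * t.reissue + t.latency;
}

/* n multiplies where each uses the previous result: each one waits
 * for the full latency of its predecessor. */
int dependent_cycles(struct mul_timing t, int n)
{
    return n <= 0 ? 0 : n * t.latency;
}
```

Under this model, four independent word × word multiplies take (4 - 1) × 4 + 5 = 17 cycles, while four dependent ones take 4 × 5 = 20 cycles.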