B-40 March, 2003 Developer’s Manual
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture
Optimization Guide
B.5.3 Scheduling Multiply Instructions
Multiply instructions can cause pipeline stalls due to either resource conflicts or result latencies.
The following code segment would incur a stall of 0-3 cycles depending on the values in registers
r1, r2, r4 and r5 due to resource conflicts.
mul r0, r1, r2
mul r3, r4, r5
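One way to reduce this kind of stall is to place independent, non-multiply instructions between the two multiplies, so that useful work issues while the first multiply still occupies the multiplier. The following is a minimal sketch, not a guaranteed fix: it assumes r6, r7 and r8 (registers chosen here for illustration only) hold unrelated work that the surrounding code needs anyway, and whether the stall disappears entirely still depends on the operand values.

```asm
mul r0, r1, r2
add r6, r6, #1      ; illustrative filler: does not use the multiplier or r0
sub r7, r7, #1      ; illustrative filler: independent of both multiplies
add r8, r8, #4      ; illustrative filler: independent of both multiplies
mul r3, r4, r5      ; issues after the fillers instead of directly behind the first mul
```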
The following code segment would incur a stall of 1-3 cycles depending on the values in registers
r1 and r2 due to result latency.
mul r0, r1, r2
mov r4, r0
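The same idea applies to result latency: move instructions that do not read r0 between the multiply and the first instruction that consumes its result. The following is a sketch under the assumption that r6, r7 and r8 (illustrative registers, not taken from the original example) carry independent work. How many filler instructions are needed depends on the multiply latency for the actual operand values.

```asm
mul r0, r1, r2
add r6, r6, #1      ; illustrative filler: does not read r0
sub r7, r7, #1      ; illustrative filler: does not read r0
add r8, r8, #4      ; illustrative filler: does not read r0
mov r4, r0          ; first consumer of the product, now up to 3 cycles later
```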
Note that a multiply instruction that sets the condition codes blocks the whole pipeline. A 4-cycle
multiply operation that sets the condition codes behaves the same as a 4-cycle issue operation.
Consider the following code segment:
muls r0, r1, r2
add r3, r3, #1
sub r4, r4, #1
sub r5, r5, #1
The add operation above would stall for 3 cycles if the multiply takes 4 cycles to complete. It is
better to use a multiply that does not set the condition codes and to set them afterward with a
separate compare, as in the following sequence:
mul r0, r1, r2
add r3, r3, #1
sub r4, r4, #1
sub r5, r5, #1
cmp r0, #0
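As an illustration of how the flags from the separate compare might be consumed, the sketch below appends a conditionally executed instruction to the sequence. The addeq instruction and its use of r6 are hypothetical and not part of the original example. Note one assumption behind the substitution: cmp writes the C and V flags in addition to N and Z, whereas muls updates only N and Z, so the sequence is interchangeable with the muls version only when later code tests N or Z alone.

```asm
mul r0, r1, r2
add r3, r3, #1
sub r4, r4, #1
sub r5, r5, #1
cmp r0, #0          ; sets N and Z from the product (also writes C and V)
addeq r6, r6, #1    ; hypothetical consumer: executes only if the product was zero
```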
Refer to Section 14.4, “Instruction Latencies” for the latencies of the various multiply
instructions, and schedule multiply instructions with these latencies in mind.