B-10 March, 2003 Developer’s Manual
Intel
®
80200 Processor based on Intel
®
XScale
™
Microarchitecture
Optimization Guide
B.3.1.2. Optimizing Branches
Branches decrease application performance by indirectly causing pipeline stalls. Branch prediction
improves the performance by lessening the delay inherent in fetching a new instruction stream. The
number of branches that can accurately be predicted is limited by the size of the branch target
buffer. Since the total number of branches executed in a program is relatively large compared to the
size of the branch target buffer; it is often beneficial to minimize the number of branches in a
program. Consider the following C code segment.
int foo(int a)
{
if (a > 10)
return 0;
else
return 1;
}
The code generated for the if-else portion of this code segment using branches is:
cmp r0, #10
ble L1
mov r0, #0
b L2
L1:
mov r0, #1
L2:
The code generated above takes three cycles to execute the else part and four cycles for the if-part
assuming best case conditions and no branch misprediction penalties. In the case of Intel
®
80200
processors, a branch misprediction incurs a penalty of four cycles. If the branch is mispredicted
50% of the time, and if we assume that both the if-part and the else-part are equally likely to be
taken, on an average the code above takes 5.5 cycles to execute.
.
If we were to use Intel
®
80200 processors to execute instructions conditionally, the code generated
for the above if-else statement is:
cmp r0, #10
movgt r0, #0
movle r0, #1
The above code segment would not incur any branch misprediction penalties and would take three
cycles to execute assuming best case conditions. As can be seen, using conditional instructions
speeds up execution significantly. However, the use of conditional instructions should be carefully
considered to ensure that it does improve performance. To decide when to use conditional
instructions over branches consider the following hypothetical code segment:
if (cond)
if_stmt
else
else_stmt
Assume that we have the following data:
N1
B
Number of cycles to execute the if_stmt assuming the use of branch instructions
N2
B
Number of cycles to execute the else_stmt assuming the use of branch instructions
P1 Percentage of times the if_stmt is likely to be executed
50
100
---------
4
34+
2
------------+×
5.5= cycles