Intel IA-32 Computer Accessories User Manual


 
General Optimization Guidelines 2
Branch Prediction
Branch optimizations have a significant impact on performance. By
understanding the flow of branches and improving the predictability of
branches, you can increase the speed of code significantly.
Optimizations that help branch prediction are:
• Keep code and data on separate pages (a very important item, see
  more details in the “Memory Accesses” section).
• Whenever possible, eliminate branches.
• Arrange code to be consistent with the static branch prediction
  algorithm.
• Use the pause instruction in spin-wait loops.
• Inline functions and pair up calls and returns.
• Unroll as necessary so that repeatedly-executed loops have sixteen
  or fewer iterations, unless this causes an excessive code size
  increase.
• Separate branches so that they occur no more frequently than every
  three μops where possible.
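As a rough illustration of the spin-wait guideline, the following C sketch polls a flag and issues the pause instruction between polls. The helper name spin_wait, the cpu_relax macro, and the max_spins bound are hypothetical and chosen only for this example. The sketch assumes a C11 compiler such as GCC or Clang, where the _mm_pause intrinsic from <immintrin.h> emits pause on IA-32 and Intel 64 targets; on other targets the macro expands to nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the pause instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no-op on other targets */
#endif

/* Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * polls have been made. Returns true if the flag was observed set. */
static bool spin_wait(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        cpu_relax();  /* hint to the processor that this is a spin-wait loop */
    }
    return atomic_load_explicit(flag, memory_order_acquire) != 0;
}
```

The bound on the number of polls is a design choice for the sketch only; it keeps the function from spinning forever when no other thread ever sets the flag.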
Eliminating Branches
Eliminating branches improves performance because it:
• reduces the possibility of mispredictions
• reduces the number of required branch target buffer (BTB) entries;
  conditional branches that are never taken do not consume BTB
  resources
There are four principal ways of eliminating branches:
• arrange code to make basic blocks contiguous
• unroll loops, as discussed in the “Loop Unrolling” section
• use the cmov instruction
• use the setcc instruction
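The last two techniques can be sketched in C as follows. The function names are hypothetical. The first form uses a plain conditional expression, which a compiler may (but is not required to) translate into cmov. The second form mimics a typical setcc sequence by turning the comparison result into an all-ones or all-zeros mask and selecting with logical operations, so the source contains no conditional branch. Whether either form is actually compiled without a branch depends on the compiler and its options.

```c
#include <stdint.h>

/* Conditional expression: a candidate for cmov, at the compiler's
 * discretion. Returns x if a < b, otherwise y. */
static uint32_t select_cmov_style(int32_t a, int32_t b,
                                  uint32_t x, uint32_t y)
{
    return (a < b) ? x : y;
}

/* Mask-based selection in the spirit of a setcc sequence.
 * (a < b) evaluates to 1 or 0; subtracting it from 0 in unsigned
 * arithmetic gives 0xFFFFFFFF or 0x00000000. */
static uint32_t select_setcc_style(int32_t a, int32_t b,
                                   uint32_t x, uint32_t y)
{
    uint32_t mask = 0u - (uint32_t)(a < b);
    /* If mask is all ones the result is y ^ (x ^ y) == x,
     * otherwise it is y ^ 0 == y. */
    return y ^ ((x ^ y) & mask);
}
```

Both functions return the same value for every input; the second only makes the branch-free data flow explicit in the source.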