General Optimization Guidelines 2
Branch Prediction
Branch optimizations have a significant impact on performance. By
understanding the flow of branches and improving their predictability,
you can increase the speed of code significantly.
Optimizations that help branch prediction are:
• Keep code and data on separate pages (very important; see the
“Memory Accesses” section for details).
• Whenever possible, eliminate branches.
• Arrange code to be consistent with the static branch prediction
algorithm.
• Use the pause instruction in spin-wait loops.
• Inline functions and pair up calls and returns.
• Unroll as necessary so that repeatedly-executed loops have sixteen
or fewer iterations, unless this causes an excessive code size
increase.
• Separate branches so that they occur no more frequently than every
three μops where possible.
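As an illustration of the spin-wait item above, the following is a minimal C sketch of a bounded spin-wait loop that issues pause on each iteration. The names spin_wait and cpu_relax are hypothetical and chosen for this example only. The sketch assumes a C11 compiler and reaches the pause instruction through the _mm_pause intrinsic on x86 targets; on other targets it falls back to a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* intrinsic for the pause instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no-op fallback off x86 */
#endif

/* Hypothetical helper: spin until *flag becomes nonzero or until
   max_spins polls have been made. Returns true if the flag was seen set. */
bool spin_wait(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0) {
            return true;
        }
        cpu_relax();  /* hint to the processor that this is a spin-wait loop */
    }
    return false;
}
```

The bound on the number of polls is a choice made for this sketch so that the loop always terminates; a production lock would typically back off or yield to the scheduler instead of giving up.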
Eliminating Branches
Eliminating branches improves performance because it:
• reduces the possibility of mispredictions
• reduces the number of required branch target buffer (BTB) entries;
conditional branches that are never taken do not consume BTB
resources
There are four principal ways of eliminating branches:
• arrange code to make basic blocks contiguous
• unroll loops, as discussed in the “Loop Unrolling” section
• use the cmov instruction
• use the setcc instruction
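The last two techniques can be sketched at the C level. The functions below, count_above and select_max, are hypothetical examples and not taken from this manual. They are written so that the data-dependent decision is expressed as a value rather than a jump, which compilers commonly lower to setcc and cmov respectively. Whether those instructions are actually emitted depends on the compiler, the target, and the optimization flags, so inspect the generated assembly to confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the elements greater than threshold. The comparison result (0 or 1)
   is added directly, so no conditional jump on the data is needed; this
   pattern is a typical candidate for setcc. The loop branch itself remains. */
size_t count_above(const int32_t *values, size_t n, int32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        count += (size_t)(values[i] > threshold);
    }
    return count;
}

/* Return the larger of a and b. A simple select between two values already
   in registers is a typical candidate for cmov. */
int32_t select_max(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}
```

Both forms pay off mainly when the condition is hard to predict; for a branch that is almost always resolved the same way, a well-predicted jump can be just as cheap.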