Chapter 2 C and C++ Source-Level Optimizations 33
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Using a switch statement, transform the loop in Listing 9 into:
#define combine(c1, c2) (((c1) << 1) + (c2))
switch (combine(CONSTANT0 != 0, CONSTANT1 != 0)) {
    case combine(0, 0):
        for (i...) {
            DoWork0(i);
            DoWork2(i);
        }
        break;
    case combine(1, 0):
        for (i...) {
            DoWork1(i);
            DoWork2(i);
        }
        break;
    case combine(0, 1):
        for (i...) {
            DoWork0(i);
            DoWork3(i);
        }
        break;
    case combine(1, 1):
        for (i...) {
            DoWork1(i);
            DoWork3(i);
        }
        break;
    default:
        break;
}
Some introductory code is needed to combine the constants into a single switch value and to enumerate every combination as a case label, and the total amount of code has doubled. However, the inner loops are now free of if statements. In ideal cases, where the DoWorkn functions are inlined, successive functions can overlap more in execution, allowing greater parallelism than is possible when if statements intervene.
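The following is a minimal, compilable C99 sketch of the transformation above. The work functions DoWork0 through DoWork3 are hypothetical stand-ins that add distinct weights to an array element, and the two constants are passed in as the parameters c0 and c1. The if-based loop is written to match the case labels above so that the two versions can be compared; it is not a copy of Listing 9.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work functions: each adds a distinct weight to acc[i]. */
static void DoWork0(long *acc, size_t i) { acc[i] += 1; }
static void DoWork1(long *acc, size_t i) { acc[i] += 10; }
static void DoWork2(long *acc, size_t i) { acc[i] += 100; }
static void DoWork3(long *acc, size_t i) { acc[i] += 1000; }

#define combine(c1, c2) (((c1) << 1) + (c2))

/* Reference form: the loop-invariant tests run on every iteration.
   The branch mapping mirrors the case labels of the switch version. */
void work_with_ifs(long *acc, size_t n, int c0, int c1) {
    for (size_t i = 0; i < n; i++) {
        if (c0) DoWork1(acc, i); else DoWork0(acc, i);
        if (c1) DoWork3(acc, i); else DoWork2(acc, i);
    }
}

/* Unswitched form: the tests are resolved once, outside the loops. */
void work_unswitched(long *acc, size_t n, int c0, int c1) {
    switch (combine(c0 != 0, c1 != 0)) {
    case combine(0, 0):
        for (size_t i = 0; i < n; i++) { DoWork0(acc, i); DoWork2(acc, i); }
        break;
    case combine(1, 0):
        for (size_t i = 0; i < n; i++) { DoWork1(acc, i); DoWork2(acc, i); }
        break;
    case combine(0, 1):
        for (size_t i = 0; i < n; i++) { DoWork0(acc, i); DoWork3(acc, i); }
        break;
    case combine(1, 1):
        for (size_t i = 0; i < n; i++) { DoWork1(acc, i); DoWork3(acc, i); }
        break;
    default:
        /* combine() of two 0/1 values is always 0..3. */
        assert(0 && "unreachable");
        break;
    }
}
```

Calling work_with_ifs and work_unswitched with the same arguments should leave identical arrays; only the unswitched version keeps the tests out of the loop bodies. Whether a given compiler performs this unswitching on its own depends on the compiler and its optimization settings.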
The same idea can be applied to constant switch statements, or to combinations of switch statements and if statements inside for loops. The method used to combine the input constants becomes more complicated, but the transformation still benefits performance.
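As a hedged illustration of combining a switch statement with an if statement, the sketch below assumes a hypothetical three-way mode (0, 1, or anything else) and a boolean clamp flag. The hypothetical macro combine_mode gives each mode two slots, one per flag value, so the six combinations map to distinct case labels. Mode values other than 0 and 1 are first normalized to 2 so that the unswitched code matches the default clause of the generic loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical combiner: each mode (0..2) gets two slots, one per flag value. */
#define combine_mode(mode, flag) ((mode) * 2 + (flag))

/* Generic loop: the loop-invariant switch and if run on every iteration. */
void transform_generic(int *out, const int *in, size_t n, int mode, int clamp) {
    for (size_t i = 0; i < n; i++) {
        int x;
        switch (mode) {
        case 0:  x = in[i];     break;
        case 1:  x = 2 * in[i]; break;
        default: x = -in[i];    break;
        }
        if (clamp && x < 0)
            x = 0;
        out[i] = x;
    }
}

/* Unswitched: one loop per (mode, clamp) combination, no tests inside. */
void transform_unswitched(int *out, const int *in, size_t n, int mode, int clamp) {
    /* Every mode other than 0 or 1 behaves like the default clause above. */
    int m = (mode == 0 || mode == 1) ? mode : 2;

    switch (combine_mode(m, clamp != 0)) {
    case combine_mode(0, 0):
        for (size_t i = 0; i < n; i++) out[i] = in[i];
        break;
    case combine_mode(0, 1):
        for (size_t i = 0; i < n; i++) { int x = in[i]; out[i] = x < 0 ? 0 : x; }
        break;
    case combine_mode(1, 0):
        for (size_t i = 0; i < n; i++) out[i] = 2 * in[i];
        break;
    case combine_mode(1, 1):
        for (size_t i = 0; i < n; i++) { int x = 2 * in[i]; out[i] = x < 0 ? 0 : x; }
        break;
    case combine_mode(2, 0):
        for (size_t i = 0; i < n; i++) out[i] = -in[i];
        break;
    case combine_mode(2, 1):
        for (size_t i = 0; i < n; i++) { int x = -in[i]; out[i] = x < 0 ? 0 : x; }
        break;
    default:
        /* m is 0..2 and the flag is 0 or 1, so every value is covered above. */
        assert(0 && "unreachable");
        break;
    }
}
```

With three modes and one flag the loop count grows to six, which previews the growth problem described next.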
However, the number of inner loops can also increase substantially. If the number of inner loops is prohibitively high, handle only the most common cases directly and let the remaining cases fall back to the original code in the default clause of the switch statement. This situation is typical of run-time generated code. While the performance of run-time generated code can be improved by means similar to those presented here, such code is much harder to maintain, and developers must do their own code-generation optimizations without the help of a compiler.
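The fallback strategy can be sketched as follows, assuming four hypothetical option bits (16 combinations) of which only a few are common. Which combinations get a dedicated loop is an assumption made for illustration. The original loop, apply_generic, is kept unchanged and is reached through the default clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits: four flags give 16 possible combinations. */
enum { OPT_NEGATE = 1, OPT_DOUBLE = 2, OPT_CLAMP = 4, OPT_OFFSET = 8 };

/* Original loop: every option is tested on every iteration.
   It is kept unchanged as the fallback for uncommon combinations. */
static void apply_generic(int *a, size_t n, unsigned opts) {
    for (size_t i = 0; i < n; i++) {
        int x = a[i];
        if (opts & OPT_NEGATE) x = -x;
        if (opts & OPT_DOUBLE) x *= 2;
        if ((opts & OPT_CLAMP) && x < 0) x = 0;
        if (opts & OPT_OFFSET) x += 1;
        a[i] = x;
    }
}

/* Only the combinations assumed to be common get a dedicated loop. */
void apply(int *a, size_t n, unsigned opts) {
    assert(a != NULL || n == 0);
    switch (opts) {
    case 0u:
        /* No options set: the generic loop would leave the array unchanged. */
        break;
    case OPT_DOUBLE:
        for (size_t i = 0; i < n; i++) a[i] *= 2;
        break;
    case OPT_DOUBLE | OPT_CLAMP:
        for (size_t i = 0; i < n; i++) { int x = 2 * a[i]; a[i] = x < 0 ? 0 : x; }
        break;
    default:
        /* All other values, including the remaining 13 four-bit
           combinations, fall back to the original code. */
        apply_generic(a, n, opts);
        break;
    }
}
```

Adding a specialized case later only requires a new case label; the default clause keeps every other combination correct in the meantime.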