20 C and C++ Source-Level Optimizations Chapter 2
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
2.8 Unnecessary Store-to-Load Dependencies
A store-to-load dependency exists when data is stored to memory, only to be read back shortly
thereafter. For details, see “Store-to-Load Forwarding Restrictions” on page 100. The
AMD Athlon™ 64 and AMD Opteron™ processors contain hardware to accelerate such store-to-load
dependencies, allowing the load to obtain the store data before it has been written to memory.
However, it is still faster to avoid such dependencies altogether and keep the data in an internal
register.
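As a minimal sketch of what such a dependency can look like at the source level, the two functions below sum an array. The names sum_through_pointer and sum_in_local are hypothetical and do not come from this guide. In the first function, unless the compiler can prove that total does not point into v, each iteration stores *total and the next iteration loads it back. The second function keeps the running value in a local variable and stores it once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: accumulates through the pointer. Because total and v
   both refer to double objects, the compiler generally has to assume they may
   overlap, so *total is stored and reloaded on every iteration. */
void sum_through_pointer(double *total, const double *v, size_t n)
{
    size_t i;

    assert(total != NULL);
    *total = 0.0;
    for (i = 0; i < n; i++) {
        *total += v[i];
    }
}

/* Hypothetical example: accumulates in a local that can be kept in a
   register, then stores the result once after the loop. */
void sum_in_local(double *total, const double *v, size_t n)
{
    size_t i;
    double t = 0.0;

    assert(total != NULL);
    for (i = 0; i < n; i++) {
        t += v[i];
    }
    *total = t;
}
```

The two functions return the same result only when total does not point into v, which is the kind of condition a compiler often cannot verify on its own.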
Avoiding store-to-load dependencies is especially important if they are part of a long dependency
chain, as may occur in a recurrence computation. If the dependency occurs while operating on arrays,
many compilers are unable to optimize the code in a way that avoids the store-to-load dependency. In
some instances the language definition may prohibit the compiler from using code transformations
that would remove the store-to-load dependency. It is therefore recommended that the programmer
remove the dependency manually, for example, by introducing a temporary variable that can be kept
in a register, as in the following listings. Because the temporary carries the recurrence value from
one iteration to the next, the load of the previously stored element is eliminated, which can result
in a significant performance increase.
Listing 3. Avoid
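double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;

/* Each iteration loads x[k-1], which the previous iteration just stored. */
for (k = 1; k < VECLEN; k++) {
    x[k] = x[k-1] + y[k];
}

/* The second recurrence has the same dependency through memory. */
for (k = 1; k < VECLEN; k++) {
    x[k] = z[k] * (y[k] - x[k-1]);
}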
Listing 4. Preferred
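double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;
double t;

/* t carries the previous element and can be kept in a register,
   so x[k-1] is never reloaded. */
t = x[0];
for (k = 1; k < VECLEN; k++) {
    t = t + y[k];
    x[k] = t;
}

/* Reseed t from x[0] before the second recurrence. */
t = x[0];
for (k = 1; k < VECLEN; k++) {
    t = z[k] * (y[k] - t);
    x[k] = t;
}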
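The fragments in Listings 3 and 4 are not complete programs. As a rough sketch of how the first recurrence might be packaged for testing, the functions below wrap both forms so that their results can be compared. The names running_sum_reload and running_sum_temp, and the use of pointer parameters with an explicit length n in place of VECLEN, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Listing 3 form (hypothetical wrapper): each iteration reads x[k-1],
   which the previous iteration has just written. */
void running_sum_reload(double *x, const double *y, size_t n)
{
    size_t k;

    assert(n >= 1);             /* x[0] seeds the recurrence */
    for (k = 1; k < n; k++) {
        x[k] = x[k - 1] + y[k];
    }
}

/* Listing 4 form (hypothetical wrapper): the running value is carried
   in the local t, so the loop only stores to x. */
void running_sum_temp(double *x, const double *y, size_t n)
{
    size_t k;
    double t;

    assert(n >= 1);             /* x[0] seeds the recurrence */
    t = x[0];
    for (k = 1; k < n; k++) {
        t = t + y[k];
        x[k] = t;
    }
}
```

Both functions perform the same additions in the same order, so they should fill x with the same values. Whether a particular compiler already generates equivalent code for the first form varies, so measuring both versions on the target system is advisable.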