20 C and C++ Source-Level Optimizations Chapter 2

25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

2.8 Unnecessary Store-to-Load Dependencies

A store-to-load dependency exists when data is stored to memory, only to be read back shortly thereafter. For details, see “Store-to-Load Forwarding Restrictions” on page 100. The AMD Athlon™ 64 and AMD Opteron™ processors contain hardware to accelerate such store-to-load dependencies, allowing the load to obtain the store data before it has been written to memory. However, it is still faster to avoid such dependencies altogether and keep the data in an internal register.

Avoiding store-to-load dependencies is especially important if they are part of a long dependency chain, as may occur in a recurrence computation, where each result feeds the next iteration. If the dependency occurs while operating on arrays, many compilers are unable to optimize the code in a way that avoids the store-to-load dependency. In some instances, the language definition may prohibit the compiler from using code transformations that would remove the store-to-load dependency. Therefore, it is recommended that the programmer remove the dependency manually, for example, by introducing a temporary variable that can be kept in a register, as in the following example. This can result in a significant performance increase.

Listing 3. Avoid

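double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;

/* Each iteration reloads x[k-1], which the previous iteration just stored. */
for (k = 1; k < VECLEN; k++) {
    x[k] = x[k-1] + y[k];
}

/* Same pattern: x[k-1] is stored, then read back one iteration later. */
for (k = 1; k < VECLEN; k++) {
    x[k] = z[k] * (y[k] - x[k-1]);
}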

Listing 4. Preferred

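double x[VECLEN], y[VECLEN], z[VECLEN];
unsigned int k;
double t;

/* t carries the running value, so x[k-1] is never read back from memory. */
t = x[0];
for (k = 1; k < VECLEN; k++) {
    t = t + y[k];
    x[k] = t;
}

/* x[0] is not modified by the first loop, so t restarts from the same value. */
t = x[0];
for (k = 1; k < VECLEN; k++) {
    t = z[k] * (y[k] - t);
    x[k] = t;
}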
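The two listings are intended to compute the same values; only the way the loop-carried value travels between iterations differs. The following is a minimal sketch, not part of the original guide, that wraps both forms in functions and compares their results on a small input. The function names (sum_reload, sum_temp, scaled_reload, scaled_temp, forms_agree), the VECLEN value of 16, and the test data are all assumptions chosen for illustration. The inputs are picked so that every intermediate result is exactly representable as a double, which makes a bitwise comparison meaningful.

```c
#include <stddef.h>
#include <string.h>

#define VECLEN 16  /* assumed size, for illustration only */

/* Listing 3 form: x[k-1] is reloaded right after it was stored. */
void sum_reload(double *x, const double *y, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = x[k - 1] + y[k];
    }
}

/* Listing 4 form: the running value is carried in the local t. */
void sum_temp(double *x, const double *y, size_t n)
{
    if (n == 0) {
        return;
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = t + y[k];
        x[k] = t;
    }
}

/* Listing 3 form of the second recurrence. */
void scaled_reload(double *x, const double *y, const double *z, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        x[k] = z[k] * (y[k] - x[k - 1]);
    }
}

/* Listing 4 form of the second recurrence. */
void scaled_temp(double *x, const double *y, const double *z, size_t n)
{
    if (n == 0) {
        return;
    }
    double t = x[0];
    for (size_t k = 1; k < n; k++) {
        t = z[k] * (y[k] - t);
        x[k] = t;
    }
}

/* Returns 1 if both forms yield bit-identical arrays for the test input. */
int forms_agree(void)
{
    double xa[VECLEN], xb[VECLEN], y[VECLEN], z[VECLEN];

    for (size_t k = 0; k < VECLEN; k++) {
        y[k] = (double)k;  /* small integers: sums stay exact */
        z[k] = 0.5;        /* power of two: products stay exact */
        xa[k] = 0.0;
        xb[k] = 0.0;
    }
    xa[0] = 1.0;
    xb[0] = 1.0;

    sum_reload(xa, y, VECLEN);
    sum_temp(xb, y, VECLEN);
    if (memcmp(xa, xb, sizeof xa) != 0) {
        return 0;
    }

    scaled_reload(xa, y, z, VECLEN);
    scaled_temp(xb, y, z, VECLEN);
    return memcmp(xa, xb, sizeof xa) == 0;
}
```

A check of this kind confirms only that the rewrite preserves results for the chosen input. Whether it is faster depends on the compiler and processor, so the gain should be measured on the target system.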
