Chapter 2 C and C++ Source-Level Optimizations 35
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
2.16 Explicit Parallelism in Code
Optimization
Where possible, break long dependency chains into several independent dependency chains that can
then be executed in parallel, exploiting the execution units in each pipeline.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale and Examples
Breaking long dependency chains into smaller, independent chains is especially important in
floating-point code, whether it is mapped to x87, SSE, or SSE2 instructions, because of the longer
latency of floating-point operations. Because most languages (including ANSI C) guarantee that
floating-point expressions are not reordered, compilers usually cannot perform such optimizations
unless they offer a switch to allow noncompliant reordering of floating-point expressions according
to algebraic rules.
Reordered code that is algebraically identical to the original code does not necessarily produce
identical computational results due to the lack of associativity of floating-point operations. There are
well-known numerical considerations in applying these optimizations (consult a book on numerical
analysis). In some cases, these optimizations may lead to unexpected results. In the vast majority of
cases, the final result differs only in the least-significant bits.
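The lack of associativity mentioned above can be seen with three values of very different magnitude. The sketch below is illustrative only: the function names sum_left and sum_right and the sample values are assumptions, and the volatile temporaries merely discourage the compiler from folding the arithmetic at compile time.

```c
/* Add three doubles left to right: (x + y) + z. */
double sum_left(double x, double y, double z)
{
    volatile double t = x + y;   /* force the intermediate to double precision */
    return t + z;
}

/* Add the same three doubles right to left: x + (y + z). */
double sum_right(double x, double y, double z)
{
    volatile double t = y + z;   /* a small z can be rounded away here */
    return x + t;
}
```

Assuming IEEE 754 double precision with round-to-nearest, sum_left(1e20, -1e20, 1.0) yields 1.0, whereas sum_right(1e20, -1e20, 1.0) yields 0.0, because the 1.0 is lost when it is added to -1e20.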
Listing 10. Avoid
double a[100], sum;
int i;
sum = 0.0;   /* double constant; every addition below waits on the previous one */
for (i = 0; i < 100; i++) {
    sum += a[i];
}
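One way to break the single dependency chain in the loop above is to keep several partial sums and combine them after the loop. The following is a minimal sketch, not necessarily the form the guide itself uses: the function name sum4, the choice of four accumulators, and the remainder loop are assumptions made for illustration. Because the additions are reordered, the result may differ from the sequential sum in the least-significant bits.

```c
#include <stddef.h>

/* Sum n doubles using four independent accumulators (hypothetical helper).
   Each accumulator forms its own dependency chain, so the four additions
   in one iteration do not have to wait on each other. */
double sum4(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (i = 0; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }

    /* Combine the partial sums pairwise. */
    return (s0 + s1) + (s2 + s3);
}
```

For the 100-element array of Listing 10, the call would be sum = sum4(a, 100). The number of accumulators that pays off depends on the latency and throughput of the floating-point adder on the target processor; four is only a starting point for measurement.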