22 C and C++ Source-Level Optimizations Chapter 2
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
2.9 Matching Store and Load Size
Optimization
Align memory accesses and match addresses and sizes of stores and dependent loads.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
The AMD Athlon 64 and AMD Opteron processors contain a load-store buffer to speed up the
forwarding of store data to dependent loads. However, this store-to-load forwarding (STLF) inside the
load-store buffer occurs, in general, only when the addresses and sizes of the store and the dependent
load match, and when both memory accesses are aligned. For details, see “Store-to-Load Forwarding
Restrictions” on page 100.
It is impossible to control load and store activity at the source level well enough to avoid every
case that violates the restrictions placed on store-to-load forwarding. In some instances, however,
such cases can be spotted in the source code. Size mismatches can easily occur when data items of
different sizes are joined in a union. Address mismatches can result from pointer manipulation.
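As a rough illustration of these two mismatch sources, the following C sketch pairs a 32-bit store
with dependent loads of a different size or address. The type fixed16_16 and all function names are
hypothetical and exist only for this sketch. It assumes a little-endian target and a union without
padding. An optimizing compiler may keep the values in registers, in which case no store or load is
emitted at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: two 16-bit fields overlay one 32-bit word. */
typedef union {
    uint32_t whole;
    struct {
        uint16_t frac; /* Lower 16 bits on a little-endian target. */
        uint16_t intg; /* Upper 16 bits on a little-endian target. */
    } parts;
} fixed16_16;

/* Size and address mismatch through a union: if r lives in memory, the
   32-bit store to r.whole is followed by a dependent 16-bit load that
   starts 2 bytes past the store address. */
uint16_t int_part_mismatched(fixed16_16 x, fixed16_16 y)
{
    fixed16_16 r;
    assert(sizeof(fixed16_16) == sizeof(uint32_t)); /* Sketch assumes no padding. */
    r.whole = x.whole + y.whole;
    return r.parts.intg;
}

/* Matched access: the dependent load has the same address and size as the
   store; the integer part is then extracted in a register. */
uint16_t int_part_matched(fixed16_16 x, fixed16_16 y)
{
    fixed16_16 r;
    r.whole = x.whole + y.whole;
    return (uint16_t)(r.whole >> 16);
}

/* Address mismatch through pointer manipulation: a 32-bit store at slot is
   followed by an 8-bit load from slot + 2 bytes. */
uint8_t byte2_after_store(uint32_t *slot, uint32_t value)
{
    *slot = value;
    return ((const unsigned char *)slot)[2];
}
```

Both int_part functions return the same value on a little-endian target; they differ only in the
memory accesses they request. Whether a mismatched pair survives into the generated code depends on
the compiler and its settings, so the assembly output is the place to confirm it.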
The following examples show a situation involving a union of different-size data items. The examples
show a user-defined unsigned 16.16 fixed-point type and two operations defined on this type.
Function fixed_add adds two fixed-point numbers, and function fixed_int extracts the integer
portion of a fixed-point number. Listing 5 shows an inappropriate implementation of fixed_int,
which, when used on the result of fixed_add, causes misalignment, address mismatch, or size
mismatch between memory operands, such that no store-to-load forwarding in the load-store buffer
takes place. Listing 6 shows how to properly implement fixed_int in order to allow store-to-load
forwarding in the load-store buffer.
Examples
Listing 5. Avoid
typedef union {
    unsigned int whole;
    struct {
        unsigned short frac; /* Lower 16 bits are fraction. */
        unsigned short intg; /* Upper 16 bits are integer. */
    } parts;
} FIXED_U_16_16;