General Optimization Guidelines 2
A load that forwards from a store must wait for the store’s data to be
written to the store buffer before proceeding, but other, unrelated loads
need not wait.
Assembly/Compiler Coding Rule 20. (H impact, ML generality) If it is
necessary to extract a non-aligned portion of stored data, read out the smallest
aligned portion that completely contains the data and shift/mask the data as
necessary.
This is better than incurring the penalties of a failed store-forward.
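As an illustration of Rule 20, the following is a minimal C sketch rather than code from this manual. It assumes a 16-bit field occupying byte offsets 1 and 2 of a naturally aligned 32-bit word, with little-endian byte order as on IA-32. The helper name extract_middle16 is hypothetical, and whether a compiler emits exactly one aligned load depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 16-bit field held in byte offsets 1..2
 * of an aligned 32-bit word (little-endian layout assumed). */
static uint16_t extract_middle16(const uint32_t *aligned_word)
{
    /* Read the smallest aligned portion that completely contains the field. */
    uint32_t w = *aligned_word;

    /* Shift and mask in a register instead of issuing a non-aligned
     * 16-bit load from offset 1. */
    return (uint16_t)((w >> 8) & 0xFFFFu);
}
```

For a word holding 0x44332211, the helper yields 0x3322, the same value a 16-bit load from byte offset 1 would return on a little-endian machine.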
Assembly/Compiler Coding Rule 21. (MH impact, ML generality) Avoid
several small loads after large stores to the same area of memory by using a
single large read and register copies as needed.
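Rule 21 can be sketched in C as follows. This is an illustrative example under stated assumptions, not code from this manual: the struct bytes4 and the function split_after_store are hypothetical names, and the actual instructions generated depend on the compiler. The intent is that the only load following the 32-bit store has the same size and address as the store, and the individual bytes are then derived with register operations.

```c
#include <stdint.h>

/* Hypothetical container for the four bytes of a 32-bit value. */
struct bytes4 {
    uint8_t b0, b1, b2, b3;
};

static struct bytes4 split_after_store(uint32_t *slot, uint32_t value)
{
    struct bytes4 out;
    uint32_t w;

    *slot = value;  /* large (32-bit) store */
    w = *slot;      /* single large read: same size and address as the store */

    /* Register copies: shifts and truncations, no further loads from *slot. */
    out.b0 = (uint8_t)(w);
    out.b1 = (uint8_t)(w >> 8);
    out.b2 = (uint8_t)(w >> 16);
    out.b3 = (uint8_t)(w >> 24);
    return out;
}
```

Storing 0x64636261 and splitting it this way gives the bytes 0x61, 0x62, 0x63, and 0x64 without any byte-sized loads from the stored location.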
Example 2-12 contains several store-forwarding situations in which small
loads follow a large store. The three loads from [EBP + 1] through
[EBP + 3] are blocked and illustrate the situation described in Rule 21.
The first and last loads, which read from the same address as the store,
receive their data through store forwarding without a problem.
Example 2-13 illustrates a store-forwarding situation in which a large load
follows several small stores. The data needed by the load operation
cannot be forwarded because not all of the data that needs to be forwarded
is contained in the store buffer. Avoid large loads after small stores to
the same area of memory.
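One possible way to avoid a large load after several small stores is to assemble the value in a register and write it with a single store of the same size as the later load. The C sketch below is an assumption for illustration only and does not come from this manual. The function name pack_and_store is hypothetical, little-endian byte placement is assumed, and the generated instructions depend on the compiler.

```c
#include <stdint.h>

/* Hypothetical helper: combines four bytes in a register, writes them with
 * one 32-bit store, and reads them back with a 32-bit load. */
static uint32_t pack_and_store(uint32_t *slot,
                               uint8_t b0, uint8_t b1,
                               uint8_t b2, uint8_t b3)
{
    /* Build the 32-bit value with register operations (b0 is the lowest byte). */
    uint32_t w = (uint32_t)b0
               | ((uint32_t)b1 << 8)
               | ((uint32_t)b2 << 16)
               | ((uint32_t)b3 << 24);

    *slot = w;     /* one 32-bit store instead of four byte stores */
    return *slot;  /* 32-bit load matching the store's size and address */
}
```

Calling the helper with bytes 0x11, 0x22, 0x33, 0x44 leaves 0x44332211 in the slot and returns the same value.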
Example 2-12 Several Situations of Small Loads After Large Store
mov [EBP], 'abcd'
mov AL, [EBP]       ; not blocked - same alignment
mov BL, [EBP + 1]   ; blocked
mov CL, [EBP + 2]   ; blocked
mov DL, [EBP + 3]   ; blocked
mov AL, [EBP]       ; not blocked - same alignment
                    ; n.b. passes older blocked loads