102 Cache and Memory Optimizations Chapter 5
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
A common case of misaligned store-data forwarding involves passing quadword floating-point data
on the integer stack, which is only doubleword-aligned, so the data may land at a
quadword-misaligned address. Avoid the type of code shown in the following example:
mov esp, 24h
fstp QWORD PTR [esp] ; ESP = 24h (doubleword-aligned, not quadword-aligned)
... ; Store occurs to quadword-misaligned address.
fld QWORD PTR [esp] ; Quadword load cannot forward from quadword-
                    ; misaligned ‘FSTP[ESP]’ store operation.
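
The misalignment in the example above can be checked arithmetically: 24h is a multiple of 4 but not of 8. The following is a minimal Python sketch of that check. The helper names `is_aligned` and `align_down` are illustrative only and are not part of any AMD tool or library.

```python
def is_aligned(addr: int, size: int) -> bool:
    """Return True if addr is a multiple of size (size must be a power of two)."""
    return addr & (size - 1) == 0


def align_down(addr: int, size: int) -> int:
    """Round addr down to the nearest multiple of size (a power of two)."""
    return addr & ~(size - 1)


esp = 0x24  # stack pointer value taken from the example above

print(is_aligned(esp, 4))       # True: doubleword-aligned
print(is_aligned(esp, 8))       # False: not quadword-aligned
print(hex(align_down(esp, 8)))  # 0x20: nearest quadword boundary below 24h
```

In assembly, the counterpart of `align_down(esp, 8)` is typically an AND of the pointer with -8. Storing the quadword at an address adjusted this way keeps the store quadword-aligned, so the misaligned case shown above does not arise.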
High-Byte Store-Buffer Data-Forwarding Restriction
A store-buffer data-forwarding restriction applies when the store data comes from a high-byte
register (AH, BH, CH, or DH): a subsequent load cannot forward from that store.
Avoid the type of code shown in the following example:
mov eax, 10h
mov [eax], bh ; High-byte store
...
mov dl, [eax] ; Load cannot forward from high-byte store.
One Supported Store-to-Load Forwarding Case
AMD Athlon 64 and AMD Opteron processors support one case of mismatched store-to-load
forwarding: a doubleword read may forward from the lower 32 bits of an aligned quadword
write, as illustrated in the following example:
movq [alignedQword], mm0
...
mov eax, [alignedQword]
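
As a rough illustration, the supported mismatched case described above can be written as a predicate. This Python sketch encodes only the condition stated in this section (an aligned quadword store followed by a doubleword load of its lower 32 bits). The function name and its parameters are hypothetical, and it does not model any other forwarding rule.

```python
def is_supported_mismatch(store_addr: int, store_size: int,
                          load_addr: int, load_size: int) -> bool:
    """Model of the one supported size-mismatched forwarding case.

    Assumptions: sizes are in bytes, and "lower 32 bits" means the load
    starts at the same address as the store (x86 is little-endian).
    """
    store_is_aligned_qword = store_size == 8 and store_addr % 8 == 0
    load_is_low_dword = load_size == 4 and load_addr == store_addr
    return store_is_aligned_qword and load_is_low_dword


# movq [alignedQword], mm0  followed by  mov eax, [alignedQword]
print(is_supported_mismatch(0x1000, 8, 0x1000, 4))  # True

# A load of the upper doubleword is not the case described here.
print(is_supported_mismatch(0x1000, 8, 0x1004, 4))  # False
```

The address 0x1000 is an arbitrary quadword-aligned value chosen for the example.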
Store-to-Load Forwarding—False Dependencies
A load may detect a false dependency on a store-buffer entry if the load does not have a true
dependency on the most recent store that matches address bits 11–2 of the load. A false match could
occur on the most recent store that writes somewhere within the same doubleword of memory as the
load. In addition, a false match could occur if a store address is located at an exact multiple of