Chapter 5 Cache and Memory Optimizations
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
5.1 Memory-Size Mismatches
Optimization
Avoid memory-size mismatches when different instructions operate on the same data. When one
instruction stores and another instruction subsequently loads the same data, keep their operands
aligned and keep the loads/stores of each operand the same size.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Examples—Store-to-Load-Forwarding Stalls
The following code examples result in a store-to-load-forwarding stall, because each QWORD load
reads data written by two separate DWORD stores:
64-bit (Avoid)
    foo DQ ?                      ; Assume foo is 8-byte aligned.
    ...
    mov DWORD PTR foo, eax        ; Store a DWORD to foo.
    mov DWORD PTR foo+4, ebx      ; Now store to foo+4.
    mov rcx, QWORD PTR foo        ; Load a QWORD from foo.
32-bit (Avoid)
    foo DQ ?                      ; Assume foo is 4-byte aligned.
    ...
    mov DWORD PTR foo, eax        ; Store a DWORD in foo.
    mov DWORD PTR foo+4, edx      ; Store a DWORD in foo+4.
    fld QWORD PTR foo             ; Load a QWORD from foo.
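The mismatch in the examples above, and one way to remove it, can be sketched in C. This is an illustration only, not code from the guide: the union `foo` and both function names are hypothetical, a little-endian AMD64 target is assumed, and `volatile` merely asks the compiler to perform each access as written (the exact instructions emitted are not guaranteed). The first function repeats the pattern to avoid; the second builds the QWORD in a register so that the store and the load are the same size.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical 8-byte variable that can be accessed as one QWORD or as
   two DWORDs. volatile asks the compiler to keep each access as written. */
static volatile union {
    uint64_t q;      /* the whole QWORD */
    uint32_t d[2];   /* d[0] is the low DWORD on little-endian x86 */
} foo;

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a QWORD is two DWORDs");

/* Avoid: two DWORD stores followed by one QWORD load of the same bytes. */
uint64_t store_halves_load_whole(uint32_t lo, uint32_t hi)
{
    foo.d[0] = lo;   /* DWORD store to foo   */
    foo.d[1] = hi;   /* DWORD store to foo+4 */
    return foo.q;    /* QWORD load spanning both stores */
}

/* Matched sizes: combine the halves in a register, then do one QWORD
   store followed by one QWORD load. */
uint64_t store_whole_load_whole(uint32_t lo, uint32_t hi)
{
    foo.q = ((uint64_t)hi << 32) | lo;   /* single QWORD store */
    return foo.q;                        /* QWORD load, same size */
}
```

Both functions return the same value on a little-endian machine; only the sizes of the memory accesses differ. On a 32-bit target the compiler has to split the 64-bit accesses, so the sketch is meaningful mainly for 64-bit software.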
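Avoid
    mov foo, eax                  ; Store a DWORD to foo.
    mov foo+4, edx                ; Store a DWORD to foo+4.
    ...
    movq mm0, foo                 ; Load a QWORD that spans both DWORD stores.
Preferred
    mov foo, eax                  ; Store a DWORD to foo.
    mov foo+4, edx                ; Store a DWORD to foo+4.
    ...
    movd mm0, foo                 ; Load the DWORD at foo into the low half of mm0.
    punpckldq mm0, foo+4          ; Load the DWORD at foo+4 into the high half of mm0.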
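The preferred sequence above keeps each load the same size as the store that wrote the data and merges the halves in a register. A rough C analogue follows, as a sketch under stated assumptions: the union `foo` and the function name are hypothetical, and `volatile` only asks the compiler to keep the DWORD accesses separate rather than guaranteeing a particular instruction sequence.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical 8-byte variable, accessible as one QWORD or two DWORDs. */
static volatile union {
    uint64_t q;
    uint32_t d[2];
} foo;

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a QWORD is two DWORDs");

/* Two DWORD stores, then two DWORD loads of the same locations. The
   halves are merged in a register, so no load is wider than the store
   that produced its data. */
uint64_t store_halves_load_halves(uint32_t lo, uint32_t hi)
{
    foo.d[0] = lo;               /* compare: mov foo, eax         */
    foo.d[1] = hi;               /* compare: mov foo+4, edx       */

    uint64_t low  = foo.d[0];    /* compare: movd mm0, foo        */
    uint64_t high = foo.d[1];    /* compare: punpckldq mm0, foo+4 */
    return (high << 32) | low;   /* merge in a register, not in memory */
}
```

For example, `store_halves_load_halves(0x89ABCDEF, 0x01234567)` returns `0x0123456789ABCDEF`.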