152 Scheduling Optimizations Chapter 7
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Example
Avoid code that places a load whose address takes longer to calculate before a load whose address can
be determined more quickly:
add ebx, ecx                  ; Instruction 1
mov eax, DWORD PTR [10h]      ; Instruction 2 (fast address calc.)
mov ecx, DWORD PTR [eax+ebx]  ; Instruction 3 (slow address calc.)
mov edx, DWORD PTR [24h]      ; This load is stalled from accessing the
                              ; data cache due to the long latency
                              ; caused by generating the address for
                              ; instruction 3.
Where possible, reorder instructions so that loads with simpler address calculations come before
those with more complex address calculations:
add ebx, ecx                  ; Instruction 1
mov eax, DWORD PTR [10h]      ; Instruction 2
mov edx, DWORD PTR [24h]      ; Place load above instruction 3 to avoid
                              ; address-generation interlock stall.
mov ecx, DWORD PTR [eax+ebx]  ; Instruction 3
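
The effect of the reordering can be illustrated with a small timing model. The sketch below is a toy, not a description of the actual pipeline: it assumes only that a load cannot access the data cache before any earlier load in program order, and the cycle counts (FAST = 0, SLOW = 5) are hypothetical values chosen for illustration. The function names are likewise invented for this example.

```python
def cache_access_cycles(addr_ready):
    """Return the cycle at which each load may access the data cache.

    Toy model (an assumption, not the real hardware): a load needs its
    own address, and may not access the cache before any earlier load
    in program order.
    """
    cycles = []
    earliest = 0
    for ready in addr_ready:
        # A load waits for its own address and for all earlier loads.
        earliest = max(earliest, ready)
        cycles.append(earliest)
    return cycles


def stall_cycles(addr_ready):
    """Cycles each load waits after its own address is already known."""
    access = cache_access_cycles(addr_ready)
    return [a - r for a, r in zip(access, addr_ready)]


# Hypothetical address-ready cycles, for illustration only.
FAST, SLOW = 0, 5

original = [FAST, SLOW, FAST]    # [10h], [eax+ebx], [24h]
reordered = [FAST, FAST, SLOW]   # [10h], [24h], [eax+ebx]

print(stall_cycles(original))    # [0, 0, 5]
print(stall_cycles(reordered))   # [0, 0, 0]
```

In this model the load from [24h] waits five cycles in the original order even though its address is known immediately; after reordering, no load waits on another load's address calculation.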