Chapter 7 Scheduling Optimizations 155
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Avoid an assembly-language equivalent like this, which uses base and displacement components (for
example, [esi+a]) to compute array-element addresses, requiring additional pointer arithmetic to
increment the offsets into the forward-traversed arrays:
    mov  ecx, MAXSIZE      ; Initialize loop counter.
    xor  esi, esi          ; Initialize offset into array a.
    xor  edi, edi          ; Initialize offset into array b.
    xor  ebx, ebx          ; Initialize offset into array c.
add_loop:
    mov  eax, [esi+a]      ; Get element from a.
    mov  edx, [edi+b]      ; Get element from b.
    add  eax, edx          ; a[i] + b[i]
    mov  [ebx+c], eax      ; Write result to c.
    add  esi, 4            ; Increment offset into a.
    add  edi, 4            ; Increment offset into b.
    add  ebx, 4            ; Increment offset into c.
    dec  ecx               ; Decrement loop count
    jnz  add_loop          ; until loop count is 0.
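For readers who think in C, the bookkeeping in the loop above can be sketched roughly as follows. This is an illustration only: the function name add_forward_ptrs and the parameter n (standing in for MAXSIZE) are assumptions, not part of the guide. Unlike the assembly, which assumes at least one iteration, this sketch also tolerates n equal to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the loop above: three separate
   positions are advanced on every iteration, plus a loop counter. */
void add_forward_ptrs(const int *a, const int *b, int *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    size_t count = n;       /* plays the role of ecx */
    while (count != 0) {
        *c = *a + *b;       /* a[i] + b[i] */
        a++;                /* like add esi, 4 */
        b++;                /* like add edi, 4 */
        c++;                /* like add ebx, 4 */
        count--;            /* like dec ecx */
    }
}
```

Four values are updated per iteration here (three positions and the counter), which is the overhead the following listings reduce to a single index update.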
Instead, traverse the arrays in a downward direction (from higher to lower addresses), in order to take
advantage of scaled-index addressing (for example, [ecx*4+a]), which minimizes pointer arithmetic
within the loop:
    mov  ecx, MAXSIZE - 1  ; Initialize index.
add_loop:
    mov  eax, [ecx*4+a]    ; Get element from a.
    mov  edx, [ecx*4+b]    ; Get element from b.
    add  eax, edx          ; a[i] + b[i]
    mov  [ecx*4+c], eax    ; Write result to c.
    dec  ecx               ; Decrement index
    jns  add_loop          ; until index is negative.
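A C-level sketch of the same downward traversal is shown below. The function name add_downward and the parameter n (standing in for MAXSIZE) are assumptions made for illustration. Whether a given compiler actually emits scaled-index addressing for this source is not guaranteed and should be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one signed index serves as both the array
   subscript and the loop counter, as ecx does in the listing above. */
void add_downward(const int *a, const int *b, int *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Start at the last element and stop once the index goes
       negative, which corresponds to the dec/jns pair. */
    for (ptrdiff_t i = (ptrdiff_t)n - 1; i >= 0; i--) {
        c[i] = a[i] + b[i];
    }
}
```

The index must be a signed type for the termination test to work; an unsigned index would wrap around instead of becoming negative.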
Changing the direction of traversal is possible only if each loop iteration is completely independent
of the others. If you cannot change the direction of traversal for a given array, you can still
minimize pointer arithmetic: use as the base address a displacement that points to the byte just past
the end of the array, and use an index that starts at a negative value and reaches zero when the loop
terminates:
    mov  ecx, (-MAXSIZE)              ; Initialize index.
add_loop:
    mov  eax, [ecx*4+a+MAXSIZE*4]     ; Get element from a.
    mov  edx, [ecx*4+b+MAXSIZE*4]     ; Get element from b.
    add  eax, edx                     ; a[i] + b[i]
    mov  [ecx*4+c+MAXSIZE*4], eax     ; Write result to c.
    inc  ecx                          ; Increment index
    jnz  add_loop                     ; until index is 0.
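The negative-index technique can be sketched in C as follows. The names add_negative_index and running_sum, and the parameter n (standing in for MAXSIZE), are assumptions made for illustration. The second function is an example of a loop whose iterations are not independent, so it must keep the forward order; the negative index still lets one variable act as both subscript and counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: forward traversal driven by a negative index
   that counts up to zero, relative to one-past-the-end base pointers. */
void add_negative_index(const int *a, const int *b, int *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    const int *a_end = a + n;   /* like the displacement a+MAXSIZE*4 */
    const int *b_end = b + n;
    int *c_end = c + n;

    /* i runs from -n up to -1, so x_end[i] visits x[0] .. x[n-1]. */
    for (ptrdiff_t i = -(ptrdiff_t)n; i != 0; i++) {
        c_end[i] = a_end[i] + b_end[i];
    }
}

/* Hypothetical example of a loop-carried dependency: each output
   depends on all earlier inputs, so the traversal cannot be reversed. */
void running_sum(const int *a, int *out, size_t n)
{
    assert(a != NULL && out != NULL);

    const int *a_end = a + n;
    int *out_end = out + n;
    int total = 0;

    for (ptrdiff_t i = -(ptrdiff_t)n; i != 0; i++) {
        total += a_end[i];
        out_end[i] = total;
    }
}
```

Forming a pointer one element past the end of an array is permitted in C, and every access above lands inside the array because the index is always negative when it is used.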