AMD 250 Computer Hardware User Manual


 
Chapter 9 Optimizing with SIMD Instructions 219
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
push ebp
mov ebp, esp
;==============================================================================
; Parameters passed into routine:
; [ebp+8] = ->a_and_b
; [ebp+12] = ->c_and_d
; [ebp+16] = ->aplusb_cplusd
;==============================================================================
push ebx
push esi
push edi
;==============================================================================
; THE 4 ASM LINES BELOW LOAD THE FUNCTION'S ARGUMENTS INTO GENERAL-PURPOSE
; REGISTERS (GPRS)
; esi = starting address of 2 floats "a_and_b"
; edi = starting address of 2 floats "c_and_d"
; eax = starting address of 2 floats "aplusb_cplusd"
;==============================================================================
mov esi, [ebp+8] ; esi = ->a_and_b
mov edi, [ebp+12] ; edi = ->c_and_d
mov eax, [ebp+16] ; eax = ->aplusb_cplusd
;==============================================================================
; ADD a AND b TOGETHER AND ALSO c AND d
;==============================================================================
emms
movq mm0, [esi] ; mm0 = [b,a]
movq mm1, [edi] ; mm1 = [d,c]
pfacc mm0, mm1 ; mm0 = [c+d,b+a]
movq [eax], mm0 ; store [c+d,b+a] to aplusb_cplusd
emms ; clear MMX state before returning
;==============================================================================
; INSTRUCTIONS BELOW RESTORE THE REGISTER STATE WITH WHICH THIS ROUTINE
; WAS ENTERED.
; EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ARE ASSUMED TO BE CHANGED,
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE ROUTINE CHANGES THEM.
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
;==============================================================================
ret
_accumulate_3dnow ENDP
_TEXT ENDS
END
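For readers less familiar with PFACC, the following is a minimal scalar sketch in C of what the
routine above is meant to compute. The helper name accumulate_ref is hypothetical, and the sketch
assumes the result held in MM0 is written to the two floats at aplusb_cplusd, as the parameter
name suggests.

```c
/* Scalar reference model of the 3DNow! routine (hypothetical helper, not
 * part of the original listing).
 *   a_and_b       -> two floats {a, b}
 *   c_and_d       -> two floats {c, d}
 *   aplusb_cplusd -> two floats receiving {a + b, c + d}
 */
static void accumulate_ref(const float a_and_b[2],
                           const float c_and_d[2],
                           float aplusb_cplusd[2])
{
    /* PFACC dest, src: the low result is the sum of the two floats in dest,
     * the high result is the sum of the two floats in src. */
    aplusb_cplusd[0] = a_and_b[0] + a_and_b[1];   /* low  half: a + b */
    aplusb_cplusd[1] = c_and_d[0] + c_and_d[1];   /* high half: c + d */
}
```

Calling accumulate_ref with {1, 2} and {3, 4} leaves {3, 7} in the output array, which matches the
[c+d,b+a] layout noted in the PFACC comment.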
The same operation can be performed using SSE instructions, but the data in the XMM registers must
first be rearranged. The next example loads four floating-point values into each of four XMM
registers, XMM4–XMM7, and then rearranges and adds the values so that the sum of the four floats in
each register is accumulated into one float in XMM1.
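As orientation before the assembly listing, the rearrange-then-add idea can be sketched in C with
SSE intrinsics. This is only an illustration under stated assumptions: the helper name
row_sums_sse is hypothetical, the transpose macro _MM_TRANSPOSE4_PS stands in for whatever
shuffle sequence the listing uses, and the lane order of the result (sum of the first register in
the lowest lane) is assumed rather than taken from the listing.

```c
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* Hypothetical helper: returns one vector whose lane i holds the sum of the
 * four floats of input register r<i> (lane 0 is the lowest lane). */
static __m128 row_sums_sse(__m128 r0, __m128 r1, __m128 r2, __m128 r3)
{
    /* Rearrange: after the transpose, r0 holds element 0 of every input,
     * r1 holds element 1 of every input, and so on. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    /* Add: lane i of the result is the sum of all four elements of input i. */
    return _mm_add_ps(_mm_add_ps(r0, r1), _mm_add_ps(r2, r3));
}
```

The rearrangement is what makes plain vertical adds sufficient: once matching elements of the four
inputs sit in the same lane position, three ADDPS-style additions produce all four sums at once.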
;----------------------------------------------------------------------
; The instructions below take the 4 floats in each XMM register below: