AMD 250 Computer Hardware User Manual

Chapter 9 Optimizing with SIMD Instructions 219

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

push ebp

mov ebp, esp

;==============================================================================

; Parameters passed into routine:

; [ebp+8] = ->a_and_b

; [ebp+12] = ->c_and_d

; [ebp+16] = ->aplusb_cplusd

;==============================================================================

push ebx

push esi

push edi

;==============================================================================

; THE 3 ASM LINES BELOW LOAD THE FUNCTION'S ARGUMENTS INTO GENERAL-PURPOSE

; REGISTERS (GPRS)

; esi = starting address of 2 floats "a_and_b"

; edi = starting address of 2 floats "c_and_d"

; eax = starting address of 2 floats "aplusb_cplusd"

;==============================================================================

mov esi, [ebp+8] ; esi = ->a_and_b

mov edi, [ebp+12] ; edi = ->c_and_d

mov eax, [ebp+16] ; eax = ->aplusb_cplusd

;==============================================================================

; ADD a AND b TOGETHER AND ALSO c AND d

;==============================================================================

emms

movq mm0, [esi] ; mm0 = [b,a]

movq mm1, [edi] ; mm1 = [d,c]

pfacc mm0, mm1 ; mm0 = [c+d,b+a]

movq [eax], mm0 ; aplusb_cplusd = [c+d,b+a]

emms ; clear MMX state before returning

;==============================================================================

; INSTRUCTIONS BELOW RESTORE THE REGISTER STATE WITH WHICH THIS ROUTINE

; WAS ENTERED.

; EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED,

; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE ROUTINE CHANGES THEM.

pop edi

pop esi

pop ebx

mov esp,ebp

pop ebp

;==============================================================================

ret

_accumulate_3dnow ENDP

_TEXT ENDS

END
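As a cross-check on the listing above, the effect of the PFACC step can be modeled in scalar C. This is only a sketch: the function name accumulate_model and its signature are hypothetical and do not come from the guide, and it assumes the routine is meant to write a+b and c+d to aplusb_cplusd.

```c
/* Scalar model of the PFACC step in the 3DNow! listing.
 * Hypothetical helper for illustration; not part of the guide.
 * Each input points at two packed floats, mirroring one 64-bit MMX load. */
void accumulate_model(const float a_and_b[2],
                      const float c_and_d[2],
                      float aplusb_cplusd[2])
{
    /* PFACC mm0, mm1:
     *   low  half of result = low + high half of the destination (mm0)
     *   high half of result = low + high half of the source      (mm1) */
    float low  = a_and_b[0] + a_and_b[1];   /* a + b */
    float high = c_and_d[0] + c_and_d[1];   /* c + d */

    aplusb_cplusd[0] = low;
    aplusb_cplusd[1] = high;
}
```

For example, passing {1.0f, 2.0f} and {3.0f, 4.5f} leaves {3.0f, 7.5f} in the output array, which is a convenient reference when testing the assembly routine.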

The same operation can be performed using SSE instructions, but the data in the XMM registers must first be rearranged. The next example loads four floating-point values into each of four XMM registers, XMM4–XMM7, and then rearranges and adds the values so that the sum of each register's four floats is accumulated into one float of XMM1.
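The rearrange-then-add idea described here can be sketched in C with SSE intrinsics. This is a minimal sketch under stated assumptions, not the guide's instruction sequence: it uses the _MM_TRANSPOSE4_PS macro from xmmintrin.h to do the rearrangement, the function name sum_each_register is hypothetical, and the lane order of the result (lane 0 holding the sum for the first register) is assumed. A scalar fallback is included for targets without SSE.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#endif

/* Hypothetical helper: v4..v7 stand in for the contents of XMM4..XMM7.
 * On return, sums[i] holds the sum of the four floats of the i-th input. */
void sum_each_register(const float v4[4], const float v5[4],
                       const float v6[4], const float v7[4],
                       float sums[4])
{
#ifdef HAVE_SSE_SKETCH
    __m128 r0 = _mm_loadu_ps(v4);
    __m128 r1 = _mm_loadu_ps(v5);
    __m128 r2 = _mm_loadu_ps(v6);
    __m128 r3 = _mm_loadu_ps(v7);

    /* Rearrange: after the transpose, r0 holds element 0 of every input,
     * r1 holds element 1 of every input, and so on. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    /* Vertical adds now produce one horizontal sum per lane. */
    __m128 total = _mm_add_ps(_mm_add_ps(r0, r1), _mm_add_ps(r2, r3));
    _mm_storeu_ps(sums, total);
#else
    /* Scalar fallback with the same summation order. */
    const float *regs[4] = { v4, v5, v6, v7 };
    for (int i = 0; i < 4; ++i)
        sums[i] = (regs[i][0] + regs[i][1]) + (regs[i][2] + regs[i][3]);
#endif
}
```

The transpose turns four horizontal sums into ordinary vertical adds, which is why the data must be rearranged before any addition takes place.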

;----------------------------------------------------------------------

; The instructions below take the 4 floats in each XMM register below:
