Chapter 9 Optimizing with SIMD Instructions 233

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

; REGISTERS EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED,
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM
    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
;==============================================================================
    ret
_matrix_x_vector_sse ENDP
_TEXT ENDS
END

To improve performance, the previous function multiplies the matrix not by a single four-column
vector but by an array of such vectors in one call. Writing a function that transforms a single
vertex and calling it once per vertex is prohibitively expensive, because every call pays the
overhead of pushing registers onto the stack and popping them off again. The same applies to
routines that negate a single vector, nullify a single vector, or add two vectors. Listing 28 is
the 3DNow! technology counterpart to Listing 27 on page 231.
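The batching idea described above can be sketched in plain C. This is a scalar reference only, not the code from Listing 27 or Listing 28. The names matrix_x_vector_ref and transform_vertex are hypothetical, and the data layout is an assumption: trR is taken to be a row-major 4 × 4 matrix, and each vertex is taken to be a row vector of four floats, so that rotv = v × trR.

```c
#include <assert.h>

/* ASSUMPTION: trR is a row-major 4x4 matrix and each vertex is a
   four-float row vector, so rotv = v * trR. The layout used by the
   guide's listings may differ. */
static void transform_vertex(const float *trR, const float *v, float *rotv)
{
    for (int col = 0; col < 4; ++col) {
        rotv[col] = v[0] * trR[0 * 4 + col]
                  + v[1] * trR[1 * 4 + col]
                  + v[2] * trR[2 * 4 + col]
                  + v[3] * trR[3 * 4 + col];
    }
}

/* Hypothetical batched entry point with the same argument order as the
   prototype in the listings: one call transforms the whole array, so the
   call overhead is paid once rather than once per vertex. */
void matrix_x_vector_ref(const float *trR, const float *v,
                         int num_vertices_to_rotate, float *rotv)
{
    assert(num_vertices_to_rotate >= 0);
    for (int i = 0; i < num_vertices_to_rotate; ++i) {
        /* Vertex i occupies floats 4*i .. 4*i+3 of both arrays. */
        transform_vertex(trR, v + 4 * i, rotv + 4 * i);
    }
}
```

A caller passes the full vertex array and its count once, for example matrix_x_vector_ref(m, vertices, n, out), instead of calling a per-vertex routine n times. A reference of this kind can also serve to check the output of the assembly routines, provided the assumed layout matches theirs.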

Listing 28. 4 × 4 Matrix Multiplication (3DNow!™ Technology)

; matrix_x_vector_3dnow(float *trR, float *v, int num_vertices_to_rotate,
;                       float *rotv);
;
; TO ASSEMBLE INTO *.obj DO THE FOLLOWING:
; ml.exe -coff -c matrix_x_vector_3dnow.asm
;
.586
.K3D
.XMM
_TEXT SEGMENT
PUBLIC _matrix_x_vector_3dnow
_matrix_x_vector_3dnow PROC NEAR
;==============================================================================
; INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS
; ENTERED.
; REGISTERS EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED,
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM
    push ebp
    mov ebp, esp
;==============================================================================
; Parameters passed into routine:
; [ebp+8] = ->trR
; [ebp+12] = ->v
; [ebp+16] = num_vertices_to_rotate
; [ebp+20] = ->rotv
;==============================================================================
    push ebx
    push esi
    push edi
;==============================================================================