Chapter 9 Optimizing with SIMD Instructions 233
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
; REGISTERS EAX, ECX, EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
;==============================================================================
ret
_matrix_x_vector_sse ENDP
_TEXT ENDS
END
To greatly improve performance, the previous function can multiply the matrix not only by one
four-column vector but by many vectors in a single call. Creating a separate function that transforms
a single vertex and calling it repeatedly is prohibitively expensive because of the overhead of
pushing and popping registers on the stack for every call. The same holds for routines that negate a
single vector, nullify a single vector, or add two vectors. Listing 28 is the 3DNow! technology
counterpart to Listing 27 on page 231.
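The batching idea can be illustrated with a minimal scalar sketch in C. This is not the guide's code:
the name transform_vertices is hypothetical, and the sketch assumes a row-major 4 x 4 matrix applied to
row vectors (out = v * m), which may differ from the layout the listings use. The point is only that
the loop over vertices sits inside the function, so the call and register save/restore cost is paid
once per batch rather than once per vertex.

```c
#include <stddef.h>

/* Hypothetical scalar reference, not taken from the guide.
 * Transforms `count` four-component vertices in one call.
 * Assumptions: `m` is a row-major 4x4 matrix, vertices are row vectors
 * (out = v * m), and `out` does not overlap `v`. */
void transform_vertices(const float *m, const float *v, int count, float *out)
{
    for (int n = 0; n < count; ++n) {
        const float *src = v + 4 * (size_t)n;   /* current input vertex  */
        float *dst = out + 4 * (size_t)n;       /* current output vertex */

        for (int j = 0; j < 4; ++j) {
            /* Dot product of the vertex with column j of the matrix. */
            dst[j] = src[0] * m[j]
                   + src[1] * m[4 + j]
                   + src[2] * m[8 + j]
                   + src[3] * m[12 + j];
        }
    }
}
```

A caller passes the whole vertex array once, for example transform_vertices(m, verts, num_vertices,
result), instead of invoking a single-vertex routine num_vertices times.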
Listing 28. 4 × 4 Matrix Multiplication (3DNow!™ Technology)
; matrix_x_vector_3dnow(float *trR, float *v, int num_vertices_to_rotate,
;                       float *rotv);
;
; TO ASSEMBLE INTO *.obj DO THE FOLLOWING:
; ml.exe -coff -c matrix_x_vector_3dnow.asm
;
.586
.K3D
.XMM
_TEXT SEGMENT
PUBLIC _matrix_x_vector_3dnow
_matrix_x_vector_3dnow PROC NEAR
;==============================================================================
; INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS
; ENTERED.
; REGISTERS EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED,
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM
push ebp
mov ebp, esp
;==============================================================================
; Parameters passed into routine:
; [ebp+8] = ->trR
; [ebp+12] = ->v
; [ebp+16] = num_vertices_to_rotate
; [ebp+20] = ->rotv
;==============================================================================
push ebx
push esi
push edi
;===============================================================================