Chapter 9 Optimizing with SIMD Instructions 217
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
9.14 Finding the Floating-Point Absolute Value of Operands of SSE, SSE2, and 3DNow!™ Instructions
Optimization
Use instructions that perform AND operations (PAND, ANDPS, and ANDPD) to determine the
absolute value of floating-point operands of SSE, SSE2, and 3DNow! instructions.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
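The sign of an IEEE 754 floating-point value is stored in the most significant bit of each element,
so ANDing every element with a mask that clears only that bit yields the absolute value. The MMX
PAND instruction has a latency of 2 cycles, whereas the SSE and SSE2 AND instructions (ANDPS and
ANDPD, respectively) have latencies of 3 cycles. The following examples illustrate how to clear the
sign bits: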
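; 3DNow! (two single-precision floats in a 64-bit MMX register)
absmask DQ 7FFFFFFF7FFFFFFFh
pand mm0, [absmask] ; Clear the sign bits of both floats in MM0.

; SSE (four single-precision floats; the memory operand must be 16-byte aligned)
absmask DQ 7FFFFFFF7FFFFFFFh,7FFFFFFF7FFFFFFFh
andps xmm0, [absmask] ; Clear the sign bits of all four floats in XMM0.

; SSE2 (two double-precision floats; the memory operand must be 16-byte aligned)
absmask DQ 7FFFFFFFFFFFFFFFh,7FFFFFFFFFFFFFFFh
andpd xmm0, [absmask] ; Clear the sign bits of both doubles in XMM0.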
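As a rough illustration of why these masks work, the following portable C sketch applies the same AND mask to a single value at a time. It assumes that float and double use the IEEE 754 32-bit and 64-bit formats. The helper names abs_by_mask_f32 and abs_by_mask_f64 are hypothetical and do not come from this guide. The sketch is scalar and only mirrors what one element of PAND, ANDPS, or ANDPD does; it is not a replacement for those instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the guide): absolute value of a float,
   obtained by clearing bit 31. Assumes float is IEEE 754 binary32. */
static float abs_by_mask_f32(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);     /* copy out the raw bit pattern  */
    bits &= UINT32_C(0x7FFFFFFF);       /* same mask as one ANDPS element */
    memcpy(&x, &bits, sizeof x);        /* copy the masked pattern back  */
    return x;
}

/* Hypothetical helper (not from the guide): absolute value of a double,
   obtained by clearing bit 63. Assumes double is IEEE 754 binary64. */
static double abs_by_mask_f64(double x)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &x, sizeof bits);
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF); /* same mask as one ANDPD element */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, abs_by_mask_f32(-1.5f) returns 1.5f, and a value that is already positive passes through unchanged. The packed instructions apply the same mask to every element of the register in a single operation.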