216 Optimizing with SIMD Instructions Chapter 9
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
9.13 Clearing MMX™ and XMM Registers with XOR Instructions
Optimization
Use instructions that perform XOR operations (PXOR, XORPS, and XORPD) to clear all the bits in
MMX and XMM registers.
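This optimization works because XOR is purely bitwise: any bit pattern XORed with itself is zero, whatever the register held before. The following C sketch is an illustration, not part of this guide. It assumes GCC or Clang targeting x86-64 with SSE2 intrinsics (`<emmintrin.h>`) and built without -ffast-math. It uses the intrinsics that correspond to XORPS, XORPD, and PXOR on XMM registers, and the helper names are hypothetical. For contrast, it also shows that subtracting a register from itself does not clear lanes that hold NaN or infinity.

```c
#include <emmintrin.h> /* SSE2 intrinsics (includes the SSE ones) */
#include <math.h>      /* NAN, INFINITY */
#include <string.h>    /* memcpy, size_t */

/* Hypothetical helper: returns 1 if all 16 bytes of a 128-bit value are zero. */
static int all_bits_zero(const void *vec)
{
    unsigned char bytes[16];
    memcpy(bytes, vec, sizeof bytes);
    for (size_t k = 0; k < sizeof bytes; k++)
        if (bytes[k] != 0)
            return 0;
    return 1;
}

/* XOR each register with itself via the XORPS, XORPD and PXOR intrinsics. */
static __m128  clear_ps(__m128 v)     { return _mm_xor_ps(v, v); }
static __m128d clear_pd(__m128d v)    { return _mm_xor_pd(v, v); }
static __m128i clear_si128(__m128i v) { return _mm_xor_si128(v, v); }

/* Start from registers full of non-zero bit patterns (including NaN and
   infinity) and check that XOR with itself leaves every bit clear. */
static int xor_self_clears_all(void)
{
    __m128  s = _mm_set_ps(NAN, INFINITY, -1.0f, 3.5f);
    __m128d d = _mm_set_pd(NAN, -INFINITY);
    __m128i i = _mm_set1_epi32(-1); /* all 128 bits set */

    __m128  zs = clear_ps(s);
    __m128d zd = clear_pd(d);
    __m128i zi = clear_si128(i);
    return all_bits_zero(&zs) && all_bits_zero(&zd) && all_bits_zero(&zi);
}

/* For contrast: SUBPS v, v is floating-point arithmetic, not a bitwise
   operation, so lanes holding NaN or infinity become NaN instead of zero. */
static int sub_self_clears_all(void)
{
    __m128 s = _mm_set_ps(NAN, INFINITY, -1.0f, 3.5f);
    __m128 z = _mm_sub_ps(s, s);
    return all_bits_zero(&z);
}
```

Under these assumptions, xor_self_clears_all() returns 1 and sub_self_clears_all() returns 0.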
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
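The latency of the MMX XOR instruction (PXOR) is only 3 cycles, comparable to the 3 cycles
required to load data, assuming it is in the L1 data cache. The SSE and SSE2 XOR instructions
(XORPS and XORPD, respectively) also have latencies of 3 cycles. Because XORing a register with
itself takes no memory operand, it also avoids the longer latency that a load would incur on an
L1 data cache miss.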
Examples
The following examples illustrate how to clear the bits in a register using the different exclusive-OR
instructions:
; MMX
pxor mm0, mm0 ; Clear the MM0 register.
; SSE
xorps xmm0, xmm0 ; Clear the XMM0 register.
; SSE2
xorpd xmm0, xmm0 ; Clear the XMM0 register.
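In C code, these zeroing instructions are usually reached through compiler intrinsics rather than written by hand. The following sketch is an illustration, not part of this guide. It assumes GCC or Clang targeting x86-64, where `_mm_setzero_si64`, `_mm_setzero_ps`, `_mm_setzero_pd`, and `_mm_setzero_si128` are the MMX, SSE, and SSE2 zeroing intrinsics, and the helper names are hypothetical. Compilers typically materialize a zeroed register with a register XOR like the ones above, but the instruction they select is not guaranteed. For that reason, the code only checks that the resulting values are zero.

```c
#include <mmintrin.h>  /* MMX: __m64, _mm_setzero_si64, _mm_empty */
#include <xmmintrin.h> /* SSE: __m128, _mm_setzero_ps */
#include <emmintrin.h> /* SSE2: __m128d, __m128i, _mm_setzero_pd, _mm_setzero_si128 */
#include <string.h>    /* memcmp, size_t */

/* Hypothetical helper: returns 1 if the n (at most 16) bytes at p are zero. */
static int bytes_are_zero(const void *p, size_t n)
{
    static const unsigned char zeros[16] = {0};
    return n <= sizeof zeros && memcmp(p, zeros, n) == 0;
}

/* C counterparts of the assembly examples; the compiler chooses the
   actual zeroing instruction (often an XOR of the register with itself). */
static int setzero_intrinsics_clear(void)
{
    __m64   m = _mm_setzero_si64();  /* MMX counterpart of PXOR mm0, mm0 */
    __m128  s = _mm_setzero_ps();    /* SSE counterpart of XORPS xmm0, xmm0 */
    __m128d d = _mm_setzero_pd();    /* SSE2 counterpart of XORPD xmm0, xmm0 */
    __m128i i = _mm_setzero_si128(); /* SSE2 integer counterpart (PXOR xmm0, xmm0) */

    int ok = bytes_are_zero(&m, sizeof m) && bytes_are_zero(&s, sizeof s)
          && bytes_are_zero(&d, sizeof d) && bytes_are_zero(&i, sizeof i);

    _mm_empty(); /* EMMS: release the MMX state before any x87 code runs */
    return ok;
}
```

Under these assumptions, setzero_intrinsics_clear() returns 1. To see which instruction a particular compiler actually emitted, inspect its assembly output, for example with gcc -O2 -S.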