Chapter 9 Optimizing with SIMD Instructions 227
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
pswapd mm1, mm0 ; MM1 = [r,i]
; Additionally, PSWAPD can be used with a 64-bit memory location. Suppose
; that EDI contains the address of two floats: r (at EDI) and i (at EDI+4).
; INPUT:
; [EDI:EDI+8] = [i,r]
; OUTPUT:
; MM1 = [r,i]
pswapd mm1, [edi] ; MM1 = [r,i]
; PFPNACC
; Suppose that MM0 contains two floats: r1 * r2 (the product of the real parts
; of two complex numbers) and i1 * i2 (the product of the imaginary parts
; of the same two complex numbers).
; Also suppose that MM1 contains two floats: r1 * i2 (the product of the real
; part of the first complex number and the imaginary part of the second
; complex number) and i1 * r2 (the product of the imaginary part of the
; first complex number and the real part of the second complex number).
; INPUTS:
; MM0 = [i1*i2,r1*r2]
; MM1 = [i1*r2,r1*i2]
; OUTPUT:
; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]
pfpnacc mm0, mm1 ; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]
; Additionally, PFPNACC can be used with a 64-bit memory location. Suppose
; that EDI contains the address of two floats: r1 * i2 (the product of the
; real part of the first complex number and the imaginary part of the
; second complex number) and i1 * r2 (the product of the imaginary part of
; the first complex number and the real part of the second complex number).
; INPUTS:
; MM0 = [i1*i2,r1*r2]
; [EDI:EDI+8] = [i1*r2,r1*i2]
; OUTPUT:
; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]
pfpnacc mm0, [edi] ; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]
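The PFPNACC instruction is designed specifically for complex arithmetic. It subtracts within the destination operand and adds within the source operand. This lets a single instruction turn the four partial products of a complex multiplication into the real part of the result (low doubleword) and the imaginary part (high doubleword).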