IA-32 Intel® Architecture Optimization
instructions to perform multiplications of single-precision complex
numbers. Example 5-12 demonstrates using SSE3 instructions to
perform division of complex numbers.
In both of these examples, the complex numbers are stored in arrays of
structures. The MOVSLDUP, MOVSHDUP and the asymmetric
ADDSUBPS instructions allow performing complex arithmetic on two
pairs of single-precision complex numbers simultaneously and without
any unnecessary swizzling between data elements. The coding
technique demonstrated in these two examples can be easily extended to
perform complex arithmetic on double-precision complex numbers. In
the case of double-precision complex arithmetic, multiplication or
division is done on one pair of complex numbers at a time, because a
128-bit XMM register holds only one double-precision complex number.
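The double-precision case mentioned above is not listed in this section, so the following is only a sketch of how it might look. It is a portable scalar model in C, not SSE3 code: the type vec2d and the helpers movddup, swap_pd, mul_pd, addsub_pd and complex_mul_pd are hypothetical names chosen here to mimic the lane behavior of MOVDDUP, SHUFPD (with immediate 1), MULPD and ADDSUBPD. It assumes the real part sits in the low lane and the imaginary part in the high lane.

```c
/* Hypothetical model of a 128-bit register holding two doubles.
   lane[0] is the low element (real part), lane[1] the high element
   (imaginary part). This layout is an assumption of the sketch. */
typedef struct { double lane[2]; } vec2d;

/* Models MOVDDUP: broadcast one double into both lanes. */
static vec2d movddup(double x) {
    vec2d r = {{ x, x }};
    return r;
}

/* Models SHUFPD x, x, 1: swap the low and high lanes. */
static vec2d swap_pd(vec2d s) {
    vec2d r = {{ s.lane[1], s.lane[0] }};
    return r;
}

/* Models MULPD: lane-wise multiplication. */
static vec2d mul_pd(vec2d x, vec2d y) {
    vec2d r = {{ x.lane[0] * y.lane[0], x.lane[1] * y.lane[1] }};
    return r;
}

/* Models ADDSUBPD: subtract in the low lane, add in the high lane. */
static vec2d addsub_pd(vec2d x, vec2d y) {
    vec2d r = {{ x.lane[0] - y.lane[0], x.lane[1] + y.lane[1] }};
    return r;
}

/* (a + i b) * (c + i d) for one pair of double-precision values.
   x = {a, b}, y = {c, d}; the result is {ac - bd, ad + bc}. */
static vec2d complex_mul_pd(vec2d x, vec2d y) {
    vec2d t0 = mul_pd(movddup(x.lane[0]), y);          /* {ac, ad} */
    vec2d t1 = mul_pd(movddup(x.lane[1]), swap_pd(y)); /* {bd, bc} */
    return addsub_pd(t0, t1);                          /* {ac-bd, ad+bc} */
}
```

Calling complex_mul_pd with x = {1, 2} and y = {3, 4} models (1 + 2i)(3 + 4i) and yields {-5, 10}. Each call handles a single product, which is the "one pair at a time" limitation described above.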
Example 5-11 Multiplication of Two Pairs of Single-precision Complex Numbers
; Multiplication of (ak + i bk) * (ck + i dk)
; a + i b can be stored as a data structure
movsldup xmm0, Src1       ; load real parts into the destination,
                          ; a1, a1, a0, a0
movaps   xmm1, Src2       ; load the 2nd pair of complex values,
                          ; i.e. d1, c1, d0, c0
mulps    xmm0, xmm1       ; temporary results, a1d1, a1c1, a0d0, a0c0
shufps   xmm1, xmm1, 0b1h ; reorder the real and imaginary parts,
                          ; c1, d1, c0, d0
movshdup xmm2, Src1       ; load the imaginary parts into the
                          ; destination, b1, b1, b0, b0
mulps    xmm2, xmm1       ; temporary results, b1c1, b1d1, b0c0, b0d0
addsubps xmm0, xmm2       ; b1c1+a1d1, a1c1-b1d1, b0c0+a0d0,
                          ; a0c0-b0d0
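To make the data flow of the listing above easier to follow, here is a small scalar model in C. It is a sketch for checking the lane arithmetic, not SSE3 code: the type vec4f and the helper names are hypothetical, and each helper only mimics the lane behavior of the instruction named in its comment. Lane 0 is the lowest element, so the register comments in the listing (written high to low) read right to left here. In real code the corresponding compiler intrinsics would likely be _mm_moveldup_ps, _mm_movehdup_ps, _mm_shuffle_ps, _mm_mul_ps and _mm_addsub_ps.

```c
/* Hypothetical model of a 128-bit register holding four floats.
   lane[0] is the lowest element. Two complex numbers stored as an
   array of structures occupy the lanes as {re0, im0, re1, im1}. */
typedef struct { float lane[4]; } vec4f;

/* Models MOVSLDUP: duplicate the even lanes (the real parts). */
static vec4f movsldup(vec4f s) {
    vec4f r = {{ s.lane[0], s.lane[0], s.lane[2], s.lane[2] }};
    return r;
}

/* Models MOVSHDUP: duplicate the odd lanes (the imaginary parts). */
static vec4f movshdup(vec4f s) {
    vec4f r = {{ s.lane[1], s.lane[1], s.lane[3], s.lane[3] }};
    return r;
}

/* Models SHUFPS x, x, 0B1h: swap the two lanes of each pair. */
static vec4f shufps_b1(vec4f s) {
    vec4f r = {{ s.lane[1], s.lane[0], s.lane[3], s.lane[2] }};
    return r;
}

/* Models MULPS: lane-wise multiplication. */
static vec4f mulps(vec4f x, vec4f y) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.lane[i] = x.lane[i] * y.lane[i];
    return r;
}

/* Models ADDSUBPS: subtract in even lanes, add in odd lanes. */
static vec4f addsubps(vec4f x, vec4f y) {
    vec4f r = {{ x.lane[0] - y.lane[0], x.lane[1] + y.lane[1],
                 x.lane[2] - y.lane[2], x.lane[3] + y.lane[3] }};
    return r;
}

/* Same instruction order as the listing.
   src1 = {a0, b0, a1, b1}, src2 = {c0, d0, c1, d1}. */
static vec4f complex_mul_two_pairs(vec4f src1, vec4f src2) {
    vec4f x0 = movsldup(src1);   /* {a0, a0, a1, a1}         */
    vec4f x1 = src2;             /* {c0, d0, c1, d1}         */
    x0 = mulps(x0, x1);          /* {a0c0, a0d0, a1c1, a1d1} */
    x1 = shufps_b1(x1);          /* {d0, c0, d1, c1}         */
    vec4f x2 = movshdup(src1);   /* {b0, b0, b1, b1}         */
    x2 = mulps(x2, x1);          /* {b0d0, b0c0, b1d1, b1c1} */
    return addsubps(x0, x2);     /* {a0c0-b0d0, a0d0+b0c0,
                                     a1c1-b1d1, a1d1+b1c1}   */
}
```

With src1 = {1, 2, 5, 6} and src2 = {3, 4, 7, 8}, the model computes (1 + 2i)(3 + 4i) and (5 + 6i)(7 + 8i) and returns {-5, 10, -13, 82}. The subtract-in-even, add-in-odd pattern of ADDSUBPS is what produces the real and imaginary parts without a separate sign flip.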