Optimizing for SIMD Floating-point Applications 5
For some applications, such as 3D geometry, the traditional data
arrangement must be changed to fully utilize the SIMD registers and
parallel techniques. Traditionally, the data layout has been an array
of structures (AoS). To fully utilize the SIMD registers in such
applications, a different data layout has been proposed: a structure of
arrays (SoA), which generally results in better performance.
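
The two layouts can be sketched in C as follows. This is a minimal illustration, not code from this manual: the type names VertexAoS and VerticesSoA, the constant NUM_VERTICES, and the helper aos_to_soa are all hypothetical. In the SoA form, four consecutive x values are contiguous in memory, so a single 128-bit load can fill one SIMD register with like components.

```c
#include <stddef.h>

#define NUM_VERTICES 8  /* hypothetical count, chosen only for illustration */

/* Array of structures (AoS): the x, y, z, w of one vertex are adjacent. */
typedef struct {
    float x, y, z, w;
} VertexAoS;

/* Structure of arrays (SoA): each component lives in its own array,
   so x[0..3] (or y[0..3], ...) are contiguous in memory. */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
    float w[NUM_VERTICES];
} VerticesSoA;

/* Copy NUM_VERTICES vertices from the AoS layout into the SoA layout. */
static void aos_to_soa(const VertexAoS *in, VerticesSoA *out)
{
    for (size_t i = 0; i < NUM_VERTICES; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->w[i] = in[i].w;
    }
}
```

Whether to convert once up front, as aos_to_soa does, or to rearrange data on the fly depends on how often the data is reused; the sketch shows only the layouts themselves.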
Vertical versus Horizontal Computation
The majority of the floating-point arithmetic instructions in SSE and
SSE2 focus on vertical data processing of parallel data elements; that
is, each element of the destination is the result of a common arithmetic
operation on the input operands in the same vertical position, as shown
in Figure 5-1. To supplement these homogeneous arithmetic operations on
parallel data elements, SSE and SSE2 also provide several data movement
instructions (for example, shufps) to facilitate moving data elements
horizontally.
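
As a rough sketch of the distinction, the following C fragment uses the SSE intrinsics from xmmintrin.h. The function names vertical_add and reverse_elements are hypothetical, and unaligned loads and stores are used only to keep the example free of alignment requirements. The first function performs a vertical operation (addps applied to elements in the same position); the second uses shufps, through _mm_shuffle_ps, to move elements horizontally within a register.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Vertical operation: r[i] = x[i] + y[i] for i = 0..3 (one addps). */
static void vertical_add(const float x[4], const float y[4], float r[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(r, _mm_add_ps(vx, vy));
}

/* Horizontal data movement: reverse the four elements of x (one shufps).
   _MM_SHUFFLE(0, 1, 2, 3) selects source elements 3, 2, 1, 0 for
   result elements 0, 1, 2, 3 respectively. */
static void reverse_elements(const float x[4], float r[4])
{
    __m128 vx = _mm_loadu_ps(x);
    _mm_storeu_ps(r, _mm_shuffle_ps(vx, vx, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

For x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, vertical_add produces {11, 22, 33, 44}, and reverse_elements applied to x produces {4, 3, 2, 1}.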
Figure 5-1 Homogeneous Operation on Parallel Data Elements

     X3         X2         X1         X0
     Y3         Y2         Y1         Y0
     OP         OP         OP         OP
  X3 OP Y3   X2 OP Y2   X1 OP Y1   X0 OP Y0

The AoS data structure is often used in 3D geometry computations.
SIMD technology can be applied to an AoS data structure using a
horizontal computation model. This means that the x, y, z, and w
components of a single vertex structure (that is, of a single vector