IA-32 Intel® Architecture Optimization
To gather data from four different memory locations on the fly, follow
these steps:
1. Identify the first half of the 128-bit memory location.
2. Group the different halves together using movlps and movhps to form an xyxy layout in two registers.
3. From the four attached halves, get xxxx with one shuffle and yyyy with another shuffle. zzzz is derived the same way but requires only one shuffle.
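The steps above can be sketched with SSE intrinsics rather than inline assembly. This is a minimal illustration, not the manual's listing: the type and function names (vertex_aos_sketch, vertex_soa_sketch, swizzle_sketch) are hypothetical, the fourth component is called w, and the sketch assumes exactly four input vertices. _mm_loadl_pi and _mm_loadh_pi correspond to movlps and movhps, and _mm_shuffle_ps corresponds to shufps.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadl_pi, _mm_loadh_pi, _mm_shuffle_ps */

/* Hypothetical types for this sketch; they are not taken from the manual. */
typedef struct { float x, y, z, w; } vertex_aos_sketch;
typedef struct { float x[4], y[4], z[4], w[4]; } vertex_soa_sketch;

/* Converts exactly four AoS vertices into one SoA block. */
static void swizzle_sketch(const vertex_aos_sketch *in, vertex_soa_sketch *out)
{
    __m128 xy01 = _mm_setzero_ps();
    __m128 xy23 = _mm_setzero_ps();
    __m128 zw01 = _mm_setzero_ps();
    __m128 zw23 = _mm_setzero_ps();

    assert(in != NULL && out != NULL);

    /* Steps 1-2: load the first 64-bit half (x, y) of each vertex and
       pair the halves up, giving the xyxy layout in two registers. */
    xy01 = _mm_loadl_pi(xy01, (const __m64 *)&in[0].x);  /* x0 y0 -- -- */
    xy01 = _mm_loadh_pi(xy01, (const __m64 *)&in[1].x);  /* x0 y0 x1 y1 */
    xy23 = _mm_loadl_pi(xy23, (const __m64 *)&in[2].x);  /* x2 y2 -- -- */
    xy23 = _mm_loadh_pi(xy23, (const __m64 *)&in[3].x);  /* x2 y2 x3 y3 */

    /* The second halves (z, w) are gathered the same way. */
    zw01 = _mm_loadl_pi(zw01, (const __m64 *)&in[0].z);  /* z0 w0 -- -- */
    zw01 = _mm_loadh_pi(zw01, (const __m64 *)&in[1].z);  /* z0 w0 z1 w1 */
    zw23 = _mm_loadl_pi(zw23, (const __m64 *)&in[2].z);  /* z2 w2 -- -- */
    zw23 = _mm_loadh_pi(zw23, (const __m64 *)&in[3].z);  /* z2 w2 z3 w3 */

    /* Step 3: one shuffle per output vector. Selecting elements 0 and 2
       of each source picks the first component of every pair; selecting
       elements 1 and 3 picks the second. */
    _mm_storeu_ps(out->x, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(out->y, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(3, 1, 3, 1)));
    _mm_storeu_ps(out->z, _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(out->w, _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(3, 1, 3, 1)));
}
```

Calling swizzle_sketch(in, &out) on an array of four vertices fills out.x with the four x values in input order, and likewise for y, z, and w. The sketch uses unaligned stores so that it makes no assumption about the alignment of the output structure; if the output is known to be 16-byte aligned, aligned stores could be used instead.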
Example 5-3 illustrates the swizzle function.
Example 5-3 Swizzling Data
typedef struct _VERTEX_AOS {
    float x, y, z, color;
} Vertex_aos; // AoS structure declaration
typedef struct _VERTEX_SOA {
    float x[4], y[4], z[4];
    float color[4];
} Vertex_soa; // SoA structure declaration
void swizzle_asm (Vertex_aos *in, Vertex_soa *out)
{
// in mem: x1y1z1w1-x2y2z2w2-x3y3z3w3-x4y4z4w4-
// SWIZZLE XYZW --> XXXX
asm {
mov ecx, in // get structure addresses
mov edx, out