Intel IA-32 Computer Accessories User Manual


 
Optimizing for SIMD Floating-point Applications 5
You may have to swizzle data in the registers rather than in memory.
This occurs when two different functions need to process the data in
different layouts. In lighting, for example, data comes as
rrrr gggg bbbb aaaa, and you must deswizzle it into rgba before
converting it into integers. In this case, use the movlhps/movhlps
instructions to do the first part of the deswizzle, followed by shuffle
instructions; see Example 5-6 and Example 5-7.
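
The two-step approach described above (movlhps/movhlps first, then
shuffles) can be sketched with SSE intrinsics instead of inline assembly.
This is a minimal illustration under stated assumptions, not the manual's
own code: the function name deswizzle_rgba, the flat 16-float array
layout, and the DESWIZZLE_USE_SSE macro are hypothetical names chosen for
this sketch. The scalar branch is only a fallback for compilers without
SSE support and spells out the same mapping.

```c
#include <assert.h>

/* Assumption: pick the SSE path only when the compiler advertises SSE. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* _mm_movelh_ps, _mm_movehl_ps, _mm_shuffle_ps */
#define DESWIZZLE_USE_SSE 1
#else
#define DESWIZZLE_USE_SSE 0
#endif

/*
 * Hypothetical helper (not from the manual).
 * soa: 16 floats laid out as r1..r4, g1..g4, b1..b4, a1..a4.
 * aos: 16 floats written as r1 g1 b1 a1, r2 g2 b2 a2, and so on.
 */
void deswizzle_rgba(const float *soa, float *aos)
{
    assert(soa && aos);
#if DESWIZZLE_USE_SSE
    /* Unaligned loads keep the sketch free of alignment requirements. */
    __m128 r = _mm_loadu_ps(soa + 0);    /* r1 r2 r3 r4 */
    __m128 g = _mm_loadu_ps(soa + 4);    /* g1 g2 g3 g4 */
    __m128 b = _mm_loadu_ps(soa + 8);    /* b1 b2 b3 b4 */
    __m128 a = _mm_loadu_ps(soa + 12);   /* a1 a2 a3 a4 */

    /* First part: combine register halves (movlhps / movhlps). */
    __m128 rg_lo = _mm_movelh_ps(r, g);  /* r1 r2 g1 g2 */
    __m128 rg_hi = _mm_movehl_ps(g, r);  /* r3 r4 g3 g4 */
    __m128 ba_lo = _mm_movelh_ps(b, a);  /* b1 b2 a1 a2 */
    __m128 ba_hi = _mm_movehl_ps(a, b);  /* b3 b4 a3 a4 */

    /* Second part: shufps selects elements 0,2 (0x88) or 1,3 (0xDD). */
    __m128 v1 = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 v2 = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(3, 1, 3, 1));
    __m128 v3 = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 v4 = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(3, 1, 3, 1));

    _mm_storeu_ps(aos + 0,  v1);         /* r1 g1 b1 a1 */
    _mm_storeu_ps(aos + 4,  v2);         /* r2 g2 b2 a2 */
    _mm_storeu_ps(aos + 8,  v3);         /* r3 g3 b3 a3 */
    _mm_storeu_ps(aos + 12, v4);         /* r4 g4 b4 a4 */
#else
    /* Scalar fallback: the same mapping, one element at a time. */
    for (int i = 0; i < 4; ++i)
        for (int c = 0; c < 4; ++c)
            aos[4 * i + c] = soa[4 * c + i];
#endif
}
```

A caller would pass the four color planes as one 16-float block and
receive four rgba quadruples. Aligned loads and stores (movaps), as used
in the manual's examples, are an option when both buffers are known to be
16-byte aligned.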

Example 5-5 Deswizzling Single-Precision SIMD Data (continued)

    unpcklps xmm5, xmm4       // xmm5= z1 w1 z2 w2
    unpckhps xmm0, xmm4       // xmm0= z3 w3 z4 w4
    movlps [edx+8], xmm5      // v1 = x1 y1 z1 w1
    movhps [edx+24], xmm5     // v2 = x2 y2 z2 w2
    movlps [edx+40], xmm0     // v3 = x3 y3 z3 w3
    movhps [edx+56], xmm0     // v4 = x4 y4 z4 w4
    // DESWIZZLING ENDS HERE
  }
}

Example 5-6 Deswizzling Data Using the movlhps and shuffle Instructions

void deswizzle_rgb(Vertex_soa *in, Vertex_aos *out)
{
  //---deswizzle rgb---
  // assume: xmm1=rrrr, xmm2=gggg, xmm3=bbbb, xmm4=aaaa
  __asm {
    mov ecx, in               // load structure addresses
    mov edx, out
    movaps xmm1, [ecx]        // load r1 r2 r3 r4 => xmm1
    movaps xmm2, [ecx+16]     // load g1 g2 g3 g4 => xmm2
    movaps xmm3, [ecx+32]     // load b1 b2 b3 b4 => xmm3
    movaps xmm4, [ecx+48]     // load a1 a2 a3 a4 => xmm4
(continued)