Intel IA-32 Computer Accessories User Manual


 
Coding for SIMD Architectures 3
Improving Memory Utilization
Memory performance can be improved by rearranging data and
algorithms for SSE2, SSE, and MMX technology intrinsics. The
methods for improving memory performance involve working with the
following:
Data structure layout
Strip-mining for vectorization and memory utilization
Loop-blocking
Using the cacheability instructions, prefetch and streaming store, also
greatly enhances memory utilization. For these instructions, see
Chapter 6, “Optimizing Cache Usage.”
Data Structure Layout
For certain algorithms, like 3D transformations and lighting, there are
two basic ways of arranging the vertex data. The traditional method is
the array of structures (AoS) arrangement, with a structure for each
vertex (see Example 3-14). However, this method does not take full
advantage of SIMD technology capabilities, because the values of a
given coordinate are not contiguous in memory.
The best processing method for code using SIMD technology is to
arrange the data in an array for each coordinate (see Example 3-15).
This data arrangement is called structure of arrays (SoA).
Example 3-14 AoS Data Structure
typedef struct {
    float x, y, z;
    int a, b, c;
    . . .
} Vertex;
Vertex Vertices[NumOfVertices];
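For contrast with the AoS layout above, the following is a minimal sketch of the structure of arrays (SoA) arrangement in plain C. It is not taken from the manual: the names VertexAoS, VerticesSoA, NUM_VERTICES, aos_to_soa, and dot_soa are hypothetical, the integer fields are omitted, and the array size is an arbitrary small value. The sketch assumes only that each coordinate is stored in its own contiguous array, so that a loop over the vertices reads memory with unit stride.

```c
#include <stddef.h>

#define NUM_VERTICES 8  /* hypothetical size, chosen as a multiple of 4 */

/* AoS: one structure per vertex; x, y, z of one vertex are adjacent. */
typedef struct {
    float x, y, z;
} VertexAoS;

/* SoA: one array per coordinate; all x values are contiguous. */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
} VerticesSoA;

/* Copy the first n vertices from the AoS layout into the SoA layout. */
static void aos_to_soa(const VertexAoS *in, VerticesSoA *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Dot product of each vertex with the vector (lx, ly, lz).
 * Each array is read with unit stride, so four consecutive
 * floats of one coordinate can fill a 128-bit SIMD register. */
static void dot_soa(const VerticesSoA *v, float lx, float ly, float lz,
                    float *result, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = v->x[i] * lx + v->y[i] * ly + v->z[i] * lz;
}
```

With the AoS layout, the same loop would step through memory three floats at a time to reach the next x value, so the four values for one SIMD register would have to be gathered from separate structures. Whether a compiler actually vectorizes the SoA loop depends on the compiler and its options.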