However, if the access pattern of the array exhibits locality, such as
if the array index is being swept through, then the Pentium 4 processor
prefetches data from struct_of_array, even if the elements of the
structure are accessed together.
When the elements of the structure are not accessed with equal
frequency, such as when element a is accessed ten times more often than
the other entries, then struct_of_array not only saves memory, but it
also prevents fetching unnecessary data items b, c, d, and e.
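
A minimal sketch of this case, assuming int fields and a hypothetical
NUM_ELEMENTS. The type name aos_element and the two helper functions are
illustrative and do not come from the original text. Summing only the
field a touches every cache line of the array-of-structures layout, but
only the a stream of struct_of_array.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ELEMENTS 1024  /* hypothetical array length */

/* Array of structures: a..e of one element sit next to each other. */
struct aos_element {
    int a, b, c, d, e;
};

/* Structure of arrays: each field is its own contiguous stream. */
struct struct_of_array {
    int a[NUM_ELEMENTS];
    int b[NUM_ELEMENTS];
    int c[NUM_ELEMENTS];
    int d[NUM_ELEMENTS];
    int e[NUM_ELEMENTS];
};

/* Reads a, but each fetched cache line also carries b, c, d and e. */
long sum_a_aos(const struct aos_element *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i].a;
    return total;
}

/* Reads a only; every fetched cache line holds nothing but a values. */
long sum_a_soa(const struct struct_of_array *s, size_t n)
{
    assert(n <= NUM_ELEMENTS);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += s->a[i];
    return total;
}
```

Both functions return the same result; they differ only in how much
unused data travels through the cache along with a.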
Using struct_of_array also enables the use of the SIMD data types by
the programmer and the compiler.
Note that struct_of_array can have the disadvantage of requiring more
independent memory stream references. This can require the use of more
prefetches and additional address generation calculations. It can also
have an impact on DRAM page access efficiency. An alternative,
hybrid_struct_of_array, blends the two approaches. In this case, only
two separate address streams are generated and referenced: one for
hybrid_struct_of_array_ace and one for hybrid_struct_of_array_bd. This
alternative also prevents fetching unnecessary data, assuming that the
variables a, c and e are always used together, and that the variables
b and d are also used together, but not at the same time as a, c
and e.
The hybrid approach ensures:
• simpler/fewer address generation than struct_of_array
• fewer streams, which reduces DRAM page misses
• use of fewer prefetches due to fewer streams
• efficient cache line packing of data elements that are used
concurrently
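
The hybrid layout can be sketched as follows. This is an illustration
under assumptions, not the original listing: the fields are taken to be
int, NUM_ELEMENTS is a hypothetical length, and sum_ace is a made-up
helper. Only the names hybrid_struct_of_array_ace and
hybrid_struct_of_array_bd come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ELEMENTS 1024  /* hypothetical array length */

/* Fields that are used together share one element, and so one stream. */
struct hybrid_struct_of_array_ace {
    int a, c, e;
};

/* Fields used at a different time live in a second, separate stream. */
struct hybrid_struct_of_array_bd {
    int b, d;
};

/* Walks a single address stream; b and d are never fetched. */
long sum_ace(const struct hybrid_struct_of_array_ace *ace, size_t n)
{
    assert(ace != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += ace[i].a + ace[i].c + ace[i].e;
    return total;
}
```

A pure struct_of_array version of the same loop would need three
streams (a, c and e) instead of one.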
Assembly/Compiler Coding Rule 23. (H impact, M generality) Try to
arrange data structures such that they permit sequential access.
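
One way to read this rule, sketched under assumptions: a matrix stored
row by row in a flat array (the function names and the flat layout are
hypothetical choices for illustration). The first function steps through
adjacent addresses; the second jumps cols elements on every access.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential: the inner loop moves through adjacent addresses. */
long sum_sequential(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Strided: the inner loop skips cols elements between accesses. */
long sum_strided(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is
touched differs.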
If the data is arranged into a set of streams, the automatic hardware
prefetcher can prefetch data that will be needed by the application,
reducing the effective memory latency. If the data is accessed in a