Multi-Core and Hyper-Threading Technology 7
• In managed environments that provide automatic object allocation,
the object allocators and garbage collectors are responsible for
layout of the objects in memory so that false sharing through two
objects does not happen.
• Design classes so that only one thread writes to a given object field
and to the fields located close to it, in order to avoid false sharing.
The recommendations discussed in this section should not be read as
favoring a sparsely populated data layout. Adopt the data-layout
recommendations only where they are needed, and avoid unnecessary
bloat in the size of the working set.
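The single-writer layout recommended above can be sketched in C as follows. This is a minimal illustration, not code from this manual: the 64-byte line size, the thread count, and the names padded_counter, record_event and total_events are all assumptions made for the example. Check the line size (or sector size) of the target processor before reusing it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas (C11) */
#include <stddef.h>   /* size_t */

/* Assumption: 64-byte cache line; adjust for the target processor. */
#define CACHE_LINE_SIZE 64
/* Hypothetical number of worker threads. */
#define NUM_THREADS 4

/* Aligning the only member to the line size makes the compiler round the
 * structure up to one full line, so each array element owns its own line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) unsigned long count;
} padded_counter;

static_assert(sizeof(padded_counter) == CACHE_LINE_SIZE,
              "each counter must occupy exactly one cache line");

static padded_counter counters[NUM_THREADS];

/* Intended to be called only by thread `tid`, the single writer of
 * counters[tid]. No other thread stores to that cache line. */
void record_event(size_t tid)
{
    counters[tid].count++;
}

/* Sums all per-thread counters. Call after the writer threads have
 * finished, since this sketch performs no synchronization. */
unsigned long total_events(void)
{
    unsigned long total = 0;
    for (size_t i = 0; i < NUM_THREADS; i++)
        total += counters[i].count;
    return total;
}
```

Each counter here costs a full line instead of a few bytes, which is the kind of growth in the working set that the paragraph above cautions against. Padding of this sort is best reserved for fields that are written frequently by different threads.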
System Bus Optimization
The system bus services requests from bus agents (e.g. logical
processors) to fetch data or code from the memory sub-system. The
performance impact due to data traffic fetched from memory depends on
the characteristics of the workload and on the degree to which the
software has been optimized for memory access and locality.
A number of techniques to characterize the memory
traffic of a workload are discussed in “Application Performance Tools” in
Appendix A. Optimization guidelines on locality enhancement are also
discussed in “Locality Enhancement” in Chapter 2 and “Hardware
Prefetching and Cache Blocking Techniques” in Chapter 6.
The techniques described in Chapter 2 and Chapter 6 benefit
application performance on a platform where the bus system is servicing
a single-threaded environment. In a multi-threaded environment, the bus
system typically services many more logical processors, each of which
can issue bus requests independently. Thus, techniques for enhancing
locality, conserving bus bandwidth, and reducing the delay of
large-stride cache misses can have a strong impact on processor
scaling performance.
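To illustrate why access stride matters for bus traffic, the sketch below sums the same matrix in two traversal orders. It is an assumed example, not taken from this manual; the function names and the flat row-major layout are choices made for illustration. The unit-stride version consumes every element of a cache line before moving on, while the large-stride version touches a different line on almost every access once a row is longer than a line.

```c
#include <assert.h> /* assert */
#include <stddef.h> /* size_t, NULL */

/* Unit-stride traversal of a row-major matrix: consecutive accesses are
 * adjacent in memory, so each fetched cache line is fully used. */
long sum_unit_stride(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Large-stride traversal of the same data: consecutive accesses are
 * cols * sizeof(int) bytes apart, so for wide rows each access can
 * fall on a different cache line. */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```

Both functions return the same result; only the order of memory accesses differs. When several logical processors run such loops at once, the unit-stride form is the one more likely to leave bus bandwidth available to the other threads, although the actual difference depends on matrix size, cache capacity and hardware prefetching.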