IA-32 Intel® Architecture Optimization
Optimization of Other Shared Resources
Resource optimization in a multi-threaded application depends on the
cache topology and on the execution resources associated with each level
of the processor topology hierarchy. Processor topology, and an algorithm
that software can use to identify it, are discussed in the IA-32 Intel®
Architecture Software Developer’s Manual, Volume 3A.
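As a rough illustration of one input to topology identification, the sketch below decodes two fields that CPUID leaf 1 reports: the HTT flag (EDX bit 28) and the count of logical processor IDs per physical package (EBX bits 23:16). It is a simplified sketch and not the full algorithm from Volume 3A. The function names are hypothetical, and the register values are assumed to have been read already with the CPUID instruction (for example, through a compiler intrinsic).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the HTT flag (CPUID.1:EDX bit 28)
 * is set, meaning the package may expose more than one logical processor. */
int cpuid1_htt_flag(uint32_t edx)
{
    return (int)((edx >> 28) & 1u);
}

/* Hypothetical helper: returns the number of logical processor IDs per
 * physical package (CPUID.1:EBX bits 23:16). The field is only meaningful
 * when the HTT flag is set, so 1 is returned otherwise. */
unsigned cpuid1_logical_ids_per_package(uint32_t ebx, uint32_t edx)
{
    if (!cpuid1_htt_flag(edx))
        return 1u;
    return (unsigned)((ebx >> 16) & 0xFFu);
}

/* Usage example with made-up register values (not read from hardware). */
void cpuid1_decode_example(void)
{
    uint32_t ebx = 0x00020800u;  /* bits 23:16 = 2 */
    uint32_t edx = 1u << 28;     /* HTT flag set */
    assert(cpuid1_logical_ids_per_package(ebx, edx) == 2u);
}
```

A count greater than one does not by itself say whether the logical processors are SMT siblings or separate cores; that distinction is what the topology enumeration algorithm in Volume 3A resolves.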
Typically, the bus system is shared by multiple agents at both the SMT
level and the processor core level of the processor topology.
Multi-threaded application design should therefore start with an
approach that equitably manages the bus bandwidth available to the
processor agents sharing a bus link. This can be done by
improving the data locality of each application thread or by
allowing two threads to take advantage of a shared second-level cache
(where such a shared cache topology is available).
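One common way to improve the data locality of a single thread is loop blocking, which keeps the working set of the inner loops small enough to stay in cache and so reduces traffic on the shared bus. The following is a minimal sketch, assuming a square row-major matrix of floats. The function name and the block parameter are illustrative assumptions, and a suitable block size depends on the cache sizes of the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: transpose an n x n row-major matrix tile by tile.
 * Each block x block tile of src and dst is processed completely before
 * moving on, so cache lines are reused while they are still resident. */
void transpose_blocked(const float *src, float *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        for (size_t jj = 0; jj < n; jj += block) {
            /* Clamp the tile bounds at the matrix edge. */
            size_t i_end = (ii + block < n) ? ii + block : n;
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}
```

The result is identical to an unblocked transpose; only the order of memory accesses changes. When two threads share a second-level cache, a tile size chosen against the shared capacity rather than the full cache size is a reasonable starting point for tuning.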
In general, optimizing the building blocks of a multi-threaded
application can start from an individual thread. The guidelines discussed
in Chapters 2 through 6 largely apply to multi-threaded optimization.
Tuning Suggestion 3. (H Impact, H Generality) Optimize single-threaded
code to maximize execution throughput first.
At the SMT level, Hyper-Threading Technology typically can provide
two logical processors sharing execution resources within a processor
core. To help multi-threaded applications utilize shared execution
resources effectively, the rest of this section describes guidelines for
dealing with common situations, as well as with the limited situations
where contention for execution resources between threads may impact
overall performance.
Most applications use only about 20-30% of peak execution resources
when running in a single-threaded environment. A useful indicator of
this is the execution throughput measured at the retirement
stage (see “Workload Characterization” in Appendix A). In a processor
that supports Hyper-Threading Technology, execution throughput