IA-32 Intel® Architecture Optimization
When optimizing application performance in a multithreaded
environment, control flow parallelism is likely to have the largest
impact on performance scaling with respect to the number of physical
processors and to the number of logical processors per physical
processor.
If the control flow of a multithreaded application contains a workload
in which only 50% can be executed in parallel, Amdahl's law limits the
performance gain from using two physical processors to 33%, compared
to using a single processor. Using four processors can deliver no more
than a 60% speed-up over a single processor! Thus, it is critical to
maximize the portion of control flow that can take advantage of
parallelism. Improper implementation of thread synchronization can
significantly increase the proportion of serial control flow and further
reduce the application’s performance scaling.
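The 33% and 60% figures follow from Amdahl's law: if a fraction p of the work runs in parallel on n processors, the speed-up is bounded by 1 / ((1 - p) + p / n). The sketch below simply evaluates that bound. The function name amdahl_speedup is hypothetical, and the formula ignores synchronization and scheduling overhead, so measured scaling will usually be lower.

```c
/* Hypothetical helper: Amdahl's-law upper bound on speed-up.
 * parallel_fraction: share of the work that can run concurrently (0..1).
 * n: number of processors (physical or logical) executing the parallel part.
 * The serial share (1 - parallel_fraction) takes the same time for any n. */
double amdahl_speedup(double parallel_fraction, int n)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / n);
}
```

For the workload above, amdahl_speedup(0.5, 2) is about 1.33 and amdahl_speedup(0.5, 4) is 1.6, matching the 33% and 60% gains. As n grows, the bound approaches 1 / (1 - p), so this workload can never run more than twice as fast, however many processors are added.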
In addition to maximizing the parallelism of control flows, interactions
between threads, in the form of thread synchronization and imbalanced
task scheduling, can also significantly affect overall processor scaling.
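The cost of scheduling imbalance can be estimated in a similar way: a parallel region finishes only when its slowest thread finishes, so processors whose threads finish early sit idle. The sketch below, using a hypothetical helper parallel_efficiency, expresses this as the fraction of available processor time spent doing useful work, assuming the per-thread busy times have already been measured.

```c
/* Hypothetical helper: parallel efficiency of one fork/join region.
 * thread_time[i] is the busy time of thread i; n > 0 and at least one
 * time must be positive. The region ends when the slowest thread ends,
 * so efficiency = total busy time / (n * slowest thread's time). */
double parallel_efficiency(const double *thread_time, int n)
{
    double total = 0.0;
    double slowest = 0.0;

    for (int i = 0; i < n; i++) {
        total += thread_time[i];
        if (thread_time[i] > slowest)
            slowest = thread_time[i];
    }
    return total / (n * slowest);
}
```

With busy times of 3 and 1 units on two threads, the efficiency is 4 / (2 * 3), about 0.67, whereas the same 4 units split evenly (2 and 2) give an efficiency of 1.0.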
Excessive cache misses are one cause of poor performance scaling. In a
multithreaded execution environment, they can result from:
• aliased stack accesses by different threads in the same process
• thread contentions resulting in cache line evictions
• false sharing of cache lines between different processors
Techniques that address each of these situations (and many other areas)
are described in the sections of this chapter.
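False sharing, the last item in the list, occurs when threads on different processors write to distinct variables that happen to lie in the same cache line, so each write invalidates the line in the other processor's cache. A minimal sketch, assuming POSIX threads and a 64-byte cache line, contrasts two adjacent counters with counters padded onto separate lines. The line size is processor-specific, so CACHE_LINE_SIZE is an assumption to adjust for the target; all names are hypothetical. Compile with -pthread.

```c
#include <pthread.h>
#include <stdalign.h>

#define CACHE_LINE_SIZE 64      /* assumed line size; processor-specific */
#define NUM_THREADS 2
#define ITERATIONS 10000000L

/* Prone to false sharing: both counters usually sit in one cache line. */
struct shared_counters {
    volatile long count[NUM_THREADS];
};

/* Padded layout: alignas forces each counter onto its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) volatile long count;
};

static struct shared_counters unpadded;
static struct padded_counter padded[NUM_THREADS];

/* Each thread increments only its own counter; volatile forces a memory
 * write on every iteration, which exposes any cache line ping-pong. */
static void *increment_worker(void *arg)
{
    volatile long *counter = arg;
    for (long i = 0; i < ITERATIONS; i++)
        (*counter)++;
    return NULL;
}

/* Starts one thread per counter, joins them, and returns the total.
 * Error handling for pthread_create is omitted for brevity. */
static long run_threads(volatile long *counters[])
{
    pthread_t tid[NUM_THREADS];
    long total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        *counters[t] = 0;
        pthread_create(&tid[t], NULL, increment_worker, (void *)counters[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += *counters[t];
    }
    return total;
}

long run_unpadded(void)
{
    volatile long *c[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++)
        c[t] = &unpadded.count[t];
    return run_threads(c);
}

long run_padded(void)
{
    volatile long *c[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++)
        c[t] = &padded[t].count;
    return run_threads(c);
}
```

Both functions compute the same total. Timing run_unpadded() against run_padded() on a system with two or more processors will typically show the padded layout running faster, but the size of the difference depends on the processor and should be measured rather than assumed.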
Multitasking Environment
Hardware multi-threading capabilities in IA-32 processors can exploit
task-level parallelism when a workload consists of several
single-threaded applications and these applications are scheduled to run
concurrently under an MP-aware operating system. In this environment,
hardware multi-threading capabilities can deliver higher throughput for
the workload, although the relative performance of a single task (in