Multi-Core and Hyper-Threading Technology 7
terms of time of completion relative to the same task when in a
single-threaded environment) will vary, depending on how heavily shared
execution resources and memory are utilized.
For development purposes, several popular operating systems (for
example, Microsoft Windows* XP Professional and Home, and Linux*
distributions using kernel 2.4.19 or later²) include OS kernel code that
can manage task scheduling and balance shared execution resources
within each physical processor to maximize throughput.
Because applications run independently in a multi-tasking
environment, thread synchronization issues are less likely to limit the
scaling of throughput. This is because the control flow of the workload
is likely to be 100% parallel³ (if no inter-processor communication is
taking place and if there are no system bus constraints).
With a multi-tasking workload, however, bus activities and cache access
patterns are likely to affect the scaling of throughput. Running two
copies of the same application or the same suite of applications in
lock-step can expose an artifact in the performance measuring
methodology, because the copies' access patterns to the first-level data
cache can lead to excessive cache misses and produce skewed
performance results. Fix this problem by:
• including a per-instance offset at the start-up of an application
• introducing heterogeneity in the workload by using different
datasets with each instance of the application
• randomizing the sequence of start-up of applications when running
multiple copies of the same suite
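The first and third remedies above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from this manual: the names instance_offset, alloc_with_offset, and shuffle_launch_order are hypothetical, and the 64-byte stride and 4-KB window are placeholder values that would need tuning for the target processor's cache organization. Each instance is assumed to receive a distinct instance_id from the measurement harness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tuning values (assumptions, not taken from the manual):
   offsets advance in 64-byte steps and wrap within a 4-KB window. */
#define OFFSET_STRIDE 64u
#define OFFSET_WINDOW 4096u

/* Byte offset for a given instance, so that otherwise identical copies
   do not place their working data at identical addresses. */
size_t instance_offset(unsigned instance_id)
{
    return ((size_t)instance_id * OFFSET_STRIDE) % OFFSET_WINDOW;
}

/* Allocate a working buffer whose usable region begins
   instance_offset() bytes into the allocation. *base receives the
   pointer that must later be passed to free(). */
void *alloc_with_offset(size_t size, unsigned instance_id, void **base)
{
    size_t off = instance_offset(instance_id);
    assert(base != NULL);
    *base = malloc(size + off);
    return *base ? (char *)*base + off : NULL;
}

/* Fisher-Yates shuffle of the launch order for n applications.
   Call srand() once beforehand; rand() is adequate for a benchmark
   harness but is not a high-quality generator. */
void shuffle_launch_order(int *order, size_t n)
{
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        int tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```

A harness might fill an array with the indices of the applications in a suite, call shuffle_launch_order on it separately for each copy of the suite, and start the applications in the resulting order, while each application obtains its main working buffer through alloc_with_offset. The second remedy, using different datasets per instance, is a matter of input selection and is not shown.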
2. This code is included in Red Hat* Linux Enterprise AS 2.1.
3. A software tool that attempts to measure the throughput of a multi-tasking workload is
likely to introduce additional control flows that are not parallel. For example, see
Example 7-2 for coding pitfalls using a spin-wait loop. Thus, thread synchronization issues
must be considered as an integral part of its performance measuring methodology.