IA-32 Intel® Architecture Optimization
Functional Decomposition
Applications usually process a wide variety of tasks with diverse
functions and many unrelated data sets. For example, a video codec
needs several different processing functions, such as DCT, motion
estimation, and color conversion. Using a functional threading model,
an application can assign separate threads to motion estimation, color
conversion, and other functional tasks.
Functional decomposition achieves more flexible thread-level
parallelism when it depends less on the duplication of hardware
resources. For example, a thread executing a sorting algorithm and a
thread executing a matrix multiplication routine are not likely to require
the same execution unit at the same time. A design that recognizes this
can take advantage of traditional multiprocessor systems as well as
multiprocessor systems using IA-32 processors that support
Hyper-Threading Technology.
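The sorting and matrix-multiplication example can be sketched with POSIX threads. This is a minimal illustration under stated assumptions, not code from this manual: the names `sort_task`, `matmul_task`, and `run_functional_threads`, and the small matrix size `N`, are hypothetical. Each thread runs a different function on its own data, so the threads need no synchronization beyond the final join. Whether they run on separate processors or share one processor with Hyper-Threading Technology is left to the operating system scheduler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N 4 /* hypothetical matrix dimension, kept small for illustration */

/* Hypothetical work item for the sorting thread. */
typedef struct {
    int *data;
    size_t len;
} sort_task;

/* Hypothetical work item for the matrix-multiplication thread: c = a * b. */
typedef struct {
    double a[N][N], b[N][N], c[N][N];
} matmul_task;

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Function 1: sort an integer array (mostly integer and branch work). */
static void *sort_worker(void *arg) {
    sort_task *t = arg;
    assert(t->data != NULL || t->len == 0);
    qsort(t->data, t->len, sizeof t->data[0], cmp_int);
    return NULL;
}

/* Function 2: dense matrix multiply (mostly floating-point work). */
static void *matmul_worker(void *arg) {
    matmul_task *t = arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += t->a[i][k] * t->b[k][j];
            t->c[i][j] = sum;
        }
    return NULL;
}

/* Run both functions concurrently in separate threads; 0 on success. */
int run_functional_threads(sort_task *s, matmul_task *m) {
    pthread_t ts, tm;
    if (pthread_create(&ts, NULL, sort_worker, s) != 0)
        return -1;
    if (pthread_create(&tm, NULL, matmul_worker, m) != 0) {
        pthread_join(ts, NULL); /* don't leave the first thread running */
        return -1;
    }
    pthread_join(ts, NULL);
    pthread_join(tm, NULL);
    return 0;
}
```

Compile with `-pthread` (GCC or Clang). The two workers lean on different kinds of execution resources, which is the property the text relies on; how much they actually overlap depends on the processor and the workload.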
Specialized Programming Models
The Intel Core Duo processor offers a second-level cache shared by two
processor cores in the same physical package. This gives two
application threads the opportunity to access some application data
while minimizing the overhead of bus traffic.
Multi-threaded applications may need to employ specialized
programming models to take advantage of this type of hardware feature.
One such programming model is referred to as “producer-consumer”:
one thread writes data into some destination (ideally in the
second-level cache), and another thread executing on the other core in
the same physical package subsequently reads the data produced by the
first thread.
The basic approach for implementing a producer-consumer model is to
create two threads: one thread is the producer and the other is the
consumer. Typically, the producer and consumer take turns working on a
buffer and inform each other when they are ready to exchange buffers.
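The turn-taking scheme just described can be sketched with POSIX threads and two buffers. Assumptions, none of them from this manual: the names `exchange`, `producer`, `consumer`, `run_producer_consumer`, and the buffer size `BUF_LEN` are hypothetical; the calling thread acts as the producer and one created thread acts as the consumer; a mutex plus a condition variable implements the "inform each other" step. Placing the two threads on cores that share the second-level cache is up to the operating system (thread affinity is not shown), so this illustrates only the exchange pattern, not the cache behavior.

```c
#include <assert.h>
#include <pthread.h>

#define NBUF 2       /* two buffers, swapped between producer and consumer */
#define BUF_LEN 256  /* hypothetical buffer size, in ints */

/* Hypothetical shared state for one producer/consumer pair. */
typedef struct {
    int data[NBUF][BUF_LEN];
    int full[NBUF];          /* 1 = filled by producer, not yet consumed */
    pthread_mutex_t lock;    /* protects full[] */
    pthread_cond_t changed;  /* broadcast whenever a full[] flag flips */
    int iterations;          /* number of buffers to pass across */
    long long checksum;      /* result computed by the consumer */
} exchange;

/* Set buffer b's flag to value (1 = full, 0 = empty) and wake the other thread. */
static void set_full(exchange *x, int b, int value) {
    pthread_mutex_lock(&x->lock);
    x->full[b] = value;
    pthread_cond_broadcast(&x->changed);
    pthread_mutex_unlock(&x->lock);
}

/* Block until buffer b's flag equals value. */
static void wait_full(exchange *x, int b, int value) {
    pthread_mutex_lock(&x->lock);
    while (x->full[b] != value)
        pthread_cond_wait(&x->changed, &x->lock);
    pthread_mutex_unlock(&x->lock);
}

static void *consumer(void *arg) {
    exchange *x = arg;
    for (int it = 0; it < x->iterations; it++) {
        int b = it % NBUF;
        wait_full(x, b, 1);               /* wait for the producer's data */
        for (int i = 0; i < BUF_LEN; i++)
            x->checksum += x->data[b][i]; /* read what the producer wrote */
        set_full(x, b, 0);                /* hand the buffer back */
    }
    return NULL;
}

static void producer(exchange *x) {
    for (int it = 0; it < x->iterations; it++) {
        int b = it % NBUF;
        wait_full(x, b, 0);               /* wait until the consumer freed b */
        for (int i = 0; i < BUF_LEN; i++)
            x->data[b][i] = it + i;       /* write new data into buffer b */
        set_full(x, b, 1);                /* tell the consumer b is ready */
    }
}

/* Pass `iterations` buffers from producer to consumer; returns the
   consumer's checksum, or -1 if the consumer thread cannot be created. */
long long run_producer_consumer(int iterations) {
    assert(iterations >= 0);
    exchange x = {0};
    pthread_mutex_init(&x.lock, NULL);
    pthread_cond_init(&x.changed, NULL);
    x.iterations = iterations;

    pthread_t tc;
    long long result = -1;
    if (pthread_create(&tc, NULL, consumer, &x) == 0) {
        producer(&x);                     /* calling thread is the producer */
        pthread_join(tc, NULL);
        result = x.checksum;
    }
    pthread_cond_destroy(&x.changed);
    pthread_mutex_destroy(&x.lock);
    return result;
}
```

While the consumer reads one buffer, the producer can fill the other, and the threads synchronize once per buffer rather than once per element. The data itself is written and read outside the lock; the mutex handoff on `full[]` is what makes the producer's writes visible to the consumer.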
In a producer-consumer model, there is some thread synchronization