Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
User/Source Coding Rule 33. (H impact, M generality) Minimize the
sharing of data between threads that execute on different bus agents sharing a
common bus.
One technique to minimize sharing of data is to copy data to local stack
variables if it is to be accessed repeatedly over an extended period. If
necessary, results from multiple threads can be combined later by
writing them back to a shared memory location. This approach can also
reduce the time spent synchronizing access to shared data.
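
The technique can be sketched as follows. This is a minimal illustration
rather than code from this manual: it assumes POSIX threads, and the names
NUM_THREADS, worker_arg, sum_worker, and parallel_sum are hypothetical. Each
thread copies its shared parameters into local stack variables, accumulates
into a local variable, and writes to shared memory only once, when it finishes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4              /* hypothetical thread count */

typedef struct {
    const int *data;               /* read-only input slice for this thread */
    size_t     count;              /* number of elements in the slice */
    long       result;             /* written once, when the thread finishes */
} worker_arg;

static void *sum_worker(void *p)
{
    worker_arg *arg = (worker_arg *)p;
    const int *data = arg->data;   /* copy shared fields to local variables */
    size_t count = arg->count;
    long local_sum = 0;            /* accumulator on this thread's stack */

    for (size_t i = 0; i < count; i++)
        local_sum += data[i];      /* no shared location is written here */

    arg->result = local_sum;       /* single write-back to shared memory */
    return NULL;
}

long parallel_sum(const int *data, size_t n)
{
    pthread_t  tid[NUM_THREADS];
    worker_arg args[NUM_THREADS];
    size_t     chunk = n / NUM_THREADS;
    long       total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t start = (size_t)t * chunk;
        args[t].data   = data + start;
        /* the last thread also takes the remainder of the array */
        args[t].count  = (t == NUM_THREADS - 1) ? n - start : chunk;
        args[t].result = 0;
        int rc = pthread_create(&tid[t], NULL, sum_worker, &args[t]);
        assert(rc == 0);           /* sketch only: no error recovery */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += args[t].result;   /* combine per-thread results afterwards */
    }
    return total;
}
```

Because the results are combined only after each thread has been joined, no
lock is needed around the accumulation. Had every thread added directly into
one shared total inside the loop, each update would need synchronization and
the cache line holding the total would be contended by all threads.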
Batched Producer-Consumer Model
The key benefit of a threaded producer-consumer design, shown in
Figure 7-5, is to minimize bus traffic while sharing data between the
producer and the consumer using a shared second-level cache. On an
Intel Core Duo processor, when the work buffers are small enough
to fit within the first-level cache, re-ordering of producer and consumer
tasks is necessary to achieve optimal performance. This is because
fetching data from L2 to L1 is much faster than having a cache line in
one core invalidated and fetched from the bus.
Figure 7-5 illustrates a batched producer-consumer model that can be
used to overcome the drawback of using small work buffers in a
standard producer-consumer model. In a batched producer-consumer
model, each scheduling quantum batches two or more producer tasks,
each producer working on a designated buffer. The number of tasks to
batch is determined by the criterion that the total working set be greater
than the first-level cache but smaller than the second-level cache.
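
The sizing criterion can be expressed as a small helper. This is a sketch
under stated assumptions, not a routine from this manual: the function name
choose_batch_size and its parameters are hypothetical, every work buffer is
assumed to have the same size, and picking the smallest qualifying batch is
one possible policy among several.

```c
#include <stddef.h>

/*
 * Return the smallest batch size (at least 2) for which the total working
 * set, batch * buf_bytes, is greater than l1_bytes and smaller than
 * l2_bytes. Return 0 if no batch size satisfies both bounds.
 */
size_t choose_batch_size(size_t buf_bytes, size_t l1_bytes, size_t l2_bytes)
{
    if (buf_bytes == 0 || l2_bytes == 0)
        return 0;

    /* smallest batch with batch * buf_bytes > l1_bytes */
    size_t batch = l1_bytes / buf_bytes + 1;

    /* the batched model groups two or more producer tasks */
    if (batch < 2)
        batch = 2;

    /* batch * buf_bytes < l2_bytes, written to avoid overflow */
    if (batch > (l2_bytes - 1) / buf_bytes)
        return 0;

    return batch;
}
```

For example, with hypothetical 8-KByte work buffers, a 32-KByte first-level
cache, and a 2-MByte second-level cache, the helper returns 5, giving a
40-KByte working set that exceeds the first-level cache but fits comfortably
in the second-level cache.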
Figure 7-5 Batched Approach of Producer Consumer Model
[Figure not reproduced. Labels: Main Thread; producer tasks P(1) through
P(6); consumer tasks C(1) through C(4). P: producer; C: consumer.]