Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
Per-thread Stack Offset
To prevent private stack accesses in concurrent threads from thrashing
the first-level data cache, an application can use a per-thread stack offset
for each of its threads. Each offset should be a multiple of a common
base offset. The optimum choice of this common base offset may depend
on the memory access characteristics of the threads, but it should be a
multiple of 128 bytes.
One effective technique for choosing a per-thread stack offset in an
application is to add an equal amount of stack offset each time a new
thread is created in a thread pool (see footnote 7). Example 7-9 shows a
code fragment that implements per-thread stack offset for three threads
using a reference offset of 1024 bytes.
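The technique can be sketched as follows. This is a minimal illustration and not the manual's Example 7-9: it assumes a POSIX system with pthreads and the non-standard alloca() from glibc, and the names stack_offset_for, thread_entry, do_work, run_with_offsets, BASE_OFFSET and MAX_THREADS are hypothetical. Each new thread reserves one more multiple of the base offset at the top of its stack before doing any real work.

```c
#include <alloca.h>   /* alloca(): non-standard, assumed available (glibc) */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BASE_OFFSET 1024u  /* reference offset from the text; a multiple of 128 */
#define MAX_THREADS 8      /* hypothetical pool limit for this sketch */

/* Thread k (0-based) gets (k + 1) * base bytes, so every new thread
   adds one more equal-sized step. */
size_t stack_offset_for(size_t k, size_t base)
{
    return (k + 1) * base;
}

/* Hypothetical per-thread work that uses some stack space. */
static void do_work(void)
{
    volatile int scratch[16];
    for (int i = 0; i < 16; i++)
        scratch[i] = i;
}

static void *thread_entry(void *arg)
{
    size_t offset = *(const size_t *)arg;
    assert(offset > 0);

    /* Reserve `offset` bytes on this thread's stack and touch them so the
       compiler keeps the allocation. Frames of functions called from here
       (when not inlined) start below this padding. */
    volatile char *pad = alloca(offset);
    pad[0] = 0;

    do_work();
    return NULL;
}

/* Creates n threads with offsets base, 2*base, ..., n*base.
   Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int run_with_offsets(size_t n, size_t base)
{
    pthread_t tid[MAX_THREADS];
    size_t offset[MAX_THREADS];
    size_t started = 0;
    int rc = 0;

    if (n > MAX_THREADS || base == 0 || base % 128 != 0)
        return -1;

    for (size_t k = 0; k < n; k++) {
        offset[k] = stack_offset_for(k, base);
        if (pthread_create(&tid[k], NULL, thread_entry, &offset[k]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t k = 0; k < started; k++)
        pthread_join(tid[k], NULL);
    return rc;
}
```

Calling run_with_offsets(3, BASE_OFFSET) starts three threads with offsets of 1024, 2048 and 3072 bytes. Locals declared in thread_entry itself may not benefit from the adjustment; the effect applies to the functions it calls afterwards.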
User/Source Coding Rule 35. (H impact, M generality) Adjust the private
stack of each thread in an application so that the spacing between these stacks
is not a multiple of 64 KB or 1 MB. This prevents unnecessary cache line
evictions on IA-32 processors supporting Hyper-Threading Technology.
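The spacing condition in this rule can be checked with simple arithmetic. The sketch below is illustrative only; the helper name stacks_alias and the constant ALIAS_64K are hypothetical, and the addresses passed in are assumed to be comparable points (for example, the top of each thread's stack).

```c
#include <assert.h>   /* for checks written by the caller */
#include <stdbool.h>
#include <stdint.h>

#define ALIAS_64K ((uintptr_t)64 * 1024)

/* True when two stack addresses are a whole multiple of 64 KB apart.
   Every multiple of 1 MB is also a multiple of 64 KB, so this single
   test covers both spacings named in the rule. */
bool stacks_alias(uintptr_t a, uintptr_t b)
{
    uintptr_t spacing = a > b ? a - b : b - a;
    return spacing % ALIAS_64K == 0;
}
```

For instance, two stacks placed exactly 1 MB apart satisfy stacks_alias, while the same two stacks with an extra 1024-byte per-thread offset applied to one of them do not.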
7. For parallel applications written to run with OpenMP, the OpenMP runtime library in
Intel KAP/Pro Toolset automatically provides the stack offset adjustment for each thread.