Example 6-12 Memory Copy Using Hardware Prefetch and Bus Segmentation..6-50
Example 7-1 Serial Execution of Producer and Consumer Work Items ...............7-9
Example 7-2 Basic Structure of Implementing Producer Consumer Threads....7-11
Example 7-3 Thread Function for an Interlaced Producer Consumer Model .....7-13
Example 7-4 Spin-wait Loop and PAUSE Instructions........................................7-24
Example 7-5 Coding Pitfall using Spin Wait Loop ..............................................7-29
Example 7-6 Placement of Synchronization and Regular Variables ..................7-32
Example 7-7 Declaring Synchronization Variables without Sharing a Cache Line ....7-32
Example 7-8 Batched Implementation of the Producer Consumer Threads ......7-41
Example 7-9 Adding an Offset to the Stack Pointer of Three Threads...............7-45
Example 7-10 Adding a Pseudo-random Offset to the Stack Pointer in the Entry Function ....7-47
Example 7-11 Assembling 3-level IDs, Affinity Masks for Each Logical Processor ....7-51
Example 7-12 Assembling a Look up Table to Manage Affinity Masks and Schedule Threads to Each Core First ....7-54
Example 7-13 Discovering the Affinity Masks for Sibling Logical Processors Sharing the Same Cache ....7-55
Example D-1 Aligned esp-Based Stack Frames .................................................. D-5
Example D-2 Aligned ebp-Based Stack Frames................................................... D-7
Example E-1 Calculating Insertion for Scheduling Distance of 3 ..........................E-3