Optimizing Cache Usage 6
• Determine multi-threading resource topology in an MP system (See
Section 7.10 of IA-32 Intel® Architecture Software Developer’s
Manual, Volume 3A).
• Determine cache hierarchy topology in a platform using multi-core
processors (See Example 7-13).
• Manage threads and processor affinities.
• Determine prefetch stride.
The size of a given level of cache is given by:
(# of Ways) * (Partitions) * (Line_size) * (Sets)
= (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)
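The size formula can be sketched in C as follows. This is a minimal
illustration, not code from the manual: the names cache_geometry,
decode_cache_geometry and cache_size_bytes are hypothetical, and the
functions take the EBX and ECX values already returned by the
deterministic cache parameter leaf, so how those registers are read
(inline assembly or a compiler intrinsic) is left to the caller.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields; each field in the
 * register is stored as (value - 1), so 1 is added when decoding. */
typedef struct {
    uint32_t ways;        /* EBX[31:22] + 1 */
    uint32_t partitions;  /* EBX[21:12] + 1 */
    uint32_t line_size;   /* EBX[11:0]  + 1, in bytes */
    uint64_t sets;        /* ECX + 1 (64-bit so ECX = 0xFFFFFFFF cannot wrap) */
} cache_geometry;

/* Decode the geometry fields from raw EBX and ECX register values. */
cache_geometry decode_cache_geometry(uint32_t ebx, uint32_t ecx)
{
    cache_geometry g;
    g.ways       = ((ebx >> 22) & 0x3FFu) + 1u;
    g.partitions = ((ebx >> 12) & 0x3FFu) + 1u;
    g.line_size  = (ebx & 0xFFFu) + 1u;
    g.sets       = (uint64_t)ecx + 1u;
    return g;
}

/* Cache size in bytes = Ways * Partitions * Line_size * Sets. */
uint64_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
    cache_geometry g = decode_cache_geometry(ebx, ecx);
    return (uint64_t)g.ways * g.partitions * g.line_size * g.sets;
}
```

For instance, register values that encode 8 ways, 1 partition, a
64-byte line and 64 sets (EBX = 0x01C0003F, ECX = 63) decode to a
32-KByte cache.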
Cache Sharing Using Deterministic Cache Parameters
Improving cache locality is an important part of software optimization.
For example, a cache blocking algorithm can be designed to choose its
block size at runtime for single-processor and a variety of
multi-processor execution environments, including processors
supporting Hyper-Threading Technology (HT) and multi-core
processors.
The basic technique is to cap the block size at less than the size of
the target cache level divided by the number of logical processors
serviced by that cache level. This technique applies to multithreaded
application programming, and can also benefit single-threaded
applications that run as part of a multi-tasking workload.
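The block-size limit described above can be sketched in C as follows.
This is an illustration under stated assumptions rather than the
manual's own code: the function names are hypothetical, and reading the
sharing count from EAX[25:14] of the deterministic cache parameter leaf
(stored as count minus one) is taken from the CPUID definition, not from
this section. That field is a maximum, so the actual number of logical
processors sharing the cache may be smaller.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: EAX[25:14] of the deterministic cache parameter leaf
 * holds (maximum logical processors sharing this cache) - 1. */
unsigned logical_processors_sharing(uint32_t eax)
{
    return ((eax >> 14) & 0xFFFu) + 1u;
}

/* Bound from the text: cache size divided by the number of logical
 * processors serviced by that cache level. The block size must stay
 * strictly below this value. */
size_t block_size_bound(size_t cache_bytes, unsigned sharing)
{
    if (sharing == 0)
        sharing = 1;  /* defensive: treat an invalid count as unshared */
    return cache_bytes / sharing;
}

/* Return the preferred block size if it is already below the bound,
 * otherwise the largest size that is still strictly below it. */
size_t clamp_block_size(size_t preferred, size_t cache_bytes, unsigned sharing)
{
    size_t bound = block_size_bound(cache_bytes, sharing);
    if (bound == 0)
        return 0;
    return preferred < bound ? preferred : bound - 1;
}
```

With a 2-MByte cache shared by two logical processors, the bound is
1 MByte, so a requested 64-KByte block is kept unchanged while a
requested 4-MByte block is reduced to just under 1 MByte. A real
blocking routine would likely also round the result down to a multiple
of the cache line size; that refinement is omitted here.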
Cache Sharing in Single-core or Multi-core
The deterministic cache parameters are useful for managing a shared
cache hierarchy in multithreaded applications in more sophisticated
situations. A given cache level may be shared by logical processors in a
processor, or it may be implemented to be shared by logical processors
in a physical processor package. Using the deterministic cache parameter
leaf and the local APIC_ID associated with each logical processor in the