Multi-Core and Hyper-Threading Technology 7
On Hyper-Threading-Technology-enabled processors, excessive loop
unrolling is likely to reduce the Trace Cache’s ability to deliver
high-bandwidth μop streams to the execution engine.
Optimization for Code Size
When the Trace Cache continuously delivers pre-built μop traces, the
scheduler in the execution engine can dispatch μops for execution at a
high rate and maximize the utilization of available execution
resources. Organizing repeatedly executed code sequences into sections,
each with a footprint that fits in the Trace Cache, can therefore
improve application performance significantly.
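One way to keep the repeatedly executed footprint small is to move rarely executed code out of the hot path. The following is a minimal sketch, not a method prescribed by this manual. It assumes a GCC-compatible compiler that supports the `hot` and `cold` function attributes and `__builtin_expect`; the names `sum_valid_samples`, `report_bad_sample`, and `bad_sample_count` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter so that callers can observe how often the
   rarely executed path ran. */
static unsigned bad_sample_count = 0;

/* Rarely executed error path. Marking it cold and noinline asks a
   GCC-compatible compiler to keep it out of the hot loop body
   (assumption: other compilers need their own mechanism). */
__attribute__((cold, noinline))
static void report_bad_sample(size_t index, int value)
{
    bad_sample_count++;
    fprintf(stderr, "sample %zu out of range: %d\n", index, value);
}

/* Frequently executed loop. The body stays small because the error
   handling is a call to an out-of-line function rather than inline code. */
__attribute__((hot))
static long sum_valid_samples(const int *samples, size_t count)
{
    assert(samples != NULL || count == 0);

    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        /* Hint that negative samples are unlikely. */
        if (__builtin_expect(samples[i] < 0, 0)) {
            report_bad_sample(i, samples[i]);
            continue;
        }
        sum += samples[i];
    }
    return sum;
}
```

Whether this layout actually reduces front-end pressure depends on the compiler and the workload, so it should be confirmed by measurement rather than assumed.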
On Hyper-Threading-Technology-enabled processors, multithreaded
applications should improve the code locality of frequently executed
sections and, when optimizing code size, target one half of the Trace
Cache size for each application thread. If code size is reducing the
efficiency of the front end, this can be detected by evaluating the
performance metrics discussed in the previous sub-section with respect
to loop unrolling.
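The half-size target can be expressed as simple arithmetic. The sketch below is illustrative only. It assumes a Trace Cache capacity of 12K μops (taken here as 12 × 1024), a figure that is not stated in this section and should be replaced with the value for the actual target processor. The function names are hypothetical, and the μop count of a hot section is an estimate the developer must supply.

```c
#include <assert.h>

/* Assumption: 12K uops, taken as 12 * 1024. Substitute the capacity
   documented for the processor being targeted. */
#define ASSUMED_TRACE_CACHE_UOPS (12u * 1024u)

/* Per-thread budget: the Trace Cache capacity divided evenly among the
   logical processors that share it (two with Hyper-Threading Technology). */
static unsigned per_thread_uop_budget(unsigned trace_cache_uops,
                                      unsigned logical_per_core)
{
    assert(logical_per_core > 0);
    return trace_cache_uops / logical_per_core;
}

/* Returns 1 if an estimated hot-section footprint fits the budget. */
static int fits_budget(unsigned estimated_hot_uops, unsigned budget_uops)
{
    return estimated_hot_uops <= budget_uops;
}
```

With two logical processors sharing the assumed capacity, `per_thread_uop_budget(ASSUMED_TRACE_CACHE_UOPS, 2)` yields 6144 μops per thread.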
User/Source Coding Rule 38. (L impact, L generality) Optimize code size to
improve locality of Trace Cache and increase delivered trace length.
Using Thread Affinities to Manage Shared Platform Resources
Each logical processor in an MP system has a unique initial APIC_ID,
which can be queried using CPUID. Resources shared by more than one
logical processor in a multi-threading platform can be mapped into a
three-level hierarchy for a non-clustered MP system. Each of the three
levels can be identified by a label, which can be extracted from the