IA-32 Intel Architecture Optimization Reference Manual


 
Multi-Core and Hyper-Threading Technology
On Hyper-Threading-Technology-enabled processors, excessive loop
unrolling is likely to reduce the Trace Cache’s ability to deliver high
bandwidth μop streams to the execution engine.
Optimization for Code Size
When the Trace Cache is continuously and repeatedly delivering μop
traces that are pre-built, the scheduler in the execution engine can
dispatch μops for execution at a high rate and maximize the utilization
of available execution resources. Optimizing application code size by
organizing repeatedly executed code sequences into sections, each with
a footprint that fits in the Trace Cache, can greatly improve
application performance.
On Hyper-Threading-Technology-enabled processors, where two logical
processors share the Trace Cache, multithreaded applications should
improve the code locality of frequently executed sections and, when
optimizing for code size, target one half of the Trace Cache for each
application thread. If code size becomes an issue limiting the
efficiency of the front end, this can be detected by evaluating the
performance metrics discussed in the previous sub-section with respect
to loop unrolling.
User/Source Coding Rule 38. (L impact, L generality) Optimize code size to
improve the locality of the Trace Cache and increase delivered trace length.
Using Thread Affinities to Manage Shared Platform
Resources
Each logical processor in an MP system has a unique initial APIC_ID,
which can be queried using CPUID. Resources shared by more than one
logical processor in a multi-threading platform can be mapped into a
three-level hierarchy for a non-clustered MP system. Each of the three
levels can be identified by a label, which can be extracted from the