IA-32 Intel® Architecture Optimization
Microarchitecture Pipeline and Hyper-Threading Technology
This section describes the HT Technology microarchitecture and how
instructions from the two logical processors are handled between the
front end and the back end of the pipeline.
Although instructions originating from two programs or two threads
execute simultaneously and not necessarily in program order in the
execution core and memory hierarchy, the front end and back end
contain several selection points that choose between instructions from the
two logical processors. All selection points alternate between the two
logical processors unless one logical processor cannot make use of a
pipeline stage. In this case, the other logical processor has full use of
every cycle of the pipeline stage. Reasons why a logical processor may
not use a pipeline stage include cache misses, branch mispredictions,
and instruction dependencies.
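
The alternation rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the actual hardware logic: the names select and run, and the representation of "can use the stage" as a pair of booleans, are hypothetical and chosen for illustration.

```python
def select(last, ready):
    """Pick which logical processor (0 or 1) uses a pipeline stage this cycle.

    last  -- logical processor chosen in the previous cycle (0 or 1)
    ready -- pair of booleans: can each logical processor use the stage?
    Returns 0, 1, or None when neither can use the stage.
    (Hypothetical model; the real selection logic is not specified here.)
    """
    preferred = 1 - last          # alternate when both can use the stage
    if ready[preferred]:
        return preferred
    if ready[last]:
        return last               # the other one cannot use the stage
    return None


def run(ready_per_cycle, last=1):
    """Apply select() over a sequence of per-cycle ready pairs."""
    picks = []
    for ready in ready_per_cycle:
        pick = select(last, ready)
        picks.append(pick)
        if pick is not None:
            last = pick
    return picks
```

When both logical processors can use the stage, run() yields a strictly alternating sequence. When one cannot (for example, while it waits on a cache miss), every cycle goes to the other.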
Front End Pipeline
The execution trace cache is shared between two logical processors.
Access to the execution trace cache is arbitrated between the two logical
processors every clock cycle. If a cache line is fetched for one logical
processor in one clock cycle, a line is fetched for the other logical
processor in the next clock cycle, provided that both logical processors are
requesting access to the trace cache.
If one logical processor is stalled or is unable to use the execution trace
cache, the other logical processor can use the full bandwidth of the trace
cache until the stalled logical processor’s instruction fetches return from
the L2 cache.
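
To make the bandwidth effect concrete, the following sketch counts how many trace cache lines each logical processor fetches over a window of cycles when one of them stalls for a while. It is a simplified model, not the hardware algorithm: the function name simulate_fetch and the parameters cycles, stall_start, and stall_len are hypothetical, and the stall length stands in for whatever latency the L2 access actually takes.

```python
def simulate_fetch(cycles, stall_start, stall_len):
    """Count trace cache lines fetched per logical processor.

    Logical processor 1 is modeled as stalled (waiting on the L2 cache)
    for stall_len cycles starting at cycle stall_start; logical
    processor 0 requests a line every cycle. All three parameters are
    hypothetical model inputs, not hardware values.
    """
    fetched = [0, 0]
    last = 1                      # so that logical processor 0 goes first
    for cycle in range(cycles):
        lp1_requesting = not (stall_start <= cycle < stall_start + stall_len)
        if lp1_requesting and last == 0:
            pick = 1              # both requesting: alternate
        else:
            pick = 0              # LP1 stalled, or it is LP0's turn
        fetched[pick] += 1
        last = pick
    return fetched
```

With no stall, the two logical processors split the fetch slots evenly. During the stall window, logical processor 0 receives every slot, and alternation resumes once logical processor 1 starts requesting again.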
After instructions are fetched and traces of µops are built, the µops are
placed in a queue. This queue decouples the execution trace cache from
the register rename pipeline stage. As described earlier, if both logical
processors are active, the queue is partitioned so that both logical
processors can make independent forward progress.
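
The partitioning idea can be sketched as a queue in which each logical processor owns a fixed half of the entries, so that a backlog from one cannot block the other. This is a toy model for illustration only: the class name PartitionedUopQueue, the capacity of 8 entries, and the even split are assumptions, and the sketch covers only the case where both logical processors are active.

```python
from collections import deque


class PartitionedUopQueue:
    """Toy model of a µop queue split between two logical processors."""

    def __init__(self, capacity=8):
        # capacity is a hypothetical size, not the real queue depth
        self.per_lp_limit = capacity // 2
        self.entries = (deque(), deque())   # one FIFO per logical processor

    def push(self, lp, uop):
        """Enqueue a µop for logical processor lp.

        Returns False when lp's own partition is full, regardless of
        how empty the other partition is.
        """
        if len(self.entries[lp]) >= self.per_lp_limit:
            return False
        self.entries[lp].append(uop)
        return True

    def pop(self, lp):
        """Dequeue lp's oldest µop toward register rename, or None."""
        return self.entries[lp].popleft() if self.entries[lp] else None
```

In this model, filling one partition only causes further pushes from that logical processor to fail. The other logical processor can still enqueue and dequeue, which is the independent forward progress the text describes.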