Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Optimization
a mechanism that fetches data only and includes two distinct
components: (1) a hardware mechanism that fetches the adjacent cache
line within a 128-byte sector containing the data needed after a
cache line miss (also referred to as adjacent cache line prefetch),
and (2) a software-controlled mechanism that fetches data into
the caches using the prefetch instructions.
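The adjacent cache line prefetch can be illustrated with a little address arithmetic. The sketch below is only a model, not a description of the hardware logic: it assumes a 128-byte sector made up of two 64-byte cache lines (the 64-byte line size is an assumption not stated in this passage), and the function name adjacent_line is hypothetical.

```python
SECTOR_SIZE = 128  # bytes per sector, as stated in the text
LINE_SIZE = 64     # assumption: each sector holds two 64-byte cache lines


def adjacent_line(addr):
    """Return the base address of the other cache line in addr's sector."""
    # Round the address down to the start of its own cache line.
    line_base = addr & ~(LINE_SIZE - 1)
    # Flip the bit that selects the lower or upper line of the sector.
    return line_base ^ LINE_SIZE


if __name__ == "__main__":
    for addr in (0x1000, 0x1050, 0x10BF):
        print(hex(addr), "->", hex(adjacent_line(addr)))
```

Under these assumptions, a miss on any byte in the lower line of a sector pairs with the upper line of the same sector, and the reverse.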
The hardware instruction fetcher reads instructions along the path
predicted by the branch target buffer (BTB) into instruction streaming
buffers. Data is read in 32-byte chunks starting at the target address. The
second and third mechanisms are described later.
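As a rough illustration of the 32-byte reads described above, the following sketch lists the address ranges covered when a given number of instruction bytes is fetched from a branch target. It assumes that successive reads are contiguous and that the first read begins exactly at the target address; the function name fetch_chunks is hypothetical, and the model ignores the streaming buffers themselves.

```python
CHUNK_SIZE = 32  # bytes per read, as stated in the text


def fetch_chunks(target, n_bytes):
    """Return (start, end) ranges, end exclusive, covering n_bytes from target."""
    # Ceiling division: a partial chunk still costs a full read.
    count = -(-n_bytes // CHUNK_SIZE)
    return [
        (target + i * CHUNK_SIZE, target + (i + 1) * CHUNK_SIZE)
        for i in range(count)
    ]


if __name__ == "__main__":
    for start, end in fetch_chunks(0x1004, 40):
        print(hex(start), hex(end))
```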
Decoder
The front end of the Intel NetBurst microarchitecture has a single
decoder that decodes instructions at a maximum rate of one
instruction per clock. Some complex instructions must enlist the help of
the microcode ROM. The output of the decoder feeds the
execution trace cache.
Execution Trace Cache
The execution trace cache (TC) is the primary instruction cache in the
Intel NetBurst microarchitecture. The TC stores decoded IA-32
instructions (µops).
In the Pentium 4 processor implementation, TC can hold up to 12K
µops and can deliver up to three µops per cycle. TC does not hold all of
the µops that need to be executed in the execution core. In some
situations, the execution core may need to execute a microcode flow
instead of the µop traces that are stored in the trace cache.
The Pentium 4 processor is optimized so that most frequently-executed
IA-32 instructions come from the trace cache while only a few
instructions involve the microcode ROM.
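The capacity and delivery figures above lend themselves to a quick back-of-the-envelope check. The sketch below is a simplified model rather than a description of the actual hardware: it assumes "12K" means 12 * 1024 µops and that the peak rate of three µops per cycle is sustained, which gives only a lower bound on delivery time. Both function names are hypothetical.

```python
TC_CAPACITY_UOPS = 12 * 1024  # assumption: "12K" is read as 12 * 1024
TC_MAX_UOPS_PER_CYCLE = 3     # peak delivery rate stated in the text


def fits_in_trace_cache(n_uops):
    """Return True if n_uops does not exceed the stated TC capacity."""
    return n_uops <= TC_CAPACITY_UOPS


def min_delivery_cycles(n_uops):
    """Lower bound on cycles needed to deliver n_uops at the peak rate."""
    # Ceiling division: a final partial group still takes a full cycle.
    return -(-n_uops // TC_MAX_UOPS_PER_CYCLE)


if __name__ == "__main__":
    loop_uops = 10
    print(fits_in_trace_cache(loop_uops), min_delivery_cycles(loop_uops))
```

For example, a loop body of 10 µops fits in the trace cache and needs at least 4 cycles per iteration to be delivered, before any execution-core limits are considered.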