Technical Reference Guide
Figure 3-2 illustrates the internal architecture of the Pentium 4 processor.
[Figure: block diagram of the Pentium 4 processor. Labeled blocks: out-of-order
core, execution trace cache, branch prediction, rapid execution engine ALUs,
128-bit integer FPU, L1 data cache, 256-KB 8-way L2 advanced transfer cache,
and FSB interface.]
ALU Speed: Core speed x2
Core Speed: 1.4, 1.5, 2.0, 2.2 GHz
FSB Speed: 400 MHz (effective data transfer rate)
Figure 3–2. Pentium 4 Processor Internal Architecture
The Pentium 4 achieves higher clock speeds through hyper-pipelined technology,
which can handle significantly more instructions at a time. Because branch
mis-predictions carry a serious performance penalty with such a long pipeline,
the Pentium 4 improves its branch prediction mechanism with an execution trace
cache and a refined prediction algorithm. The execution trace cache can store
12K micro-ops (decoded instructions dealing with branching sequences) that are
checked when recurring branches are processed. Code that is not executed
(bypassed) is no longer stored in the L1 cache, as was the case in the
Pentium III.
The out-of-order core features Advanced Dynamic Execution, which provides a large window
(126 instructions) for the execution units to work with. A more accurate branch prediction
algorithm, along with a larger (4-KB) branch target buffer that stores more detail on branch
history, results in a 33% reduction in branch mis-predictions compared with the Pentium III.
The L1 data cache features a low-latency design for minimum response time on cache hits. The
256-KB advanced transfer L2 cache features a 256-bit (32-byte) interface operating at the
processor core speed. The L2 cache of the 1.5-GHz Pentium 4 can therefore provide a transfer
rate of 48 GB/s (32 bytes x 1.5 GHz).
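The L2 transfer rate quoted above follows directly from the interface width and the core clock. The short Python sketch below reproduces that arithmetic; the function name and the use of decimal gigabytes (1 GB = 10^9 bytes) are assumptions made for illustration, not part of the original guide.

```python
def l2_bandwidth_bytes_per_s(interface_bits: int, core_clock_hz: int) -> int:
    """Peak L2 transfer rate, assuming one full-width transfer per core clock."""
    bytes_per_clock = interface_bits // 8  # 256 bits -> 32 bytes
    return bytes_per_clock * core_clock_hz


# Figures from the text: 256-bit interface, 1.5-GHz core clock.
bandwidth = l2_bandwidth_bytes_per_s(256, 1_500_000_000)
print(f"{bandwidth / 1e9:.1f} GB/s")  # prints "48.0 GB/s"
```

This is a peak figure: it assumes the interface delivers 32 bytes on every core clock, which is what the guide's 48 GB/s value implies.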
The combined improvements of the Pentium 4's CPU core allow the rapid execution engine's ALUs
to operate at twice the processing frequency to handle the steady stream of instructions.
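As a quick illustration of the doubled ALU clock, the following Python sketch applies the x2 multiplier to the core speeds listed with the figure (1.4, 1.5, 2.0, and 2.2 GHz). The constant and function names are hypothetical, chosen only for this example.

```python
# Core speeds listed with the figure, expressed in MHz to keep the math exact.
CORE_SPEEDS_MHZ = [1400, 1500, 2000, 2200]

# The rapid execution engine ALUs run at twice the core speed.
ALU_CLOCK_MULTIPLIER = 2


def alu_speed_mhz(core_mhz: int) -> int:
    """Return the ALU clock in MHz for a given core clock in MHz."""
    return core_mhz * ALU_CLOCK_MULTIPLIER


for core in CORE_SPEEDS_MHZ:
    alu = alu_speed_mhz(core)
    print(f"core {core / 1000:.1f} GHz -> ALU {alu / 1000:.1f} GHz")
```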
The front side bus (FSB) of the Pentium 4 uses a 100-MHz clock but provides double- and
quad-pumped transfers through the use of 200- and 400-MHz strobes. Data is quad-pumped, so
the Pentium 4 can transfer a complete 64-byte cache line in two 100-MHz bus cycles for a
throughput rate of 3.2 GB/s. Address information is transferred at a 200-MHz rate.
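The FSB throughput figure can be checked from the numbers given in the text: one 64-byte cache line moved in two cycles of the 100-MHz bus clock. The Python sketch below works through that calculation. The function names are hypothetical, and the 8-byte transfer width is not stated in the guide; it is derived here on the assumption that data moves four times per bus cycle at the 400-MHz strobe rate.

```python
def fsb_throughput_bytes_per_s(line_bytes: int, bus_cycles: int, bus_clock_hz: int) -> int:
    """Bytes per second when `line_bytes` are moved every `bus_cycles` bus clocks."""
    return line_bytes * bus_clock_hz // bus_cycles


def bytes_per_transfer(line_bytes: int, bus_cycles: int, transfers_per_cycle: int) -> int:
    """Data moved per individual transfer (assumed derivation, see lead-in)."""
    return line_bytes // (bus_cycles * transfers_per_cycle)


BUS_CLOCK_HZ = 100_000_000  # 100-MHz FSB clock
CACHE_LINE_BYTES = 64       # one complete cache line
CYCLES_PER_LINE = 2         # two bus cycles per cache line
DATA_PUMP_FACTOR = 4        # quad-pumped data (400-MHz strobes)

throughput = fsb_throughput_bytes_per_s(CACHE_LINE_BYTES, CYCLES_PER_LINE, BUS_CLOCK_HZ)
width = bytes_per_transfer(CACHE_LINE_BYTES, CYCLES_PER_LINE, DATA_PUMP_FACTOR)

print(f"{throughput / 1e9:.1f} GB/s")   # prints "3.2 GB/s"
print(f"{width} bytes per transfer")    # prints "8 bytes per transfer"
```

The derived 8 bytes per transfer is consistent with a 64-bit data path moving data at the 400-MHz effective rate quoted with the figure.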
Compaq Evo and Workstation Personal Computer
Featuring the Intel Pentium 4 Processor
Second Edition - January 2003