A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor
Intel® NetBurst™ Micro-architecture
The Pentium® 4 processor is the first hardware implementation of a new micro-architecture, the Intel NetBurst
micro-architecture. To help the reader understand this new micro-architecture, this section examines in detail the
following:
§ the design considerations of the Intel NetBurst micro-architecture
§ the building blocks that make up this new micro-architecture
§ the operation of key functional units of this micro-architecture based on the implementation in the Pentium 4
processor.
The Intel NetBurst micro-architecture is designed to achieve high performance for both integer and floating-point
computations at very high clock rates. It has the following features:
§ hyper pipelined technology to enable high clock rates and frequency headroom to well above 1 GHz
§ rapid execution engine to reduce the latency of basic integer instructions
§ high-performance, quad-pumped bus interface to the 400 MHz Intel NetBurst micro-architecture system bus
§ execution trace cache to shorten branch delays
§ cache line sizes of 64 and 128 bytes
§ hardware prefetch
§ aggressive branch prediction to minimize pipeline delays
§ out-of-order speculative execution to enable parallelism
§ superscalar issue to enable parallelism
§ hardware register renaming to avoid register name space limitations
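
As an illustration of the last item in the list above, the following is a minimal sketch of register renaming, not a description of how the Pentium 4 processor implements it. The instruction format, the `rename` function, and the physical register names (`p0`, `p1`, ...) are hypothetical, and the sketch assumes an unlimited supply of physical registers, whereas real hardware has a finite register file that it recycles.

```python
from typing import Dict, List, Tuple

# Each instruction is (destination, source1, source2), written with
# architectural register names. This format is hypothetical.
Instr = Tuple[str, str, str]


def rename(program: List[Instr], arch_regs: List[str]) -> List[Instr]:
    """Rewrite architectural register names to physical register names."""
    # Assumption: each architectural register starts out mapped to its
    # own physical register (p0, p1, ...).
    mapping: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed: List[Instr] = []
    for dst, src1, src2 in program:
        # Sources use the current mapping, so true (read-after-write)
        # dependences are preserved.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, so reusing an
        # architectural name no longer creates a conflict.
        mapping[dst] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dst], s1, s2))
    return renamed


program: List[Instr] = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "ebx"),  # true dependence on the first eax
    ("eax", "ecx", "ecx"),  # eax reused: a name conflict only
    ("ebx", "eax", "edx"),  # reads the second eax
]
renamed = rename(program, ["eax", "ebx", "ecx", "edx"])
for instr in renamed:
    print(instr)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read the old value of `eax`. Only the true data dependences remain.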
The Design Considerations of the Intel® NetBurst™ Micro-architecture
The design goals of the Intel NetBurst micro-architecture are: (a) to execute both legacy IA-32 code and applications
based on single-instruction, multiple-data (SIMD) technology at high processing rates; and (b) to operate at high clock
rates, and to scale to higher performance and clock rates in the future. To accomplish these design goals, the Intel
NetBurst micro-architecture has many advanced features and improvements over the Pentium Pro processor
micro-architecture.
The major design considerations of the Intel NetBurst micro-architecture to enable high performance and highly
scalable clock rates are as follows:
§ It uses a deeply pipelined design to enable high clock rates, with different parts of the chip running at different
clock rates, some faster and some slower than the nominally quoted clock frequency of the processor. The
Intel NetBurst micro-architecture allows the Pentium 4 processor to achieve significantly higher clock rates
than the Pentium III processor. These clock rates will reach well above 1 GHz.
§ Its pipeline provides high performance by optimizing for the common case of frequently executed
instructions. This means that the most frequently executed instructions in common circumstances (such as a
cache hit) are decoded efficiently and executed with short latencies, such that frequently encountered code
sequences are processed with high throughput.
§ It employs many techniques to hide stall penalties. Among these are parallel execution, buffering, and
speculation. Furthermore, the Intel NetBurst micro-architecture executes instructions dynamically and out of
order, so the time it takes to execute each individual instruction is not always deterministic. Performance of a
particular code sequence may vary depending on the state the machine was in when that code sequence was
entered.
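
The dependence on machine state described above can be sketched with a toy cache model. This is an illustration under stated assumptions, not a model of the Pentium 4 processor: the latencies `HIT_CYCLES` and `MISS_CYCLES` are hypothetical, the `run` function is invented for this example, and the cache is assumed never to evict a line. The 64-byte line size is one of the two sizes listed earlier in this section.

```python
from typing import List, Set

LINE_BYTES = 64     # one of the cache line sizes listed in the text
HIT_CYCLES = 2      # hypothetical latency, for illustration only
MISS_CYCLES = 100   # hypothetical latency, for illustration only


def run(addresses: List[int], cached_lines: Set[int]) -> int:
    """Return the total cycles for a sequence of loads.

    cached_lines holds the line numbers already in the cache and is
    updated in place, so it carries machine state between calls.
    """
    cycles = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cached_lines:
            cycles += HIT_CYCLES       # common case: short latency
        else:
            cycles += MISS_CYCLES      # stall while the line is fetched
            cached_lines.add(line)     # the line is now resident
    return cycles


loads = [0, 8, 64, 72, 128, 0]
cache: Set[int] = set()
cold = run(loads, cache)   # first pass: the cache starts empty
warm = run(loads, cache)   # second pass: same loads, cache already filled
print(cold, warm)
```

The two calls execute the identical sequence of loads, yet the first pass pays three miss penalties and the second pays none. The only difference is the state of the cache when the sequence was entered.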