Intel NetBurst Computer Hardware User Manual

Open as PDF

of 17

A Detailed Look Inside the Intel

NetBurst

™

Micro-Architecture of the Intel Pentium

4 Processor

Page 14

§ selecting IA-32 instructions that can be decoded into less than 4 µops and/or have short latencies

§ ordering IA-32 instructions to preserve available parallelism by minimizing long dependence chains and

covering long instruction latencies

§ ordering instructions so that their operands are ready and their corresponding issue ports and execution units

are free when they reach the scheduler.

This subsection describes port restrictions, result latencies, and issue latencies (also referred to as throughput) that

form the basis for that ordering. Scheduling affects the way that instructions are presented to the core of the

processor, but it is the execution core that reacts to an ever-changing machine state, reordering instructions for faster

execution or delaying them because of dependence and resource constraints. Thus the ordering of instructions is

more of a suggestion to the hardware.

The Intel® Pentium® 4 Processor Optimization Reference Manual lists the IA-32 instructions with their latency,

their issue throughput, and in relevant cases, the associated execution units. Some execution units are not pipelined,

such that µops cannot be dispatched in consecutive cycles and the throughput is less than one per cycle.

The number of µops associated with each instruction provides a basis for selecting which instructions to generate. In

particular, µops which are executed out of the microcode ROM involve extra overhead. For the Pentium II and

Pentium III processors, optimizing the performance of the decoder, which includes paying attention to the 4-1-1

sequence (instruction with four µops followed by two instructions each with one µop) and taking into account the

number of µops for each IA-32 instruction, was very important. On the Pentium 4 processor, the decoder template is

not an issue. Therefore it is no longer necessary to use a detailed list of exact µop count for IA-32 instructions.

Commonly used IA-32 instructions, which consist of four or less µops, are provided in the Intel

Pentium

Processor Optimization Reference Manual to aid instruction selection.

Execution Units and Issue Ports

Each cycle, the core may dispatch µops to one or more of the four issue ports. At the micro-architectural level, store

operations are further divided into two parts: store data and store address operations. The four ports through which

µops are dispatched to various execution units and to perform load and store operations are shown in Figure 4. Some

ports can dispatch two µops per clock because the execution unit for that µop executes at twice the speed, and those

execution units are marked “Double speed.”

Port 0. In the first half of the cycle,

port 0 can dispatch either one

floating-point move µop (including

floating-point stack move, floating-

point exchange or floating-point

store data), or one arithmetic

logical unit (ALU) µop (including

arithmetic, logic or store data). In

the second half of the cycle, it can

dispatch one similar ALU µop.

Port 1. In the first half of the cycle,

port 1 can dispatch either one

floating-point execution (all

floating-point operations except

moves, all SIMD operations) µop

or normal-speed integer (multiply,

shift and rotate) µop, or one ALU

(arithmetic, logic or branch) µop.

In the second half of the cycle, it can dispatch one similar ALU µop.

Port 2. Port 2 supports the dispatch of one load operation per cycle.

Note:

FP_ADD refers to x87 FP, and SIMD FP add and subtract operations

FP_MUL refers to x87 FP, and SIMD FP multiply operations

FP_DIV refers to x87 FP, and SIMD FP divide and square-root operations

MMX_ALU refers to SIMD integer arithmetic and logic operations

MMX_SHFT handles Shift, Rotate, Shuffle, Pack and Unpack operations

MMX_MISC handles SIMD reciprocal and some integer operations

Figure 4 Execution Units and Ports of the Out-of-order Core

Memory

Store

Address

Port 3

Memory

Load

All Loads

LEA

Prefetch

Port 2

ALU 1

Double

speed

ADD/SUB

Integer

Operation

Normal

speed

Shift/Rotate

Port 1

Execute

FP_ADD

FP_MUL

FP_DIV

FP_MISC

MMX_SHFT

MMX_ALU

MMX_MISC

Port 0

FP Move

FP Store Data

FXCH

ALU 0

Double

speed

ADD/SUB

Logic

Store Data

Branches

previous next

Top Automotive Device Types

Top Automotive Brands

Top Baby Care Device Types

Top Baby Care Brands

Top Car Audio & Video Device Types

Top Car Audio & Video Brands

Top Cellphone Device Types

Top Cellphone Brands

Top Communications Device Types

Top Communications Brands

Top Computer Device Types

Top Computer Brands

Top Fitness Device Types

Top Fitness Brands

Top Home Audio Device Types

Top Home Audio Brands

Top Household Appliance Device Types

Top Household Appliance Brands

Top Kitchen Appliance Device Types

Top Kitchen Appliance Brands

Top Laundry Appliance Device Types

Top Laundry Appliance Brands

Top Lawn & Garden Device Types

Top Lawn & Garden Brands

Top Marine Equipment Device Types

Top Marine Equipment Brands

Top Musical Instrument Device Types

Top Musical Instrument Brands

Top Outdoor Cooking Device Types

Top Outdoor Cooking Brands

Top Personal Care Device Types

Top Personal Care Brands

Top Photography Device Types

Top Photography Brands

Top Portable Media Device Types

Top Portable Media Brands

Top Power Tools Device Types

Top Power Tools Brands

Top TV and Video Device Types

Top TV and Video Brands

Top Videogame Device Types

Top Videogame Brands

Intel NetBurst Computer Hardware User Manual