Intel IA-32 Computer Accessories User Manual


 
IA-32 Intel® Architecture Processor Family Overview
Branch Prediction
Branch prediction is important to the performance of a deeply pipelined
processor. It enables the processor to begin executing instructions long
before the branch outcome is certain. Branch delay is the penalty that is
incurred in the absence of correct prediction. For Pentium 4 and Intel
Xeon processors, the branch delay for a correctly predicted branch
can be as few as zero clock cycles. The branch delay for a mispredicted
branch can be many cycles, usually equivalent to the pipeline depth.
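As a rough illustration of how these two delays combine, the sketch below computes the average delay per executed branch from a misprediction rate and a misprediction penalty. The function name, its parameters, and the sample numbers are hypothetical and chosen only for illustration; they are not measurements of any processor.

```python
def expected_branch_delay(mispredict_rate: float,
                          mispredict_penalty: int,
                          predicted_delay: int = 0) -> float:
    """Average delay, in clock cycles, per executed branch.

    mispredict_rate    -- fraction of branches mispredicted (0.0 to 1.0)
    mispredict_penalty -- cycles lost on a misprediction (assumed here to be
                          roughly the pipeline depth)
    predicted_delay    -- cycles lost on a correct prediction (can be zero)
    """
    if not 0.0 <= mispredict_rate <= 1.0:
        raise ValueError("mispredict_rate must be between 0 and 1")
    return ((1.0 - mispredict_rate) * predicted_delay
            + mispredict_rate * mispredict_penalty)


# Hypothetical inputs: a 20-cycle penalty and a 5% misprediction rate.
print(expected_branch_delay(0.05, 20))
```

Under this simple model the average cost scales linearly with the misprediction rate, which is why prediction accuracy matters more as the pipeline gets deeper.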
Branch prediction in the Intel NetBurst microarchitecture predicts all
near branches (conditional branches, unconditional calls, returns, and
indirect branches). It does not predict far transfers (far calls, IRETs,
and software interrupts).
Mechanisms have been implemented to aid in predicting branches
accurately and to reduce the cost of taken branches. These include:
• the ability to dynamically predict the direction and target of
  branches based on an instruction’s linear address, using the branch
  target buffer (BTB)
• if no dynamic prediction is available or if it is invalid, the ability to
  statically predict the outcome based on the offset of the target: a
  backward branch is predicted to be taken, a forward branch is
  predicted to be not taken
• the ability to predict return addresses using the 16-entry return
  address stack
• the ability to build a trace of instructions across predicted taken
  branches to avoid branch penalties.
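The return address stack in the list above can be pictured with a small software model. The following is a minimal sketch, not a description of the hardware: the class name, the method names, and the overflow policy (discarding the oldest entry when the stack is full) are assumptions made for illustration. Only the depth of 16 comes from the text.

```python
class ReturnAddressStack:
    """Fixed-depth stack used to predict return targets (illustrative model)."""

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address: int) -> None:
        # Assumed overflow policy: discard the oldest entry when full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_return(self):
        # The most recent call is expected to return first.
        # An empty stack means no prediction is available.
        return self.entries.pop() if self.entries else None


ras = ReturnAddressStack()
ras.on_call(0x1000)   # outer call
ras.on_call(0x2000)   # nested call
print(hex(ras.predict_return()))  # nested call returns first
print(hex(ras.predict_return()))  # then the outer call
```

In this model, call chains deeper than the stack lose their oldest return addresses, so returns from the outermost frames would go unpredicted.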
The Static Predictor. Once a branch instruction is decoded, the
direction of the branch (forward or backward) is known. If there is no
valid entry in the BTB for the branch, the static predictor makes a
prediction based on the direction of the branch. The static prediction
mechanism predicts backward conditional branches (those with
negative displacement, such as loop-closing branches) as taken.
Forward branches are predicted not taken.
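The static rule and its role as a fallback can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and the BTB is stood in for by a plain dictionary mapping a branch's linear address to a predicted direction, whereas a real BTB is a finite hardware structure that also holds targets.

```python
def static_predict_taken(displacement: int) -> bool:
    """Static rule: backward (negative displacement) is predicted taken,
    forward is predicted not taken.

    Treating a zero displacement as forward is an assumption of this sketch.
    """
    return displacement < 0


def predict_taken(branch_address: int, displacement: int, btb: dict) -> bool:
    """Use the dynamic prediction if the BTB has one, else the static rule.

    btb -- hypothetical stand-in for the branch target buffer:
           {linear address of branch: predicted taken (True/False)}
    """
    dynamic = btb.get(branch_address)
    if dynamic is not None:
        return dynamic
    return static_predict_taken(displacement)


# A loop-closing branch seen for the first time: no BTB entry, backward.
print(predict_taken(0x400, -16, btb={}))
# The same branch after the BTB has recorded it as not taken.
print(predict_taken(0x400, -16, btb={0x400: False}))
```

A practical consequence of this rule is that code laid out so the common path falls through forward branches, and loops close with backward branches, agrees with the static prediction the first time each branch is encountered.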