B.2 Intel® 80200 Processor Pipeline
One of the biggest differences between the Intel® 80200 processor and first-generation Intel® StrongARM* processors is the pipeline. Many of the differences are summarized in Figure B-1. This section provides a brief description of the structure and behavior of the Intel® 80200 processor pipeline.
B.2.1 General Pipeline Characteristics
While the Intel® 80200 processor pipeline is scalar and single-issue, instructions may occupy all three pipelines at once. Out-of-order completion is possible. The following sections discuss general pipeline characteristics.
B.2.1.1 Number of Pipeline Stages
The Intel® 80200 processor has a longer pipeline (7 stages versus 5 stages), which operates at a much higher frequency than its predecessors. This allows for greater overall performance. The longer Intel® 80200 processor pipeline has several negative consequences, however:
• Larger branch misprediction penalty (4 cycles in the Intel® 80200 processor instead of 1 in Intel® StrongARM*). This is mitigated by dynamic branch prediction.
• Larger load-use delay (LUD). LUDs arise from load-use dependencies. A load-use dependency gives rise to a LUD if the result of the load instruction cannot be made available by the pipeline in time for the subsequent instruction. An optimizing compiler should find independent instructions to fill the slot following the load; a scheduling sketch follows this list.
• Certain instructions (LDM, STM) incur a few extra cycles of delay on the Intel® 80200 processor as compared to first-generation Intel® StrongARM* processors.
• Decode and register file lookups are spread out over 2 cycles in the Intel® 80200 processor, instead of 1 cycle in its predecessors.
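
The load-use delay point above can be illustrated with a minimal scheduling sketch. The register assignments and surrounding code are illustrative only, not taken from this manual or from any particular compiler's output:

    ; Unscheduled: the ADD consumes r1 in the slot immediately after the
    ; load, so it stalls for the load-use delay.
    LDR   r1, [r0]          ; load word pointed to by r0
    ADD   r2, r1, #4        ; load-use dependency; incurs the LUD
    SUB   r5, r3, r4        ; independent of r1

    ; Scheduled: the independent SUB is moved into the slot following the
    ; load, hiding part of the load latency so r1 is ready when the ADD issues.
    LDR   r1, [r0]
    SUB   r5, r3, r4
    ADD   r2, r1, #4

The transformation does not change the result; it only reorders independent work so the pipeline has something useful to do while the load completes.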