Developer’s Manual March, 2003 B-5
Intel® 80200 Processor based on Intel® XScale™ Microarchitecture
Optimization Guide
B.2.2 Instruction Flow Through the Pipeline
The Intel® 80200 processor pipeline issues a single instruction per clock cycle. Instruction
execution begins at the F1 pipestage and completes at the WB pipestage.
Although a single instruction may be issued per clock cycle, all three pipelines (MAC, memory,
and main execution) may be processing instructions simultaneously. If there are no data hazards,
then each instruction may complete independently of the others.
Each pipestage takes a single clock cycle or machine cycle to perform its subtask with the
exception of the MAC unit.
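
The overlap described above can be illustrated with a small occupancy model. This is only a sketch under simplifying assumptions: one instruction enters F1 per cycle, every stage (including the MAC stages) takes one cycle, and there are no hazards or stalls. The stage lists are assumed from Figure B-1, and the names PATHS and occupancy are hypothetical, chosen for illustration.

```python
# Simplified model: one instruction enters F1 per cycle, every stage takes
# one cycle, and there are no hazards or stalls. The stage lists are assumed
# from Figure B-1; the multi-cycle behavior of the MAC unit is not modeled.
PATHS = {
    "mul": ["F1", "F2", "ID", "RF", "M1", "M2", "Mx"],         # MAC pipeline
    "ldr": ["F1", "F2", "ID", "RF", "X1", "D1", "D2", "DWB"],  # memory pipeline
    "add": ["F1", "F2", "ID", "RF", "X1", "X2", "XWB"],        # main execution pipeline
}


def occupancy(program, cycle):
    """Map each in-flight instruction index to the stage it occupies at `cycle`."""
    stages = {}
    for i, op in enumerate(program):
        step = cycle - i  # instruction i enters F1 at cycle i
        path = PATHS[op]
        if 0 <= step < len(path):
            stages[i] = path[step]
    return stages


program = ["mul", "ldr", "add"]
print(occupancy(program, 6))
```

Under these assumptions, at cycle 6 the multiply is in Mx, the load is in D1, and the add is in X1, so all three pipelines hold an instruction even though only one instruction was issued per cycle.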
B.2.2.1. ARM* V5 Instruction Execution
Figure B-1 uses arrows to show the possible flow of instructions in the pipeline. Instruction
execution flows from the F1 pipestage to the RF pipestage. The RF pipestage may issue a single
instruction to either the X1 pipestage or the MAC unit (multiply instructions go to the MAC, while
all others continue to X1). Consequently, in any given cycle either M1 or X1 is idle.
All load/store instructions are routed to the memory pipeline after the effective addresses have been
calculated in X1.
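
The routing rules above can be summarized in a short sketch. The function name route and the mnemonic sets are hypothetical and deliberately incomplete; they are meant only to show the decision made at RF and the hand-off after X1, not to classify the full instruction set.

```python
# Illustrative only: the mnemonic sets below are a small, incomplete sample.
MAC_OPS = {"mul", "mla", "smull", "umull", "smlal", "umlal"}
MEM_OPS = {"ldr", "str", "ldrb", "strb", "ldrh", "strh", "ldm", "stm"}


def route(mnemonic):
    """Return the units an instruction is sent to, starting at RF."""
    op = mnemonic.lower()
    if op in MAC_OPS:
        # Multiply instructions are issued from RF to the MAC unit.
        return ["RF", "MAC"]
    if op in MEM_OPS:
        # The effective address is calculated in X1, then the instruction
        # is routed to the memory pipeline.
        return ["RF", "X1", "memory pipeline"]
    # All other instructions continue to X1 in the main execution pipeline.
    return ["RF", "X1"]


for op in ("add", "mul", "ldr"):
    print(op, "->", ", ".join(route(op)))
```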
The ARM v5 bx (branch and exchange) instruction, which is used to branch between ARM and
Thumb code, causes the entire pipeline to be flushed (the bx instruction is not dynamically
predicted by the BTB). If the processor is in Thumb mode, then the ID pipestage dynamically
expands each Thumb instruction into a normal ARM v5 RISC instruction and execution resumes as
usual.
B.2.2.2. Pipeline Stalls
The progress of an instruction can stall anywhere in the pipeline. Several pipestages may stall for
various reasons. It is important to understand when and how hazards occur in the Intel® 80200
processor pipeline. Performance degradation can be significant if care is not taken to minimize
pipeline stalls.