Intel® IXP42X product line and IXC1100 control plane processors: Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006, page 160, Order Number: 252480-006US
Maximum Interrupt Latency can be reduced by:
• Ensuring that the interrupt vector and interrupt service routine are resident in the
instruction cache. This can be accomplished by locking them down into the cache.
• Removing or reducing the occurrences of hardware page table walks. This also can
be accomplished by locking down the application’s page table entries into the TLBs,
along with the page table entry for the interrupt service routine.
3.9.2 Branch Prediction
The IXP42X product line and IXC1100 control plane processors implement dynamic
branch prediction for the ARM instructions B and BL and for the Thumb instruction B.
Any instruction that specifies the PC as the destination is predicted as not taken. For
example, an LDR or a MOV that loads or moves directly to the PC will be predicted not
taken and incur a branch latency penalty.
These instructions (ARM B, ARM BL, and Thumb B) enter the branch target
buffer when they are “taken” for the first time. (A branch is “taken” when its
condition evaluates to true.) Once they are in the branch target buffer, the IXP42X product line and
IXC1100 control plane processors dynamically predict the outcome of these
instructions based on previous outcomes. Table 76 shows the branch latency penalty
when these instructions are correctly predicted and when they are not. A penalty of
zero for correct prediction means that the IXP42X product line and IXC1100 control
plane processors can execute the next instruction in the program flow in the cycle
following the branch.
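
The behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function name `total_branch_penalty` is hypothetical, the default penalty of 4 cycles is the ARM misprediction value from Table 76, and the "predict the most recent outcome" policy is a stand-in for the hardware's history mechanism, which this section does not specify.

```python
def total_branch_penalty(outcomes, mispredict_penalty=4):
    """Sum the penalty cycles for one B/BL instruction over a run.

    outcomes: sequence of booleans, True when the branch is taken.
    mispredict_penalty: cycles per misprediction (4 for ARM and 5 for
    Thumb, per Table 76).
    """
    in_btb = False          # the branch has no BTB entry yet
    predict_taken = False   # prediction held in the (modelled) BTB entry
    total = 0
    for taken in outcomes:
        # A branch without a BTB entry behaves as predicted not-taken.
        prediction = predict_taken if in_btb else False
        if prediction != taken:
            total += mispredict_penalty
        # The branch enters the BTB the first time it is taken.
        if taken:
            in_btb = True
        # ASSUMPTION: last-outcome policy, not the documented hardware one.
        if in_btb:
            predict_taken = taken
    return total


# A loop-closing branch: taken nine times, then falls through once.
print(total_branch_penalty([True] * 9 + [False]))  # prints 8
```

In this model the loop pays for two mispredictions: the first taken branch (not yet in the branch target buffer) and the final fall-through (predicted taken). A different history policy would change the count for less regular outcome patterns.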
3.9.3 Addressing Modes
None of the load and store addressing modes implemented in the IXP42X product line and
IXC1100 control plane processors add to the instruction latency numbers.
3.9.4 Instruction Latencies
The latencies for all the instructions are shown in the following sections with respect to
their functional groups: branch, data processing, multiply, status register access, load/
store, semaphore, and coprocessor.
The following section explains how to read these tables.
3.9.4.1 Performance Terms
• Issue Clock (cycle 0)
The first cycle when an instruction is decoded and allowed to proceed to further
stages in the execution pipeline (i.e., when the instruction is actually issued).
Table 76. Branch Latency Penalty

  Core Clock Cycles
  ARM*    Thumb*    Description
  ----    ------    -----------
  +0      +0        Predicted correctly. The instruction is in the branch
                    target buffer and is correctly predicted.
  +4      +5        Mispredicted. There are three cases of branch
                    misprediction, all of which incur the same branch delay
                    penalty (4 cycles for ARM, 5 cycles for Thumb):
                    1. The instruction is in the branch target buffer and is
                       predicted not-taken, but is actually taken.
                    2. The instruction is not in the branch target buffer and
                       is a taken branch.
                    3. The instruction is in the branch target buffer and is
                       predicted taken, but is actually not-taken.
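
As a cross-check on Table 76, its rows can be written as a small lookup function in Python. This is a minimal sketch, not vendor code: the names `MISPREDICT_PENALTY` and `branch_penalty` are hypothetical. The table does not list a penalty for a branch that is absent from the branch target buffer and not taken; the sketch assumes that case costs zero cycles, since it is not one of the three misprediction cases.

```python
# Misprediction penalties in core clock cycles, taken from Table 76.
MISPREDICT_PENALTY = {"arm": 4, "thumb": 5}


def branch_penalty(in_btb, predicted_taken, actually_taken, mode="arm"):
    """Return the branch latency penalty in core clock cycles.

    in_btb: True when the branch has a branch target buffer entry.
    predicted_taken: the BTB prediction; ignored when in_btb is False.
    actually_taken: the resolved outcome of the branch.
    mode: "arm" or "thumb".
    """
    # ASSUMPTION: a branch with no BTB entry acts as predicted not-taken,
    # so an untaken branch outside the BTB is charged no penalty.
    effective_prediction = predicted_taken if in_btb else False
    if effective_prediction == actually_taken:
        return 0
    return MISPREDICT_PENALTY[mode]


# The three misprediction cases from the table, in order:
print(branch_penalty(True, False, True))            # case 1, ARM
print(branch_penalty(False, False, True, "thumb"))  # case 2, Thumb
print(branch_penalty(True, True, False))            # case 3, ARM
```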