Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US
Intel XScale® Processor: Intel® IXP42X product line and IXC1100 control plane processors
In the first code sample below (the sequence that begins with SUB r1, r6, r7), the ADD
and LDR instructions can be moved before the MOV instruction. This prevents pipeline
stalls if the load hits the data cache. If the load is likely to miss the data cache,
however, the LDR instruction should execute as early as possible, ideally before the
SUB instruction. Moving the LDR instruction before the SUB instruction as it stands
would change the program semantics, because the LDR overwrites r6 while the SUB and
MUL instructions still need its original value. The ADD and LDR instructions can
nevertheless be moved before the SUB instruction if the contents of register r6 are
spilled to the stack and later restored, as shown in the second code sample below.
In the second code sample, the contents of register r6 are spilled to the stack and
subsequently loaded back into r6 to retain the program semantics. Another way to
optimize the code is to use the preload instruction (PLD), as shown in the third code
sample below.
The IXP42X product line and IXC1100 control plane processors have four fill buffers
that are used to fetch data from external memory when a data-cache miss occurs. The
processors stall when another load misses while all four fill buffers are in use, that
is, when more than four loads are outstanding and are being fetched from memory. Code
should therefore ensure that no more than four loads are outstanding at the same
time. For example, the number of loads issued

; all other registers are in use
sub r1, r6, r7
mul r3, r6, r2
mov r2, r2, LSL #2
orr r9, r9, #0xf
add r0, r4, r5
ldr r6, [r0]
add r8, r6, r8
add r8, r8, #4
orr r8, r8, #0xf
; The value in register r6 is not used after this
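
The remark that moving the LDR above the SUB changes the program's meaning can be checked with a small register model of the listing above. The C sketch below is not taken from the manual: the function names `run_original` and `run_naive_hoist`, the ten-entry register array, and the `loaded_word` parameter (standing in for the memory word at [r0]) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 10u  /* r0..r9; a modelling assumption of this sketch */

/* Instruction order of the original listing. `loaded_word` stands in
 * for the word that ldr r6, [r0] would fetch from memory. */
void run_original(uint32_t r[NUM_REGS], uint32_t loaded_word)
{
    assert(r != NULL);
    r[1] = r[6] - r[7];   /* sub r1, r6, r7      */
    r[3] = r[6] * r[2];   /* mul r3, r6, r2      */
    r[2] = r[2] << 2;     /* mov r2, r2, LSL #2  */
    r[9] = r[9] | 0xfu;   /* orr r9, r9, #0xf    */
    r[0] = r[4] + r[5];   /* add r0, r4, r5      */
    r[6] = loaded_word;   /* ldr r6, [r0]        */
    r[8] = r[6] + r[8];   /* add r8, r6, r8      */
    r[8] = r[8] + 4u;     /* add r8, r8, #4      */
    r[8] = r[8] | 0xfu;   /* orr r8, r8, #0xf    */
}

/* Same instructions, but ADD and LDR are hoisted above SUB without
 * saving r6 first, so SUB and MUL read the loaded value instead. */
void run_naive_hoist(uint32_t r[NUM_REGS], uint32_t loaded_word)
{
    assert(r != NULL);
    r[0] = r[4] + r[5];   /* add r0, r4, r5      */
    r[6] = loaded_word;   /* ldr r6, [r0]: original r6 is lost here */
    r[1] = r[6] - r[7];   /* sub r1, r6, r7      */
    r[3] = r[6] * r[2];   /* mul r3, r6, r2      */
    r[2] = r[2] << 2;     /* mov r2, r2, LSL #2  */
    r[9] = r[9] | 0xfu;   /* orr r9, r9, #0xf    */
    r[8] = r[6] + r[8];   /* add r8, r6, r8      */
    r[8] = r[8] + 4u;     /* add r8, r8, #4      */
    r[8] = r[8] | 0xfu;   /* orr r8, r8, #0xf    */
}
```

With r6 = 10, r7 = 3 and a loaded word of 100, for instance, the original order leaves r1 = 7 while the naive hoist leaves r1 = 97. The results for r0, r2, r8 and r9 agree in both orders, so the difference comes entirely from the instructions that still needed the old r6.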
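
; all other registers are in use
str r6, [sp, #-4]!    ; spill the live value of r6 to the stack
add r0, r4, r5
ldr r6, [r0]          ; issue the load as early as possible
orr r9, r9, #0xf
add r8, r6, r8
ldr r6, [sp], #4      ; restore the original value of r6
add r8, r8, #4
orr r8, r8, #0xf
sub r1, r6, r7
mul r3, r6, r2        ; must read r2 before it is shifted
mov r2, r2, LSL #2    ; kept after the mul so that r3 is unchanged
; The value in register r6 is not used after this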
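
The spill and restore rely on two addressing forms: `str r6, [sp, #-4]!` decrements sp by one word and then stores, while `ldr r6, [sp], #4` loads and then increments sp by one word. The C sketch below models only that pairing. The `Stack` type, its 16-word size, and the function `spill_load_restore` are hypothetical names introduced here, and sp is modelled as a word index rather than a byte address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u  /* hypothetical stack size for this sketch */

typedef struct {
    uint32_t words[STACK_WORDS];
    uint32_t sp;  /* word index of the top of stack; STACK_WORDS when empty */
} Stack;

typedef struct {
    uint32_t r6_after;     /* r6 once the original value is restored  */
    uint32_t used_by_add;  /* value that add r8, r6, r8 would consume */
    uint32_t sp_after;     /* stack pointer after the whole sequence  */
} SpillResult;

/* str rX, [sp, #-4]! : decrement sp by one word, then store there. */
static void push_word(Stack *s, uint32_t value)
{
    assert(s->sp > 0u);
    s->sp -= 1u;
    s->words[s->sp] = value;
}

/* ldr rX, [sp], #4 : load from sp, then increment sp by one word. */
static uint32_t pop_word(Stack *s)
{
    assert(s->sp < STACK_WORDS);
    uint32_t value = s->words[s->sp];
    s->sp += 1u;
    return value;
}

/* Spill r6, reuse it for the loaded word, then restore it. */
SpillResult spill_load_restore(uint32_t r6, uint32_t loaded_word)
{
    Stack stack = { .sp = STACK_WORDS };
    SpillResult out;

    push_word(&stack, r6);   /* save the live r6                     */
    r6 = loaded_word;        /* ldr r6, [r0] reuses r6               */
    out.used_by_add = r6;    /* add r8, r6, r8 sees the loaded data  */
    r6 = pop_word(&stack);   /* original r6 is back for sub and mul  */

    out.r6_after = r6;
    out.sp_after = stack.sp;
    return out;
}
```

Calling `spill_load_restore(10, 100)` in this model returns the original value 10 in `r6_after`, reports 100 as the value seen by the dependent ADD, and leaves the stack pointer where it started, which is the balance the spilled sequence depends on.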

; all other registers are in use
add r0, r4, r5
pld [r0]              ; start fetching the data at [r0] without writing r6
sub r1, r6, r7
mul r3, r6, r2
mov r2, r2, LSL #2
orr r9, r9, #0xf
ldr r6, [r0]          ; more likely to hit the data cache after the preload
add r8, r6, r8
add r8, r8, #4
orr r8, r8, #0xf
; The value in register r6 is not used after this
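
The preload idea carries over to compiled code. The C sketch below is an illustration rather than code from the manual. It uses `__builtin_prefetch`, a GCC and Clang builtin that compilers typically lower to PLD on ARM targets that support it. The function name `sum_with_preload`, the 32-byte line size, and the choice to prefetch three lines ahead (so that, together with the demand load, at most four lines are requested at once in this simple model of the four fill buffers) are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES    32u  /* assumed data cache line size        */
#define MAX_LINES_IN_FLIGHT 4u   /* mirrors the four fill buffers       */

/* Sum an array while preloading a bounded distance ahead. */
uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    /* Stay three lines ahead so the demand load plus the preloads
     * never ask for more than MAX_LINES_IN_FLIGHT lines at once. */
    const size_t ahead = (MAX_LINES_IN_FLIGHT - 1u) * words_per_line;
    uint32_t sum = 0;

    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Issue one preload per cache line, not one per element. */
        if (i % words_per_line == 0 && i + ahead < n) {
            __builtin_prefetch(&data[i + ahead], 0 /* read */, 3 /* keep */);
        }
        sum += data[i];
    }
    return sum;
}
```

The preload is only a hint, so the function returns the same sum whether or not the hardware acts on it. The prefetch distance is the tuning knob: a larger value hides more memory latency but, under the assumption above, risks occupying all fill buffers and stalling the demand loads.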