Intel® IXP42X product line and IXC1100 control plane processors: Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006
194 Order Number: 252480-006US
sequentially should not exceed four. Note also that a preload instruction may occupy a fill buffer. Outstanding preload instructions must therefore be counted together with outstanding loads when determining how many loads are in flight.
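The fill-buffer budget can be checked with a simplified model. It assumes the limit of four quoted above, assumes every preload takes a fill buffer, and assumes no fill completes while the sequence issues. The names `op_kind`, `FILL_BUFFERS`, and `first_fill_buffer_stall` are hypothetical and used only for illustration. This is not an Intel API.

```c
#include <stddef.h>

/* Hypothetical operation kinds for an illustrative model (not an Intel API). */
typedef enum { OP_LOAD_MISS, OP_PRELOAD, OP_OTHER } op_kind;

/* Assumed fill-buffer count, taken from the "should not exceed four" guideline. */
#define FILL_BUFFERS 4

/* Returns the index of the first operation that would have to wait for a free
 * fill buffer, or -1 if the whole sequence fits.
 * Simplifications: no fill completes while the sequence issues, every load in
 * the sequence misses the cache, and every preload occupies a fill buffer. */
int first_fill_buffer_stall(const op_kind *ops, size_t n)
{
    int in_use = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_LOAD_MISS || ops[i] == OP_PRELOAD) {
            if (in_use == FILL_BUFFERS)
                return (int)i;   /* no free fill buffer for this op */
            in_use++;
        }
    }
    return -1;
}
```

Under these assumptions, four missing loads followed by a preload already exhaust the buffers. The preload is counted exactly like a load.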
Similarly, the number of write buffers limits how many successive stores can be issued before the processor stalls: no more than eight stores can be issued. In addition, if the data cache uses the write-back, write-allocate policy, a load can cause stores to external memory when the read evicts a dirty (modified) cache line. These write-backs may further limit the number of sequential stores that can be issued without stalling.
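The interaction between stores and dirty-line evictions can be sketched the same way. This is a simplified model that assumes the eight-entry limit stated above and assumes no entry drains during the sequence. How many write-buffer entries a dirty eviction really consumes is not stated here, so it is left as a hypothetical parameter, `evict_cost`. All names are illustrative.

```c
#include <stddef.h>

/* Hypothetical operation kinds for an illustrative model (not an Intel API). */
typedef enum { WB_STORE, WB_LOAD_EVICTS_DIRTY, WB_OTHER } wb_op;

/* From the "no more than eight stores" rule. */
#define WRITE_BUFFER_ENTRIES 8

/* Returns the index of the first operation that would stall for lack of
 * write-buffer space, or -1 if none does.
 * Assumptions: no entry drains during the sequence, each store takes one
 * entry, and each load that evicts a dirty line takes evict_cost entries
 * (evict_cost is a hypothetical parameter; the real cost is not modeled). */
int first_write_buffer_stall(const wb_op *ops, size_t n, int evict_cost)
{
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        int need = 0;
        if (ops[i] == WB_STORE)
            need = 1;
        else if (ops[i] == WB_LOAD_EVICTS_DIRTY)
            need = evict_cost;
        if (used + need > WRITE_BUFFER_ENTRIES)
            return (int)i;       /* not enough free entries for this op */
        used += need;
    }
    return -1;
}
```

In this model, a single dirty eviction ahead of a run of stores is enough to make the eighth store stall, where it otherwise would not.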
3.10.5.1.1 Scheduling Load and Store Double (LDRD/STRD)
The IXP42X product line and IXC1100 control plane processors introduce two new double-word instructions: LDRD and STRD. LDRD loads 64 bits of data from an effective address into two consecutive registers; conversely, STRD stores 64 bits from two consecutive registers to an effective address. There are two important restrictions on how these instructions may be used:
• The effective address must be aligned on an 8-byte boundary.
• The specified register must be even-numbered (r0, r2, and so on); the second register of the pair is the next register up (for example, r0 and r1).
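The two restrictions above can be expressed as a simple check. The function name `ldrd_strd_allowed` is hypothetical. The extra `reg <= 12` bound is not from this section. It reflects the ARMv5TE architecture rule that r14 may not be used, because its partner would be r15 (the PC).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an LDRD/STRD whose first register is r<reg> and whose
 * effective address is addr satisfies the restrictions:
 *   - addr is aligned on an 8-byte boundary;
 *   - reg is even, so the pair is r<reg> and r<reg + 1>;
 *   - reg is at most 12 (r14 would pair with r15, the PC, which ARMv5TE
 *     does not allow). */
bool ldrd_strd_allowed(uintptr_t addr, unsigned reg)
{
    bool aligned = (addr & 0x7u) == 0;   /* low three address bits clear */
    bool even    = (reg % 2u) == 0;
    return aligned && even && reg <= 12;
}
```

For example, a load into r0/r1 from address 0x1004 fails the alignment test, and a load into r1/r2 from 0x1000 fails the even-register test. Either case would have to fall back to LDM or two LDRs.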
When both restrictions are met, using LDRD/STRD instead of LDM/STM to transfer the same two registers is more efficient, because LDRD/STRD issues in only one or two clock cycles, whereas LDM/STM issues in four clock cycles. Avoid LDRD instructions that target r12; these incur an extra cycle of issue latency.
The LDRD instruction has a result latency of 3 or 4 cycles, depending on which destination register is accessed: 3 cycles for the first (even) register and 4 cycles for the second (assuming the data being loaded is in the data cache).

    add r6, r7, r8
    sub r5, r6, r9
    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd r0, [r3]
    orr r8, r1, #0xf
    mul r7, r0, r7

In the code example above, the ORR instruction would stall for three cycles because of the four-cycle result latency of the second destination register (r1) of the LDRD instruction. The code can be rearranged to remove the pipeline stalls:

    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd r0, [r3]
    add r6, r7, r8
    sub r5, r6, r9
    mul r7, r0, r7
    orr r8, r1, #0xf

Any memory operation (LDR, LDRD, STR, and so on) that follows an LDRD instruction would stall for 1 cycle.

    ; The str instruction below would stall for 1 cycle
    ldrd r0, [r3]
    str r4, [r5]
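The stall counts in the LDRD/ORR example follow from a simple rule. Suppose every intervening instruction issues in one cycle, and a consumer is placed `gap` instructions after a producer with result latency L. The consumer then stalls for max(0, L - 1 - gap) cycles. The sketch below encodes that rule under those simplifying assumptions. It ignores multi-cycle instructions such as MUL and the separate 1-cycle penalty for memory operations after LDRD. The names are hypothetical.

```c
/* Result latencies of the two LDRD destination registers, as stated above
 * (data assumed to hit in the data cache). */
enum { LDRD_LAT_FIRST = 3, LDRD_LAT_SECOND = 4 };

/* Simplified single-issue model: the producer issues at cycle 0, the
 * consumer would issue at cycle gap + 1, and it may not issue before cycle
 * `latency`; the difference, if positive, is the stall. */
int stall_cycles(int latency, int gap)
{
    int s = latency - 1 - gap;
    return s > 0 ? s : 0;
}
```

In the original ordering, ORR reads r1 directly after the LDRD, so `stall_cycles(LDRD_LAT_SECOND, 0)` gives the three-cycle stall described above. In the rearranged version, three instructions separate them (gap 3), and MUL reads r0 after two (gap 2). Both cases yield no stall.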