Intel® IXP42X product line and IXC1100 control plane processors: Intel XScale® Processor

Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor

DM September 2006

194 Order Number: 252480-006US

sequentially should not exceed four. Also note that a preload instruction may cause a fill

buffer to be used. As a result, the number of preload instructions outstanding should

also be considered to arrive at the number of loads that are outstanding.

Similarly, the number of write buffers also limits the number of successive writes that

can be issued before the processor stalls. No more than eight stores can be issued. Also

note that if the data caches are using the write-allocate with write-back policy, then a

load operation may cause stores to the external memory if the read operation evicts a

cache line that is dirty (modified). This can reduce the number of sequential stores

that can be issued without stalling.

3.10.5.1.1 Scheduling Load and Store Double (LDRD/STRD)

The IXP42X product line and IXC1100 control plane processors introduce two new

double word instructions: LDRD and STRD. LDRD loads 64 bits of data from an

effective address into two consecutive registers; conversely, STRD stores 64 bits from

two consecutive registers to an effective address. There are two important restrictions

on how these instructions may be used:

• The effective address must be aligned on an 8-byte boundary.

• The first register specified must be even-numbered (r0, r2, etc.).
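
The two restrictions above can be expressed as a simple check. The following is a minimal sketch in C, not part of the manual; the function name `ldrd_strd_operands_ok` and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): returns true when an
 * LDRD/STRD whose first register is r<rd> and whose effective address
 * is `addr` satisfies both restrictions listed above. */
bool ldrd_strd_operands_ok(unsigned rd, uintptr_t addr)
{
    assert(rd <= 15);                      /* ARM core registers are r0..r15 */

    bool aligned = (addr & 0x7u) == 0;     /* address on an 8-byte boundary */
    bool even_rd = (rd & 1u) == 0;         /* first register is r0, r2, ... */

    return aligned && even_rd;
}
```

For example, `ldrd_strd_operands_ok(0, 0x1000)` holds, while `ldrd_strd_operands_ok(1, 0x1000)` (odd register) and `ldrd_strd_operands_ok(2, 0x1004)` (address only 4-byte aligned) do not.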

When both restrictions are satisfied, using LDRD/STRD instead of LDM/STM to do the same

thing is more efficient because LDRD issues in one clock cycle and STRD in two, as opposed

to LDM/STM, which issues in four clock cycles. Avoid LDRDs targeting R12; this incurs

an extra cycle of issue latency.

The LDRD instruction has a result latency of 3 or 4 cycles depending on the destination
register being accessed (assuming the data being loaded is in the data cache).

    add r6, r7, r8
    sub r5, r6, r9
    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd r0, [r3]
    orr r8, r1, #0xf
    mul r7, r0, r7

In the code example above, the ORR instruction would stall for three cycles because of
the four cycle result latency for the second destination register of an LDRD instruction.
The code shown above can be rearranged to remove the pipeline stalls:

    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd r0, [r3]
    add r6, r7, r8
    sub r5, r6, r9
    mul r7, r0, r7
    orr r8, r1, #0xf

Any memory operation following a LDRD instruction (LDR, LDRD, STR and so on)
would stall for 1 cycle.

    ; The str instruction below would stall for 1 cycle
    ldrd r0, [r3]
    str r4, [r5]
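
The stall counts quoted in this section can be reproduced with a very small model. The sketch below is an illustration in C, not code from the manual. It assumes a simplified single-issue, in-order pipeline with a data-cache hit and no other hazards; the names `stall_cycles`, `LDRD_LATENCY_FIRST_REG`, and `LDRD_LATENCY_SECOND_REG` are hypothetical.

```c
#include <assert.h>

/* LDRD result latencies as described in the text (data-cache hit assumed):
 * 3 cycles for the first destination register, 4 for the second. */
enum {
    LDRD_LATENCY_FIRST_REG  = 3,
    LDRD_LATENCY_SECOND_REG = 4
};

/* Simplified model (an assumption, not the real pipeline): one instruction
 * issues per cycle, in order, with no other hazards.
 * `distance` is the number of issue slots from the producing instruction
 * to the consumer; 1 means the consumer immediately follows the producer.
 * Returns the number of cycles the consumer stalls waiting for the result. */
int stall_cycles(int result_latency, int distance)
{
    assert(result_latency >= 1 && distance >= 1);

    int stall = result_latency - distance;
    return stall > 0 ? stall : 0;
}
```

Under this model, an ORR that reads r1 immediately after the LDRD gives `stall_cycles(LDRD_LATENCY_SECOND_REG, 1)`, which is three cycles. In the rearranged sequence, the MUL reads r0 three slots after the LDRD and the ORR reads r1 four slots after it, so both calls return zero.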
