Intel IXP12xx Network Router User Manual


 
Version 1.0, 4/10/02
Page 7 of 17
bytes/minimum frame}. 84 bytes/frame * 8 bits/byte / 100 Mb/s = 6.72 µs/frame, and 232 MHz * 6.72 µs/frame = 1559 cycles/frame.
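The budget arithmetic can be reproduced with a small helper. This is a sketch under stated assumptions: `cycle_budget` is a hypothetical name, the 84 bytes are assumed to be a 64-byte minimum Ethernet frame plus 20 bytes of preamble and inter-frame gap, and the OC-12 case assumes 53-byte cells at the 599.04 Mb/s SONET payload rate, which reproduces the 164 cycles/cell figure used in this section.

```python
def cycle_budget(bytes_per_unit: int, line_rate_bps: float, clock_hz: float) -> float:
    """Cycles available per frame or cell: clock rate times wire time per unit."""
    wire_time_s = bytes_per_unit * 8 / line_rate_bps
    return clock_hz * wire_time_s

CLOCK_HZ = 232e6  # microengine clock used in this section

# 100 Mb/s Ethernet: 84 bytes per minimum frame (assumed 64-byte frame + 20 bytes overhead)
fe = cycle_budget(84, 100e6, CLOCK_HZ)
# OC-12 ATM: 53-byte cells, assuming the 599.04 Mb/s SONET payload rate
oc12 = cycle_budget(53, 599.04e6, CLOCK_HZ)

print(f"100 Mb/s Ethernet: {fe:.0f} cycles/frame")  # 1559
print(f"OC-12 ATM: {oc12:.0f} cycles/cell")         # 164
```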
These cycle budgets specify how often a cell or frame arrives on, or leaves, the wire. If multiple threads handle successive frames from the same wire, each thread's budget is multiplied by the number of threads. For example, the OC-12 cycle budget is 164 cycles/cell, but since the four threads on a single microengine can work on four cells simultaneously, the equivalent per-thread cycle budget becomes 4 * 164 = 656 cycles, or approximately 660 cycles/cell. That is, four threads working on four different cells can each take up to about 660 cycles to process a cell and still keep up with line rate.
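A minimal sketch of this multiplication, assuming the threads take cells or frames in strict rotation so that each thread sees every Nth arrival. The function name `per_thread_budget` is hypothetical.

```python
def per_thread_budget(wire_budget_cycles: float, threads_on_wire: int) -> float:
    """Cycles one thread may spend on one unit when N threads share a wire.

    Assumes each thread receives every Nth cell/frame, so the interval between
    that thread's arrivals is N wire budgets long.
    """
    if threads_on_wire < 1:
        raise ValueError("need at least one thread")
    return wire_budget_cycles * threads_on_wire

# OC-12 receive: 4 threads on one microengine, 164 cycles/cell on the wire
print(per_thread_budget(164, 4))  # 656 (the text rounds this to about 660)
```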
This per-thread, per-packet cycle budget is independent of how the thread consumes the cycles; it specifies only the maximum time (in cycles) between the beginning and end of packet processing. The cycles may be spent executing instructions, on instructions aborted by branches, on microengine stalls due to command queue pushback, or idle. A change in any of these components can push a thread up to, or past, its cycle budget.
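To illustrate that only the total matters, the sketch below sums the four categories of time listed above and compares the total with a per-thread budget. The class and function names are hypothetical, and the cycle counts are made-up illustrative values, not measurements.

```python
from dataclasses import dataclass

@dataclass
class ThreadCycles:
    """Hypothetical breakdown of one thread's time for one cell or frame."""
    executed: int  # instructions executed
    aborted: int   # instructions aborted due to branches
    stalled: int   # microengine stalls from command queue pushback
    idle: int      # idle cycles

    def total(self) -> int:
        return self.executed + self.aborted + self.stalled + self.idle

def within_budget(c: ThreadCycles, budget: int) -> bool:
    """The budget bounds elapsed cycles only, not how they are divided up."""
    return c.total() <= budget

# Illustrative values only: different breakdowns, same total, same verdict
a = ThreadCycles(executed=300, aborted=40, stalled=60, idle=250)
b = ThreadCycles(executed=200, aborted=40, stalled=160, idle=250)
print(a.total(), within_budget(a, 656))  # 650 True
print(b.total(), within_budget(b, 656))  # 650 True
```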
The Developer’s Workbench IX Bus Device Simulator is typically configured to show performance in Mbps based on frames/sec. However, it can also be configured to display cycles/frame, which is useful when tuning a design to meet its cycle budget.
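The two displays are related by simple arithmetic. The sketch below converts a frame rate into Mb/s and into cycles/frame; the function names are hypothetical, and it assumes the Mbps figure counts the same bytes per frame that the cycle budget uses (which bytes the simulator actually counts is an assumption here).

```python
def mbps(frames_per_sec: float, bytes_per_frame: int) -> float:
    """Throughput in Mb/s for a given frame rate (assumed byte accounting)."""
    return frames_per_sec * bytes_per_frame * 8 / 1e6

def cycles_per_frame(frames_per_sec: float, clock_hz: float = 232e6) -> float:
    """Average cycles elapsed between successive frames at a 232 MHz clock."""
    return clock_hz / frames_per_sec

# 100 Mb/s Ethernet at line rate: one 84-byte frame every 6.72 us
fps = 1 / 6.72e-6
print(f"{mbps(fps, 84):.1f} Mb/s, {cycles_per_frame(fps):.0f} cycles/frame")
# 100.0 Mb/s, 1559 cycles/frame
```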
Developer’s Workbench IX Bus Simulator – Bounded and Unbounded Wire Rates
Simulations can be run with ports “bounded” or “unbounded” to the wire rate. Simulations run with ports bounded to the wire rate always show exactly the cycle budget per frame, because the ports are held to the desired wire rate. It is also useful to run a simulation with the ports unbounded to the wire rate (infinite bandwidth on the wire): on the receive side there is always data waiting on the wire, and on the transmit side the wire is always ready to accept more data. If the design can run faster than wire rate, setting the IX Bus Device Simulator to display cycles/frame then shows how many cycles the design actually needs per frame, which can be related back to the instructions it executes.
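With unbounded ports, the measured cycles/frame (or cycles/cell) reflects what the design needs rather than what the wire allows, so it can be compared directly with the budget. A hedged sketch; `headroom` is a hypothetical helper, and the two sample measurements are taken from the –75 DRAM column of Figure 4.

```python
def headroom(measured_cycles: float, budget_cycles: float) -> float:
    """Fraction of the cycle budget left unused (negative if over budget)."""
    return (budget_cycles - measured_cycles) / budget_cycles

OC12_BUDGET = 164  # cycles/cell

# Two unbounded measurements from Figure 4 (-75 DRAM column)
for measured in (154.6, 172.8):
    print(f"{measured} cycles/cell: headroom {headroom(measured, OC12_BUDGET):+.1%}")
# 154.6 cycles/cell: headroom +5.7%
# 172.8 cycles/cell: headroom -5.4%
```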
This technique was used to measure the OC-12 Receive Microengine over several workloads
against its 164 cycles/cell budget. The 8 interleaved VC workloads were used to make sure that
the VC-cache experienced a 100% miss rate. Figure 4 shows the results for both the –75 and –7E
DRAM speed grades.
Cells/PDU   Virtual circuits   Cycles/cell (–75 DRAM)   Cycles/cell (–7E DRAM)
1*          1                  154.6                    137.9
1*          4 random           163.9                    149.8
1*          8 interleaved      172.8**                  159.0
2           1                  161.0                    137.1
2           8 interleaved      158.4                    149.2
32          1                  152.5                    141.9
32          8 interleaved      131.5                    127.4
* Simulations show that ATM Receive can handle the 1-cell/PDU workload, but the IP Router in the next pipeline stage falls behind.
** With –75 DRAM, the ATM Receive cycle budget is exceeded for a workload of single-cell, interleaved PDUs (172.8 cycles/cell versus 164).
Figure 4 – OC-12 unbounded ATM Receive simulations versus the 164 cycles/cell budget
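The table can also be checked mechanically against the budget. The values below are copied from Figure 4; the sketch (with a hypothetical `over_budget` helper) flags every entry above 164 cycles/cell, which reproduces the ** footnote.

```python
BUDGET = 164  # OC-12 cycles/cell

# (cells/PDU, virtual circuits, cycles/cell with -75 DRAM, with -7E DRAM), from Figure 4
FIGURE_4 = [
    (1, "1", 154.6, 137.9),
    (1, "4 random", 163.9, 149.8),
    (1, "8 interleaved", 172.8, 159.0),
    (2, "1", 161.0, 137.1),
    (2, "8 interleaved", 158.4, 149.2),
    (32, "1", 152.5, 141.9),
    (32, "8 interleaved", 131.5, 127.4),
]

def over_budget(rows, budget=BUDGET):
    """Return (cells/PDU, VCs, DRAM grade, cycles) for every entry above budget."""
    hits = []
    for cells, vcs, c75, c7e in rows:
        for grade, cycles in (("-75", c75), ("-7E", c7e)):
            if cycles > budget:
                hits.append((cells, vcs, grade, cycles))
    return hits

print(over_budget(FIGURE_4))  # [(1, '8 interleaved', '-75', 172.8)]
```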