Hardware Reference Manual
Intel® IXP2800 Network Processor
Intel XScale® Core
3.8.1.6 Instruction TLB Efficiency Mode
PMN0 totals the number of instructions that were executed, which does not include instructions
that were translated by the instruction TLB and never executed. This can happen if a branch
instruction changes the program flow; the instruction TLB may translate the next sequential
instructions after the branch, before it receives the target address of the branch.
PMN1 counts the number of instruction TLB table-walks, which occur when there is a TLB miss.
If the instruction TLB is disabled, PMN1 will not increment.
Statistics derived from these two events:
• Instruction TLB miss-rate. This is derived by dividing PMN1 by PMN0.
• Cycles-per-instruction (CPI), the average number of cycles taken to execute an instruction.
CPI is derived by dividing CCNT by PMN0, where CCNT is used to measure total
execution time.
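The two ratios above can be sketched in C as follows. This is a minimal illustration that assumes the three counter values have already been read and did not overflow during the run. The struct and function names are hypothetical and do not come from this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one run in Instruction TLB Efficiency mode.
 * Field names are illustrative only. */
struct itlb_sample {
    uint32_t ccnt;  /* CCNT: cycles elapsed during the measured interval */
    uint32_t pmn0;  /* PMN0: instructions executed */
    uint32_t pmn1;  /* PMN1: instruction TLB table-walks (TLB misses) */
};

/* Instruction TLB miss-rate = PMN1 / PMN0.
 * Returns 0.0 when no instructions were counted, to avoid dividing by zero. */
static double itlb_miss_rate(const struct itlb_sample *s)
{
    assert(s != NULL);
    if (s->pmn0 == 0u)
        return 0.0;
    return (double)s->pmn1 / (double)s->pmn0;
}

/* Cycles-per-instruction = CCNT / PMN0.
 * Returns 0.0 when no instructions were counted. */
static double cycles_per_instruction(const struct itlb_sample *s)
{
    assert(s != NULL);
    if (s->pmn0 == 0u)
        return 0.0;
    return (double)s->ccnt / (double)s->pmn0;
}
```

For example, a run with CCNT = 3000, PMN0 = 2000, and PMN1 = 500 gives a miss-rate of 0.25 and a CPI of 1.5.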
3.8.1.7 Data TLB Efficiency Mode
PMN0 totals the number of data cache accesses, which includes cacheable and non-cacheable
accesses, mini-data cache accesses, and accesses made to locations configured as data RAM.
Note that STM and LDM will each count as several accesses to the data TLB depending on the
number of registers specified in the register list. LDRD will register two accesses.
PMN1 counts the number of data TLB table-walks, which occur when there is a TLB miss. If the
data TLB is disabled, PMN1 will not increment.
The statistic derived from these two events is:
• Data TLB miss-rate. This is derived by dividing PMN1 by PMN0.
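When the miss-rate is computed on the target itself, an integer-only form may be more convenient than a floating-point ratio. The sketch below scales PMN1 / PMN0 to misses per 1000 accesses. The function name and the scaling factor are assumptions made for illustration and are not part of this manual.

```c
#include <stdint.h>

/* Data TLB misses per 1000 data accesses, using integer arithmetic only.
 *   pmn0: data cache accesses counted in Data TLB Efficiency mode
 *   pmn1: data TLB table-walks (TLB misses) counted in the same run
 * The multiplication is widened to 64 bits so it cannot overflow.
 * The result is truncated toward zero; 0 is returned when pmn0 is 0. */
static uint64_t dtlb_misses_per_thousand(uint32_t pmn0, uint32_t pmn1)
{
    if (pmn0 == 0u)
        return 0u;
    return ((uint64_t)pmn1 * 1000u) / pmn0;
}
```

For example, PMN0 = 4000 and PMN1 = 10 yields 2, because the exact value of 2.5 misses per 1000 accesses is truncated.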
3.8.2 Multiple Performance Monitoring Run Statistics
Even though only two events can be monitored at any given time, multiple performance monitoring
runs can be done, capturing different events from different modes. For example, the first run could
monitor the number of writeback operations (PMN1 in Stall/Writeback mode) and the second run
could monitor the total number of data cache accesses (PMN0 in Data Cache Efficiency mode).
From the results, the percentage of writeback operations relative to the total number of data
accesses can be derived.
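The two-run example can be sketched as a small C helper. It assumes that both runs execute the same workload over the same interval, since otherwise counts taken in separate runs are not comparable. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Percentage of writeback operations relative to data cache accesses,
 * combining counts captured in two separate monitoring runs:
 *   writebacks: PMN1 from the run in Stall/Writeback mode
 *   accesses:   PMN0 from the run in Data Cache Efficiency mode
 * Returns 0.0 when no accesses were counted, to avoid dividing by zero. */
static double writeback_percent(uint32_t writebacks, uint32_t accesses)
{
    if (accesses == 0u)
        return 0.0;
    return 100.0 * (double)writebacks / (double)accesses;
}
```

For example, 250 writebacks in the first run and 1000 data cache accesses in the second give 25 percent.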
3.9 Performance Considerations
This section describes performance considerations that compiler writers, application
programmers, and system designers need to be aware of to use the Intel XScale® core efficiently.
Performance numbers discussed here include interrupt latency, branch prediction, and instruction
latencies.