Intel Processor Computer Hardware User Manual


 
B-20 March, 2003 Developers Manual
Intel® 80200 Processor based on Intel® XScale Microarchitecture
Optimization Guide
B.4.2.3. Read Allocate and Read-write Allocate Memory Regions
Most of the regular data and the stack for your application should be allocated to a read-write
allocate region, because this data is expected to be read and written often.
Data that is write only (or data that is written and then not used for a long time) should be
placed in a read-allocate region. Under the read-allocate policy, a cache write miss does not
allocate a new cache line, and hence does not evict critical data from the Data cache.
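The difference between the two policies can be illustrated with a toy model. The sketch below is not a model of the Intel® 80200 Data cache: it assumes a tiny direct-mapped cache of four 32-byte lines, and every name in it (toy_cache, toy_access, and so on) is hypothetical. It only shows that a write miss replaces a resident line under read-write allocate and leaves it in place under read allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy geometry (an assumption for illustration, not the real cache). */
enum { TOY_LINES = 4, TOY_LINE_BYTES = 32 };

typedef enum { READ_ALLOCATE, READ_WRITE_ALLOCATE } alloc_policy;

typedef struct {
    bool          valid[TOY_LINES];
    unsigned long tag[TOY_LINES];
    alloc_policy  policy;
} toy_cache;

/* Start with an empty cache that uses the given allocation policy. */
static void toy_init(toy_cache *c, alloc_policy policy)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->policy = policy;
}

/* True if the line containing addr is currently resident. */
static bool toy_holds(const toy_cache *c, unsigned long addr)
{
    unsigned long line = addr / TOY_LINE_BYTES;
    unsigned idx = (unsigned)(line % TOY_LINES);
    return c->valid[idx] && c->tag[idx] == line / TOY_LINES;
}

/* Perform one access and return true on a hit.
 * A read miss always allocates a line. A write miss allocates a line
 * only under READ_WRITE_ALLOCATE; under READ_ALLOCATE the cache
 * contents are left untouched. */
static bool toy_access(toy_cache *c, unsigned long addr, bool is_write)
{
    if (toy_holds(c, addr))
        return true;
    if (!is_write || c->policy == READ_WRITE_ALLOCATE) {
        unsigned long line = addr / TOY_LINE_BYTES;
        unsigned idx = (unsigned)(line % TOY_LINES);
        c->valid[idx] = true;          /* replaces whatever was there */
        c->tag[idx] = line / TOY_LINES;
    }
    return false;
}
```

In this model, addresses 0 and 128 map to the same line slot. After a read of address 0, a write to address 128 leaves address 0 resident under READ_ALLOCATE and replaces it under READ_WRITE_ALLOCATE.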
B.4.2.4. Creating On-chip RAM
Part of the Data cache can be converted into fast on-chip RAM. Accesses to objects in the on-chip
RAM do not incur cache miss penalties, which reduces the number of processor stalls.
Application performance can be improved by converting part of the cache into on-chip RAM and
allocating frequently used variables to it. Because of the Intel® 80200 processor round-robin
replacement policy, all data that is not locked is eventually evicted. Therefore, to prevent
critical or frequently used data from being evicted, allocate it to on-chip RAM.
The following variables are good candidates for allocating to the on-chip RAM:
• Frequently used global data used for storing context for context switching.
• Global variables that are accessed in time-critical functions such as interrupt service routines.
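One way to prepare such variables for locking is to group them into a single contiguous, line-aligned block, so that the whole block can be locked as one region. The sketch below assumes a 32-byte cache line and a C11 compiler. The structure, its field names, and the helper onchip_line_count are hypothetical and only illustrate the layout; the locking sequence itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* assumed Data cache line size */

/* Hypothetical candidates for on-chip RAM, kept together in one struct. */
struct onchip_data {
    uint32_t          saved_context[17];  /* e.g. register context for a task switch */
    volatile uint32_t isr_tick_count;     /* updated in an interrupt service routine */
    volatile uint32_t isr_pending_mask;   /* read in an interrupt service routine */
};

/* Pad the block to a whole number of cache lines so that locking it
 * does not drag unrelated neighbouring data into the locked lines. */
union onchip_block {
    struct onchip_data data;
    uint8_t lines[(sizeof(struct onchip_data) + CACHE_LINE_BYTES - 1)
                  / CACHE_LINE_BYTES * CACHE_LINE_BYTES];
};

static_assert(sizeof(union onchip_block) % CACHE_LINE_BYTES == 0,
              "on-chip block must cover whole cache lines");

/* Start the block on a cache line boundary. */
static _Alignas(32) union onchip_block g_onchip;

/* Number of cache lines that must be locked to hold the block. */
static size_t onchip_line_count(void)
{
    return sizeof g_onchip / CACHE_LINE_BYTES;
}
```

With the fields shown, the structure occupies 76 bytes and is padded to 96 bytes, so three cache lines would need to be locked.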
The on-chip RAM is created by locking a memory region into the Data cache (see Section 6.4,
“Re-configuring the Data Cache as Data RAM” for more details). If the data in the on-chip RAM is
to be initialized to zero, then the locking process can be sped up by using the CP15 prefetch zero
function. This function does not generate external memory references. See the Intel® 80200
processor reference manual for more information on how to do this.
When creating the on-chip RAM, take care that all sets in the on-chip RAM area of the Data cache
have approximately the same number of ways locked. If some sets have more ways locked than
others, this uneven allocation increases the level of thrashing in the heavily locked sets and
leaves the other sets underutilized.
For example, consider three arrays arr1, arr2 and arr3 of size 64 bytes each that are being allocated
to the on-chip RAM, and assume that the address of arr1 is 0, the address of arr2 is 1024, and the
address of arr3 is 2048. All three arrays fall within the same sets, i.e. set0 and set1. As a result,
three ways in both set0 and set1 are locked, leaving 29 of the 32 ways in each of those sets for use
by other variables.
This can be overcome by allocating on-chip RAM data in sequential order. In the above example,
allocating arr2 to address 64 and arr3 to address 128 allows the three arrays to use only 1 way in
sets 0 through 5.
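The set arithmetic in this example can be checked with a short calculation. The sketch below assumes a Data cache geometry of 32-byte lines, 32 sets and 32 ways per set, and assumes that the set is selected by the line number modulo the number of sets. The function names are hypothetical, and the regions passed in are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Data cache geometry: 32-byte lines, 32 sets, 32 ways per set. */
enum { LINE_BYTES = 32, NUM_SETS = 32, NUM_WAYS = 32 };

/* Set selected by an address: the line number modulo the number of sets. */
static unsigned set_index(unsigned long addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Record one locked way in every set touched by [base, base + size). */
static void lock_region(unsigned locked[NUM_SETS],
                        unsigned long base, unsigned long size)
{
    assert(size > 0);
    unsigned long first = base / LINE_BYTES;
    unsigned long last  = (base + size - 1) / LINE_BYTES;
    for (unsigned long line = first; line <= last; ++line)
        locked[line % NUM_SETS]++;
}

/* Largest number of locked ways found in any single set. */
static unsigned max_locked_ways(const unsigned locked[NUM_SETS])
{
    unsigned max = 0;
    for (size_t s = 0; s < NUM_SETS; ++s)
        if (locked[s] > max)
            max = locked[s];
    return max;
}

/* Number of sets that have at least one way locked. */
static unsigned sets_touched(const unsigned locked[NUM_SETS])
{
    unsigned count = 0;
    for (size_t s = 0; s < NUM_SETS; ++s)
        if (locked[s] > 0)
            count++;
    return count;
}
```

Calling lock_region on a zeroed array for the three arrays at addresses 0, 1024 and 2048 gives three locked ways in each of two sets. Placing them at 0, 64 and 128 gives one locked way in each of six sets.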