Intel Processor Computer Hardware User Manual


 
Developer's Manual, March 2003, page B-25
Intel® 80200 Processor based on Intel® XScale Microarchitecture
Optimization Guide
B.4.4 Prefetch Considerations
The Intel® 80200 processor has a true prefetch load instruction (PLD). The purpose of this
instruction is to preload data into the data and mini-data caches. Data prefetching hides
memory transfer latency while the processor continues to execute instructions. Prefetch matters to
both compiler-generated and assembly code because judicious use of the prefetch instruction can
greatly improve the throughput of the Intel® 80200 processor. Data prefetch can be
applied not only to loops but also to any data references within a block of code. Prefetch also
applies to data writes when the memory type is configured as write allocate.
The Intel® 80200 processor prefetch load instruction is a true prefetch instruction because the load
destination is the data or mini-data cache and not a register. Compilers for processors that have
data caches but do not support prefetch sometimes use a load instruction to preload the data cache.
This technique has the disadvantage of tying up a register for the loaded data and requiring
additional registers for subsequent preloads, which increases register pressure. By contrast, the
Intel® 80200 processor prefetch can be used to reduce register pressure instead of increasing it.
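The loop case can be sketched in C. This is a minimal illustration, not code from this manual: it assumes a GCC or Clang toolchain, whose `__builtin_prefetch` is a hint that the compiler may lower to a prefetch instruction such as PLD on targets that have one. The names `PREFETCH`, `PREFETCH_AHEAD`, and `sum_with_prefetch` are hypothetical, and the distance of 8 elements is a placeholder rather than a tuned value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements. A real value depends on the
 * memory latencies discussed in the prefetch-distance section. */
#define PREFETCH_AHEAD 8

/* Assumption: GCC/Clang builtin. On other compilers the hint is dropped. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr))
#else
#define PREFETCH(addr) ((void)0)
#endif

/* Sum an array while hinting that a later element will be needed soon.
 * The prefetch targets the cache, so no register is tied up holding
 * data that is not yet in use. */
uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array when forming the prefetch address. */
        if (i + PREFETCH_AHEAD < n)
            PREFETCH(&data[i + PREFETCH_AHEAD]);
        sum += data[i];
    }
    return sum;
}
```

For simplicity the sketch issues a hint on every iteration; tuned code would normally issue one prefetch per cache line instead.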
The Intel® 80200 processor prefetch load is a hint instruction and does not guarantee that the data
is loaded. Whenever the load would cause a fault or a table walk, the processor ignores the
prefetch instruction, along with the fault or table walk, and continues processing the next
instruction. This is particularly advantageous where a linked list or recursive data structure is
terminated by a NULL pointer: prefetching the NULL pointer does not fault or disturb program flow.
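The NULL-terminated case can be sketched as a list traversal that prefetches the next node without checking for the end of the list. This is a hedged illustration that assumes the GCC/Clang `__builtin_prefetch` builtin, which is documented not to fault on an invalid address; `struct node`, `PREFETCH`, and `sum_list` are hypothetical names introduced here.

```c
#include <stddef.h>

/* Assumption: GCC/Clang builtin. On other compilers the hint is dropped. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr))
#else
#define PREFETCH(addr) ((void)0)
#endif

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Walk the list, hinting the next node while the current one is processed. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        /* p->next is NULL at the tail. Because the prefetch is only a
         * hint, no extra NULL test is needed before issuing it. */
        PREFETCH(p->next);
        total += p->value;
    }
    return total;
}
```

Omitting the NULL test keeps a compare and branch out of the loop body, which is the advantage the paragraph above describes.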
B.4.4.1 Prefetch Distances in the Intel® 80200 Processor
Scheduling the prefetch instruction requires an understanding of the system latency times and the
system resources that affect when to use the prefetch instruction. This section considers three
timing elements:
$N_{cwf}$: critical word first
$N_{clxfer}$: full cache line transfer time
$N_{subissue}$: subsequent prefetch issue time to ensure uninterrupted transfers
The memory latency times presented here assume typical, currently available SDRAM working with the
Intel® 80200 processor. It is assumed that the SDRAM supports "Critical Word First" transfers. That
is, when a cache line is being transferred, the first word transferred is the one the processor
needs immediately, rather than the word at the lowest address.
The cycle times assume that the core is running six times as fast as the memory transfer bus.
Further, the example values presented here apply to the current processor implementation (TBD
processor name) and may differ for future implementations.
$N_{cwf}$ is the number of core cycles required to transfer the first critical word of a prefetch
or load operation:

$$N_{cwf} = N_{lookup} + N_{cwfxfer}$$

Where:

$N_{lookup}$ is the number of core clocks required for the processor to issue a memory transfer
request to the SDRAM plus the time the SDRAM requires to locate the data:

$$N_{lookup} = N_{processor} + N_{memwait} + N_{mempagewait}$$
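The two sums above can be checked with a short calculation. The sketch below is illustrative only: the individual terms are not defined in this excerpt, so the code treats them as opaque cycle counts, and the function and type names are hypothetical. Only the core-to-bus clock ratio of six is taken from the text.

```c
/* Core clock runs six times as fast as the memory bus (from the text). */
#define CORE_TO_BUS_RATIO 6u

/* Hypothetical container for the terms of N_lookup, all in core cycles. */
struct lookup_latency {
    unsigned n_processor;
    unsigned n_memwait;
    unsigned n_mempagewait;
};

/* Convert memory bus cycles to core cycles. */
unsigned bus_to_core_cycles(unsigned bus_cycles)
{
    return bus_cycles * CORE_TO_BUS_RATIO;
}

/* N_lookup = N_processor + N_memwait + N_mempagewait */
unsigned n_lookup(const struct lookup_latency *l)
{
    return l->n_processor + l->n_memwait + l->n_mempagewait;
}

/* N_cwf = N_lookup + N_cwfxfer */
unsigned n_cwf(const struct lookup_latency *l, unsigned n_cwfxfer)
{
    return n_lookup(l) + n_cwfxfer;
}
```

As a usage example with placeholder inputs (not figures from this manual): with 7 core cycles for $N_{processor}$, 2 bus cycles for $N_{memwait}$, 1 bus cycle for $N_{mempagewait}$, and 1 bus cycle for $N_{cwfxfer}$, `n_lookup` gives 7 + 12 + 6 = 25 core cycles and `n_cwf` gives 25 + 6 = 31 core cycles.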