Optimizing Cache Usage 6
3. Follows only one stream per 4K page (load or store)
4. Can prefetch up to eight simultaneous, independent streams from eight
different 4K regions
5. Does not prefetch across a 4K boundary; note that this is independent
of paging modes
6. Fetches data into the second-level or third-level cache
7. Does not prefetch UC or WC memory types
8. Follows load and store streams. Issues Read For Ownership (RFO)
transactions for store streams and Data Reads for load streams.
Apart from items 2 and 4 above, most of these characteristics also
apply to the Pentium M, Intel Core Solo and Intel Core Duo processors.
The hardware prefetcher implemented in the Pentium M processor
fetches data into the second-level cache. It can track 12 independent
streams in the forward direction and 4 independent streams in the
backward direction. The hardware prefetcher of the Intel Core Solo
processor can track 16 forward streams and 4 backward streams. On the
Intel Core Duo processor, the hardware prefetcher in each core fetches
data independently.
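The 4K-region rules listed above can be made concrete with a little address arithmetic. The following is a minimal sketch, not taken from this manual: the helper names (same_4k_region, bytes_to_4k_boundary, regions_touched) are hypothetical, and the code only reasons about addresses; it does not query or control the prefetcher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the region the hardware prefetcher tracks (4K, per the list above). */
#define REGION_4K ((uintptr_t)4096)

/* Hypothetical helper: nonzero when both addresses fall in the same 4K region,
   i.e. when two access streams would compete for one tracked stream. */
static int same_4k_region(uintptr_t a, uintptr_t b)
{
    return (a / REGION_4K) == (b / REGION_4K);
}

/* Hypothetical helper: bytes from addr up to the next 4K boundary
   (4096 when addr is itself 4K-aligned). Hardware prefetching of a
   stream stops at that boundary. */
static size_t bytes_to_4k_boundary(uintptr_t addr)
{
    return (size_t)(REGION_4K - (addr % REGION_4K));
}

/* Hypothetical helper: number of distinct 4K regions touched by the
   byte range [addr, addr + len). */
static size_t regions_touched(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    /* Guard against address wrap-around in addr + len - 1. */
    assert(addr <= UINTPTR_MAX - (len - 1));
    return (size_t)((addr + len - 1) / REGION_4K - addr / REGION_4K + 1);
}
```

A helper like regions_touched can be used when laying out data to estimate how many 4K regions, and therefore how many prefetcher streams, a loop is likely to touch at once.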
Prefetch and Cacheability Instructions
The prefetch instruction, inserted by programmers or compilers,
accesses a minimum of two cache lines of data on the Pentium 4
processor (one cache line of data on the Pentium M processor) before
that data is actually needed. This hides the latency of the data access
behind the time required to process data already resident in the cache.
Many algorithms can provide information in advance about the data that
will be needed soon. Where memory accesses follow long, regular data
patterns, the automatic hardware prefetcher should be favored over
software prefetches.
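As an illustration of a software prefetch, the sketch below uses the _mm_prefetch intrinsic on an indirect (index-driven) access pattern, which is an assumed example of an access pattern that is not long and regular. The function name sum_indirect and the PREFETCH_DISTANCE value are hypothetical; a suitable distance depends on the processor and on the work done per element, and would have to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning value: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 64

/* Sums a[idx[i]] for i in [0, n). The index array is read sequentially,
   but the accesses to a[] jump around, so the address needed a few
   iterations from now is known in advance and can be prefetched. */
static long sum_indirect(const int *a, const size_t *idx, size_t n)
{
    assert(n == 0 || (a != NULL && idx != NULL));
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* idx[i + PREFETCH_DISTANCE] is an ordinary load, so it must stay
           in bounds; the prefetch itself is only a hint to the processor. */
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&a[idx[i + PREFETCH_DISTANCE]],
                         _MM_HINT_T0);
        sum += a[idx[i]];
    }
    return sum;
}
```

The prefetch does not change the result; it only requests that the data be brought closer to the processor before the load that uses it.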
The cacheability control instructions allow you to control the data
caching strategy in order to increase cache efficiency and minimize
cache pollution.
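One way to limit cache pollution is a non-temporal (streaming) store, shown below as a minimal sketch using the SSE2 intrinsics _mm_stream_si128 and _mm_sfence. The function name fill_nontemporal and its preconditions (16-byte-aligned destination, element count a multiple of 4) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_set1_epi32, _mm_stream_si128 (SSE2) */

/* Fills dst[0..n) with value using non-temporal stores, which hint that
   the written data will not be reused soon and need not displace useful
   data in the caches.
   Assumed preconditions: dst is 16-byte aligned, n is a multiple of 4. */
static void fill_nontemporal(int32_t *dst, size_t n, int32_t value)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(n % 4 == 0);

    __m128i v = _mm_set1_epi32(value);      /* four copies of value */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_si128((__m128i *)&dst[i], v);

    /* Streaming stores are weakly ordered; the fence makes them globally
       visible before any later stores. */
    _mm_sfence();
}
```

This approach suits large output buffers that are written once and not read again soon; for data that is reused shortly afterwards, ordinary stores are usually the better choice.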