IA-32 Intel® Architecture Optimization
Memory Optimization Using Prefetch
The Pentium 4 processor has two mechanisms for data prefetch:
software-controlled prefetch and automatic hardware prefetch.
Software-controlled Prefetch
Software-controlled prefetch is enabled using the four prefetch
instructions (prefetchnta, prefetcht0, prefetcht1, and prefetcht2)
introduced with the Streaming SIMD Extensions (SSE).
These instructions are hints to bring a cache line of data into various
levels and modes of the cache hierarchy. Software-controlled
prefetch is not intended for prefetching code; using it for that purpose
can incur significant penalties on a multiprocessor system when code is
shared.
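As an illustration of how the four hints are reached from C, the sketch below uses the `_mm_prefetch` intrinsic from `<xmmintrin.h>`. The function name `prefetch_all_hints` is hypothetical and exists only for this example; real code would normally issue a single hint chosen for the access pattern.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants (SSE) */

/* Hypothetical helper: issue each of the four SSE prefetch hints for the
   cache line that contains p. A prefetch is only a hint; it does not
   modify the data and does not change program-visible state. */
static void prefetch_all_hints(const void *p)
{
    _mm_prefetch((const char *)p, _MM_HINT_T0);   /* prefetcht0  */
    _mm_prefetch((const char *)p, _MM_HINT_T1);   /* prefetcht1  */
    _mm_prefetch((const char *)p, _MM_HINT_T2);   /* prefetcht2  */
    _mm_prefetch((const char *)p, _MM_HINT_NTA);  /* prefetchnta */
}
```

The hint argument must be a compile-time constant, which is why each call names its hint explicitly rather than taking it as a parameter.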
Software prefetching has the following characteristics:
• Can handle irregular access patterns, which do not trigger the
hardware prefetcher.
• Can use less bus bandwidth than hardware prefetching; see below.
• Must be added to new code, and so does not benefit existing
applications.
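The first characteristic can be sketched with an indexed (gather-style) loop, where the next address depends on an index array rather than on a fixed stride. This is a minimal sketch, not a tuned implementation: the function name `gather_sum` and the `PREFETCH_DISTANCE` value of 8 are assumptions made for the example, and a suitable distance depends on memory latency and the work done per iteration.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Assumed look-ahead, in loop iterations; tune per workload. */
#define PREFETCH_DISTANCE 8

/* Hypothetical example: sum table[idx[i]] for i in [0, n).
   The accesses to table follow idx, so they have no regular stride. */
static long gather_sum(const int *table, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request the element needed PREFETCH_DISTANCE iterations from
           now; the bounds check keeps the read of idx[] inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            _mm_prefetch((const char *)&table[idx[i + PREFETCH_DISTANCE]],
                         _MM_HINT_T0);
        }
        sum += table[idx[i]];
    }
    return sum;
}
```

The prefetch affects only timing, so the function returns the same result with or without it.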
Example 6-1 Pseudo-code for Using clflush
while (!buffer_ready) {}
mfence
for (i = 0; i < num_cachelines * cacheline_size; i += cacheline_size) {
    clflush (char *)((unsigned int)buffer + i)
}
mfence
prefetchnta buffer[0];
VAR = buffer[0];
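One possible C rendering of Example 6-1 uses the `_mm_clflush`, `_mm_mfence`, and `_mm_prefetch` intrinsics. It is a sketch under stated assumptions: the function name `flush_then_reload` and its parameters are hypothetical, the 64-byte `CACHELINE_SIZE` is assumed rather than queried from CPUID, and the loop bound is treated as a byte count (`num_cachelines * CACHELINE_SIZE`).

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA (SSE) */
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */

/* Assumed cache line size in bytes; production code should query CPUID. */
#define CACHELINE_SIZE 64

/* Hypothetical helper: wait for the buffer, flush every cache line it
   occupies, then prefetch and read the first byte. */
static char flush_then_reload(const volatile int *buffer_ready,
                              const char *buffer, size_t num_cachelines)
{
    while (!*buffer_ready) { }          /* spin until the producer is done */

    _mm_mfence();                       /* order earlier memory accesses */
    for (size_t i = 0; i < num_cachelines * CACHELINE_SIZE;
         i += CACHELINE_SIZE) {
        _mm_clflush(buffer + i);        /* flush one line from the caches */
    }
    _mm_mfence();                       /* order the flushes before the reload */

    _mm_prefetch(buffer, _MM_HINT_NTA); /* prefetchnta buffer[0] */
    return buffer[0];                   /* VAR = buffer[0] */
}
```

Flushing writes modified lines back to memory before invalidating them, so the value read at the end is the one the producer stored.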