IA-32 Intel® Architecture Optimization
• Memory Optimization Using Hardware Prefetching, Software
Prefetch and Cacheability Instructions: discusses techniques for
implementing memory optimizations using the above instructions.
• Using deterministic cache parameters to manage cache hierarchy.
General Prefetch Coding Guidelines
The following guidelines will help you to reduce memory traffic and
utilize peak memory system bandwidth more effectively when large
amounts of data movement must originate from the memory system:
• Take advantage of the hardware prefetcher’s ability to prefetch data
that are accessed in linear patterns, in either the forward or the
backward direction.
• Take advantage of the hardware prefetcher’s ability to prefetch data
that are accessed in a regular pattern with access strides that are
substantially smaller than half of the trigger distance of the
hardware prefetch (see Table 1-2).
• Use a current-generation compiler, such as the Intel® C++ Compiler,
that supports C++ language-level features for Streaming SIMD
Extensions. Streaming SIMD Extensions and MMX technology
instructions provide intrinsics that allow you to optimize cache
utilization. Examples of such Intel® compiler intrinsics are
_mm_prefetch, _mm_stream, _mm_load, and _mm_sfence. For more
details on these intrinsics, refer to the Intel® C++ Compiler User’s
Guide, doc. number 718195.
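As an illustration of how the intrinsics named above can fit together, the following is a minimal sketch and not a tuned implementation. The function name scale_stream and the prefetch distance PREFETCH_AHEAD are hypothetical and chosen only for this example; the sketch also assumes that both buffers are 16-byte aligned and that the element count is a multiple of 4. It uses the single-precision forms _mm_load_ps and _mm_stream_ps.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_load_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical tuning value (assumption): how many floats ahead to prefetch.
   A suitable distance depends on the processor and on the loop body. */
#define PREFETCH_AHEAD 64

/* Writes dst[i] = k * src[i] for n floats.
   Assumes src and dst are 16-byte aligned and n is a multiple of 4. */
void scale_stream(float *dst, const float *src, size_t n, float k)
{
    __m128 vk = _mm_set1_ps(k);

    for (size_t i = 0; i < n; i += 4) {
        /* Software prefetch hint for data needed in a later iteration.
           The bounds check keeps the address inside the source buffer. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD), _MM_HINT_NTA);

        __m128 v = _mm_load_ps(src + i);            /* aligned 16-byte load */
        _mm_stream_ps(dst + i, _mm_mul_ps(v, vk));  /* non-temporal store */
    }

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

A caller would pass buffers obtained from an aligned allocator, for example _mm_malloc(n * sizeof(float), 16). Whether the explicit prefetch helps at all depends on how well the hardware prefetcher already covers the access pattern, so it should be measured rather than assumed.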
NOTE. In a number of cases presented in this chapter,
the prefetching and cache utilization techniques are specific
to the current implementation of the Intel NetBurst
microarchitecture but are largely applicable to
future processors.