6 Optimizing Cache Usage
Over the past decade, processor speed has increased more than tenfold,
while memory access speed has increased at a slower pace. The
resulting disparity has made it important to tune applications in one of
two ways: either (a) ensuring that most data accesses are satisfied from
the processor caches, or (b) effectively masking memory latency so that
applications use as much of the peak memory bandwidth as possible.
Hardware prefetching mechanisms are microarchitectural enhancements
that support the latter approach; they are most effective when combined
with software tuning. The performance of most applications can be
improved considerably if the required data can be fetched from the
processor caches, or if memory traffic can take effective advantage of
hardware prefetching.
Standard techniques for bringing data into the processor before it is
needed involve additional programming, which can be difficult to
implement and may require special steps to prevent performance
degradation. Streaming SIMD Extensions (SSE) addressed this issue by
providing several prefetch instructions.
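A minimal sketch in C of software prefetching with the SSE intrinsic
`_mm_prefetch`, where the hint `_MM_HINT_T0` selects PREFETCHT0 (fetch
the line into all cache levels). The function name `sum_with_prefetch`,
the prefetch distance of 64 elements, and the 64-byte cache-line size
are illustrative assumptions, not values taken from this manual; the
best distance depends on the processor and on the work done per
iteration.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 (SSE) */

/* Assumed prefetch distance in elements: 64 floats = 256 bytes, about
 * four 64-byte lines ahead. Illustrative only; tune empirically. */
#define PREFETCH_DISTANCE 64

/* Hypothetical example: sums an array while prefetching the data that
 * will be needed PREFETCH_DISTANCE elements later. */
float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Issue roughly one prefetch per 64-byte line (16 floats),
         * assuming a 64-byte line size, and never past the array end. */
        if (i % 16 == 0 && i + PREFETCH_DISTANCE < n) {
            _mm_prefetch((const char *)&data[i + PREFETCH_DISTANCE],
                         _MM_HINT_T0);
        }
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint and never changes the computed result. If the
distance is too short, the data still arrives late; if it is too long,
the prefetched lines may be evicted before they are used.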
Streaming SIMD Extensions also introduced several non-temporal store
instructions. SSE2 extends this support to new data types and also
introduces non-temporal store support for the 32-bit integer registers.
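A minimal sketch in C of a non-temporal fill using the SSE2 intrinsic
`_mm_stream_si32`, which emits MOVNTI, the non-temporal store from a
32-bit integer register. The function name `fill_nontemporal` is
hypothetical. Because non-temporal stores are weakly ordered, the
sketch ends with `_mm_sfence()` so that these stores become globally
visible before any later store.

```c
#include <stddef.h>
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2); also provides _mm_sfence */

/* Hypothetical example: fills dst[0..n-1] with value using non-temporal
 * stores, hinting that the data should not be cached so the fill does
 * not evict lines the application still needs. */
void fill_nontemporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
        _mm_stream_si32(&dst[i], value);   /* MOVNTI */
    }
    /* SFENCE: order the weakly ordered streaming stores ahead of any
     * stores that follow in program order. */
    _mm_sfence();
}
```

Non-temporal stores pay off mainly for large buffers that are written
once and not read back soon; for small buffers that are reused
immediately, ordinary cached stores are usually the better choice.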
This chapter focuses on three subjects:
• Hardware Prefetching Mechanism, Software Prefetch and
Cacheability Instructions: discusses the microarchitectural features and
instructions that allow you to affect data caching in an application.