Optimizing Cache Usage 6
6-45
The following examples, which apply prefetching instructions to a video
encoder, a video decoder, and a simple 8-byte memory copy, illustrate the
performance gain that prefetching instructions provide through efficient
cache management.
Video Encoder
In the video encoder example, some of the data used during encoding is
kept in the processor's second-level cache to minimize the number of
reference streams that must be re-read from system memory. To ensure
that other writes do not disturb the data in the second-level cache,
streaming stores (movntq) are used to write around all processor caches.
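As a minimal sketch of writing around the caches, the C function below copies a buffer with streaming stores. It assumes GCC or Clang targeting x86 with SSE2 and uses the SSE2 intrinsic `_mm_stream_si128` (movntdq, the 16-byte counterpart of movntq) rather than movntq itself. The name `stream_copy` is hypothetical, both buffers are assumed to be 16-byte aligned, and on targets without SSE2 the function falls back to a plain `memcpy`.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_load_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: copy n bytes from src to dst using non-temporal
 * (streaming) stores so the destination lines are not allocated in the
 * caches. Assumes dst and src are 16-byte aligned. */
void stream_copy(void *dst, const void *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_load_si128(s++);  /* ordinary (cacheable) load */
        _mm_stream_si128(d++, v);         /* movntdq: write around the caches */
    }
    _mm_sfence();  /* order the weakly ordered streaming stores before later stores */
#endif
    /* Copy the tail shorter than 16 bytes, or the whole buffer without SSE2. */
    memcpy((char *)dst + i, (const char *)src + i, n - i);
}
```

Because the copied lines never enter the caches, data the encoder deliberately keeps in the second-level cache is not evicted by the writes.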
The prefetching cache management implemented for the video encoder
reduces memory traffic. It reduces second-level cache pollution by
preventing single-use video frame data from entering the second-level
cache. A non-temporal prefetch (prefetchnta) instruction brings data
into only one way of the second-level cache, thus reducing pollution of
the second-level cache. If the data brought directly into the
second-level cache is not re-used, the non-temporal prefetch yields a
performance gain over a temporal prefetch. The encoder uses non-temporal
prefetches to avoid polluting the second-level cache, which increases
the number of second-level cache hits and decreases the number of
polluting write-backs to memory. The performance gain results not only
from the prefetch itself but also from more efficient use of the
second-level cache.
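The non-temporal prefetch pattern can be sketched as follows, assuming an x86 compiler that provides `_mm_prefetch` and `_MM_HINT_NTA` in `<xmmintrin.h>`. The hypothetical `sad` function streams through single-use current-frame data with prefetchnta, while the reference block is read with ordinary loads so it can stay cached. `PREFETCH_DISTANCE` and the 64-byte line size are illustrative assumptions to tune per platform, not values from this section.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)(p))  /* no-op on targets without SSE */
#endif

#define PREFETCH_DISTANCE 256  /* hypothetical: bytes to prefetch ahead */
#define LINE 64                /* assumed cache-line size in bytes */

/* Hypothetical sum of absolute differences between the current frame
 * (read once) and a reference block (re-read later, so left cacheable). */
uint64_t sad(const uint8_t *cur, const uint8_t *ref, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* One non-temporal prefetch per assumed line, never past the end. */
        if (i % LINE == 0 && i + PREFETCH_DISTANCE < n)
            PREFETCH_NTA(cur + i + PREFETCH_DISTANCE);
        total += (uint64_t)abs((int)cur[i] - (int)ref[i]);
    }
    return total;
}
```

Issuing one prefetch per assumed cache line avoids redundant prefetches for bytes that share a line; the bounds check keeps the prefetch address inside the array.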
Video Decoder
In the video decoder example, completed frame data is written to the
local memory of the graphics card, which is mapped to the WC
(write-combining) memory type. A copy of the reference data is stored to
WB (write-back) memory at a later time by the processor in order to
generate future data. The assumption is that the reference data is too
large to fit in the processor's caches. A streaming store is used to
write the data around the cache, to avoid displacing other temporal data
held in the caches.
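The decoder's streaming write can be sketched as below, assuming x86 with SSE2. The hypothetical `reconstruct_stream` function forms pixels as the rounded average of two predictions (as in bidirectional prediction) and writes each 16-byte result with `_mm_stream_si128` directly from a register, so neither the destination lines nor an intermediate buffer are brought into the caches. The same routine could target either the WC frame buffer or the WB reference copy; a real WC mapping of graphics memory is not modeled here. Buffer alignment to 16 bytes and a length that is a multiple of 16 are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_load_si128, _mm_avg_epu8, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical: dst[i] = (fwd[i] + bwd[i] + 1) >> 1, written with streaming
 * stores. Assumes 16-byte-aligned pointers and n a multiple of 16 on the SSE2 path. */
void reconstruct_stream(uint8_t *dst, const uint8_t *fwd,
                        const uint8_t *bwd, size_t n)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 16) {
        __m128i a = _mm_load_si128((const __m128i *)(fwd + i));
        __m128i b = _mm_load_si128((const __m128i *)(bwd + i));
        __m128i px = _mm_avg_epu8(a, b);              /* rounded average per byte */
        _mm_stream_si128((__m128i *)(dst + i), px);  /* full 16-byte write around the caches */
    }
    _mm_sfence();  /* order streaming stores before any later "frame ready" store */
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((fwd[i] + bwd[i] + 1) >> 1);  /* portable fallback */
#endif
}
```

Streaming stores are weakly ordered, which is why the sketch ends with an sfence before anything that would tell another agent the frame data is complete.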