Optimizing Cache Usage 6
The clflush Instruction
The cache line associated with the linear address specified by the value
of byte address is invalidated from all levels of the processor cache
hierarchy (data and instruction). The invalidation is broadcast
throughout the coherence domain. If, at any level of the cache hierarchy,
the line is inconsistent with memory (dirty) it is written to memory
before invalidation. Other characteristics include:
• The data size affected is the cache coherency size, which is 64 bytes
on the Pentium 4 processor.
• The memory attribute of the page containing the affected line has no
effect on the behavior of this instruction.
• The clflush instruction can be used at all privilege levels and is
subject to all permission checking and faults associated with a byte
load.
clflush is an unordered operation with respect to other memory traffic,
including other clflush instructions. Software should use an mfence
(memory fence) instruction in cases where ordering is a concern.
As an example, consider a video usage model, wherein a video capture
device is using non-coherent AGP accesses to write a capture stream
directly to system memory. Since these non-coherent writes are not
broadcast on the processor bus, they will not flush any copies of the
same locations that reside in the processor caches. As a result, before the
processor re-reads the capture buffer, it should use clflush to ensure that
any stale copies of the capture buffer are flushed from the processor
caches. Due to speculative reads that may be generated by the processor,
it is important to observe appropriate fencing, using mfence.
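The flush-before-reread step described above can be sketched in C with the
SSE2 intrinsics _mm_clflush and _mm_mfence from <emmintrin.h>. This is a
minimal sketch, not a definitive implementation: the helper name flush_range,
the CACHE_LINE_SIZE constant (64 bytes, as stated for the Pentium 4
processor), and the choice to place a fence both before and after the flush
loop are assumptions made for illustration. It assumes an x86 target with
SSE2 enabled.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on the Pentium 4 processor. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [buf, buf + len) and return the number of lines flushed.
 */
static size_t flush_range(const void *buf, size_t len)
{
    if (len == 0)
        return 0;

    /* Round the start address down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;

    /* clflush is unordered, so fence before issuing the flushes ... */
    _mm_mfence();
    for (uintptr_t p = start; p < end; p += CACHE_LINE_SIZE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    /* ... and again before any later read of the buffer. */
    _mm_mfence();

    return lines;
}
```

In the capture scenario, the processor would call flush_range on the capture
buffer after the device finishes writing and before reading the new frame.
The returned line count is only a convenience for checking the address
arithmetic; the flush itself has no visible effect on the buffer contents.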
Example 6-1 illustrates the pseudo-code for the recommended usage of
clflush.