MEMORY CACHE CONTROL
Processors based on Intel Core microarchitectures implement one level of instruction
TLB and two levels of data TLB. The Intel Core i7 processor provides a second-level
unified TLB.
The store buffer is associated with the processor’s instruction execution units. It
allows writes to system memory and/or the internal caches to be saved and in some
cases combined to optimize the processor’s bus accesses. The store buffer is always
enabled in all execution modes.
The processor’s caches are for the most part transparent to software. When enabled,
instructions and data flow through these caches without the need for explicit soft-
ware control. However, knowledge of the behavior of these caches may be useful in
optimizing software performance. For example, knowledge of cache dimensions and
replacement algorithms gives an indication of how large a data structure can be
operated on at once without causing cache thrashing.
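As a rough illustration of sizing a working set from cache dimensions, the sketch below estimates the largest square tile that fits in a cache of a given size. The function name max_tile_dim and all of its parameters are hypothetical, and the 32-KByte figure in the usage note is an assumed example, not a value for any particular processor. Associativity and the replacement algorithm usually make the practical limit lower than this estimate.

```c
#include <assert.h>
#include <stddef.h>

/* Largest n such that `arrays` tiles of n x n elements, each element
 * `elem_size` bytes, fit within `cache_bytes`. Illustrative only: it
 * ignores associativity, replacement policy, and other cached data. */
static size_t max_tile_dim(size_t cache_bytes, size_t elem_size, size_t arrays)
{
    assert(elem_size > 0 && arrays > 0);

    /* Number of elements each tile may hold. */
    size_t budget = cache_bytes / (elem_size * arrays);

    /* Integer square root by linear search; fine for small budgets. */
    size_t n = 0;
    while ((n + 1) * (n + 1) <= budget)
        n++;
    return n;
}
```

For example, with an assumed 32-KByte data cache, 8-byte elements, and three tiles live at once (as in a blocked matrix multiply), max_tile_dim(32 * 1024, 8, 3) yields 36.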
In multiprocessor systems, maintenance of cache consistency may, in rare circum-
stances, require intervention by system software. For these rare cases, the processor
provides privileged cache control instructions for use in flushing caches and forcing
memory ordering.
The Pentium III, Pentium 4, and Intel Xeon processors introduced several instructions
that software can use to improve the performance of the L1, L2, and L3 caches,
including the PREFETCHh and CLFLUSH instructions and the non-temporal move
instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of
these instructions is discussed in Section 11.5.5, “Cache Management Instructions.”
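The following sketch shows one way a C program might reach these instructions through compiler intrinsics: _mm_prefetch (PREFETCHT0), _mm_stream_si32 (MOVNTI), _mm_sfence, and _mm_clflush (CLFLUSH). The helper names, the prefetch distance of 16 elements, and the HAVE_X86_CACHE_HINTS macro are assumptions made for illustration. On targets other than x86-64 the sketch falls back to ordinary loads and stores.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_X86_CACHE_HINTS 1
#else
#define HAVE_X86_CACHE_HINTS 0
#endif

/* Sum an array, hinting the processor to fetch data ahead of use.
 * The distance of 16 ints is an assumed value; tune it by measurement. */
static long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if HAVE_X86_CACHE_HINTS
        if (i + 16 < n)
            _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T0);
#endif
        total += a[i];
    }
    return total;
}

/* Fill a buffer with non-temporal stores, which hint that the data
 * will not be reused soon and need not displace cached lines. */
static void fill_non_temporal(int *dst, size_t n, int value)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if HAVE_X86_CACHE_HINTS
        _mm_stream_si32(&dst[i], value);
#else
        dst[i] = value;
#endif
    }
#if HAVE_X86_CACHE_HINTS
    _mm_sfence(); /* make the weakly ordered stores globally visible */
#endif
}

/* Flush the cache line that contains p from the cache hierarchy. */
static void flush_line(const void *p)
{
#if HAVE_X86_CACHE_HINTS
    _mm_clflush(p);
#else
    (void)p;
#endif
}
```

These hints affect performance only; the values read back from memory are the same with or without them.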
11.2 CACHING TERMINOLOGY
IA-32 processors (beginning with the Pentium processor) and Intel 64 processors use
the MESI (modified, exclusive, shared, invalid) cache protocol to maintain consistency
with internal caches and caches in other processors (see Section 11.4, “Cache
Control Protocol”).
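To make the four MESI states concrete, the sketch below encodes the textbook write-back transitions for a single cache line. It is a simplified model, not the processor's actual implementation: the type and function names (mesi_state, mesi_event, mesi_next) are hypothetical, bus transactions and write-backs are omitted, and Section 11.4 remains the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_state;

typedef enum {
    EV_LOCAL_READ,   /* this processor reads the line */
    EV_LOCAL_WRITE,  /* this processor writes the line */
    EV_SNOOP_READ,   /* another processor reads the line */
    EV_SNOOP_WRITE   /* another processor writes or invalidates the line */
} mesi_event;

/* Next state of one line in one cache. `others_have_copy` matters only
 * when a local read fills a line that was invalid. */
static mesi_state mesi_next(mesi_state s, mesi_event ev, bool others_have_copy)
{
    switch (ev) {
    case EV_LOCAL_READ:
        if (s == MESI_INVALID)
            return others_have_copy ? MESI_SHARED : MESI_EXCLUSIVE;
        return s;                 /* S, E, and M all satisfy the read */
    case EV_LOCAL_WRITE:
        return MESI_MODIFIED;     /* other copies are invalidated */
    case EV_SNOOP_READ:
        /* A modified line would be written back before being shared. */
        return (s == MESI_INVALID) ? MESI_INVALID : MESI_SHARED;
    case EV_SNOOP_WRITE:
        return MESI_INVALID;
    }
    assert(!"unknown event");
    return s;
}
```

For instance, a line that is filled by a read with no other copies present starts in the exclusive state, moves to modified on a local write, and drops to shared when another processor reads it.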
When the processor recognizes that an operand being read from memory is cache-
able, the processor reads an entire cache line into the appropriate cache (L1, L2, L3,
or all). This operation is called a cache line fill. If the memory location containing
that operand is still cached the next time the processor attempts to access the
operand, the processor can read the operand from the cache instead of going back to
memory. This operation is called a cache hit.
When the processor attempts to write an operand to a cacheable area of memory, it
first checks if a cache line for that memory location exists in the cache. If a valid
cache line does exist, the processor (depending on the write policy currently in force)
can write the operand into the cache instead of writing it out to system memory. This
operation is called a write hit. If a write misses the cache (that is, a valid cache line
is not present for the area of memory being written to), the processor performs a cache
line fill, an operation called write allocation. Then it writes the operand into the cache line and