MEMORY CACHE CONTROL
on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Tech-
nology.
11.6 SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the
processor causes the associated cache line (or lines) to be invalidated. This check is
based on the physical address of the instruction. In addition, the P6 family and
Pentium processors check whether a write to a code segment may modify an instruction
that has been prefetched for execution. If the write affects a prefetched instruction,
the prefetch queue is invalidated. This latter check is based on the linear
address of the instruction. For the Pentium 4 and Intel Xeon processors, a write or a
snoop of an instruction in a code segment, where the target instruction is already
decoded and resident in the trace cache, invalidates the entire trace cache. As a
result, self-modifying code can severely degrade performance when run on the
Pentium 4 and Intel Xeon processors.
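The rules above can be condensed into a small decision function. The Python sketch below only summarizes the text of this section and is not a model of real hardware; the function name, its parameters, and the family strings are hypothetical and chosen purely for illustration.

```python
def structures_invalidated(family, line_cached, instr_prefetched, instr_in_trace_cache):
    """Return what a write to code invalidates, per the rules stated in this section.

    All names here are illustrative assumptions; this is a summary, not a simulator.
    """
    invalidated = []
    if line_cached:
        # Checked against the physical address of the instruction.
        invalidated.append("cache line")
    if family in ("P6", "Pentium") and instr_prefetched:
        # Checked against the linear address of the instruction.
        invalidated.append("prefetch queue")
    if family in ("Pentium 4", "Intel Xeon") and instr_in_trace_cache:
        # One decoded, resident target instruction costs the whole trace cache.
        invalidated.append("entire trace cache")
    return invalidated


print(structures_invalidated("P6", True, True, False))
print(structures_invalidated("Pentium 4", True, False, True))
```

The second call shows why self-modifying code is costly on the Pentium 4 and Intel Xeon processors: a single affected instruction discards the whole trace cache rather than one queue or line.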
In practice, the check on linear addresses should not create compatibility problems
among IA-32 processors. Applications that include self-modifying code use the same
linear address for modifying and fetching the instruction. Systems software, such as
a debugger, that might modify an instruction using a different linear address
than that used to fetch the instruction, will execute a serializing operation, such as a
CPUID instruction, before the modified instruction is executed, which will automatically
resynchronize the instruction cache and prefetch queue. (See Section 8.1.3,
“Handling Self- and Cross-Modifying Code,” for more information about the use of
self-modifying code.)
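To see why a write through a different linear address calls for a serializing operation, consider the toy model below. It is a sketch under simplifying assumptions, not a description of real hardware: ToyCore, its methods, and the addresses are hypothetical, the prefetch queue is modeled as a dictionary keyed by linear address, and serialize() merely stands in for an instruction such as CPUID.

```python
class ToyCore:
    """Hypothetical core whose prefetch check compares linear addresses only."""

    def __init__(self):
        # Two linear addresses mapped to the same physical address (an alias).
        self.page_map = {0x1000: 0x5000, 0x9000: 0x5000}
        self.memory = {0x5000: "old"}   # physical address -> instruction
        self.prefetch_queue = {}        # linear address -> prefetched instruction

    def prefetch(self, linear):
        self.prefetch_queue[linear] = self.memory[self.page_map[linear]]

    def write(self, linear, instruction):
        self.memory[self.page_map[linear]] = instruction
        # The check sees the write only if the linear address matches.
        if linear in self.prefetch_queue:
            self.prefetch_queue.clear()

    def serialize(self):
        # Stand-in for a serializing instruction such as CPUID.
        self.prefetch_queue.clear()

    def execute(self, linear):
        if linear not in self.prefetch_queue:
            self.prefetch(linear)
        return self.prefetch_queue.pop(linear)


def run(write_linear, serialize):
    """Prefetch at 0x1000, patch via write_linear, then execute at 0x1000."""
    core = ToyCore()
    core.prefetch(0x1000)
    core.write(write_linear, "new")
    if serialize:
        core.serialize()
    return core.execute(0x1000)


print(run(0x1000, serialize=False))  # same linear address: check catches the write
print(run(0x9000, serialize=False))  # aliased write: stale instruction executes
print(run(0x9000, serialize=True))   # aliased write followed by serialization
```

In this model only the aliased write without serialization executes the stale instruction, which matches the case the text assigns to systems software such as debuggers.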
For Intel486 processors, a write to an instruction in the cache will modify it in both
the cache and memory, but if the instruction was prefetched before the write, the old
version of the instruction could be the one executed. To prevent the old instruction
from being executed, flush the instruction prefetch unit by coding a jump instruction
immediately after any write that modifies an instruction.
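The Intel486 case can be illustrated with a similar toy model. The Python sketch below assumes a prefetch unit that keeps private copies of instructions and is never checked on writes; Toy486, its methods, and the three-instruction program are hypothetical and exist only to show the effect of the jump.

```python
class Toy486:
    """Hypothetical model: writes update memory but never the prefetch unit."""

    def __init__(self, program):
        self.memory = list(program)   # index = instruction address
        self.prefetched = {}          # address -> copy held by the prefetch unit

    def prefetch(self, start, count):
        for address in range(start, start + count):
            self.prefetched[address] = self.memory[address]

    def write(self, address, instruction):
        # Cache and memory are updated; the prefetched copy is left untouched.
        self.memory[address] = instruction

    def jump(self):
        # A jump flushes the instruction prefetch unit.
        self.prefetched.clear()

    def fetch(self, address):
        # Prefer the prefetched copy if one exists, otherwise read memory.
        return self.prefetched.get(address, self.memory[address])


def patch_and_fetch(jump_after_write):
    cpu = Toy486(["mov", "add", "nop"])
    cpu.prefetch(0, 3)        # instruction 1 is prefetched before the write
    cpu.write(1, "sub")       # self-modifying write to instruction 1
    if jump_after_write:
        cpu.jump()
    return cpu.fetch(1)


print(patch_and_fetch(jump_after_write=False))  # old instruction is still used
print(patch_and_fetch(jump_after_write=True))   # jump forces a fresh fetch
```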
11.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON, AND P6 FAMILY PROCESSORS)
Implicit caching occurs when a memory element is made potentially cacheable,
although the element may never have been accessed in the normal von Neumann
sequence. Implicit caching occurs on the P6 and more recent processor families due
to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching
is an extension of the behavior of existing Intel386, Intel486, and Pentium processor
systems, since software running on these processor families has likewise been unable
to deterministically predict the behavior of instruction prefetch.