15-18 Vol. 3
MACHINE-CHECK ARCHITECTURE
processor; the handler must be written to interpret P5_MC_TYPE encodings
correctly.
15.4 ENHANCED CACHE ERROR REPORTING
Starting with Intel Core Duo processors, cache error reporting was
enhanced. In earlier Intel processors, cache status was based on the
number of correction events that occurred in a cache. In the new paradigm,
called “threshold-based error status”, cache status is based on the number
of lines (ECC blocks) in a cache that incur repeated corrections. The
threshold is chosen by Intel based on various factors. A processor that
supports threshold-based error status sets IA32_MCG_CAP[11] (MCG_TES_P)
to 1; a processor that does not support it clears this bit to 0.
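Software that post-processes raw register values (for example, values captured in a machine-check log) can test the MCG_TES_P bit before interpreting any health status. The following is a minimal Python sketch, assuming the caller already has the 64-bit IA32_MCG_CAP value; the function name mcg_tes_supported and the sample values are hypothetical and only illustrate the bit test.

```python
# Bit 11 of IA32_MCG_CAP is MCG_TES_P (threshold-based error status present).
MCG_TES_P_BIT = 11


def mcg_tes_supported(mcg_cap: int) -> bool:
    """Return True if an IA32_MCG_CAP value advertises threshold-based error status."""
    return bool((mcg_cap >> MCG_TES_P_BIT) & 1)


if __name__ == "__main__":
    # Hypothetical raw values; the low byte (bank count) is set to 6 for illustration.
    with_tes = (1 << MCG_TES_P_BIT) | 0x6
    without_tes = 0x6
    print(mcg_tes_supported(with_tes))     # True
    print(mcg_tes_supported(without_tes))  # False
```

Reading the MSR itself requires privileged access (RDMSR in ring 0, or an OS-provided interface), so the sketch deliberately works on an already-captured value.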
A processor that supports enhanced cache error reporting contains hardware
that tracks the operating status of certain caches and provides an indicator
of their “health”. The hardware reports a “green” status when the
number of lines that incur repeated corrections is at or below a pre-defined
threshold, and a “yellow” status when the number of affected lines exceeds
the threshold. Yellow status means that the cache reporting the event is
operating correctly, but you should schedule the system for servicing within
a few weeks.
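A handler or log tool then has to extract the green/yellow indicator from the bank's status register. The Python sketch below assumes the IA32_MCi_STATUS layout in which VAL is bit 63, UC is bit 61, and, when MCG_TES_P is 1, bits 54:53 encode 00 = no tracking, 01 = green, 10 = yellow, 11 = reserved for corrected (UC = 0) errors; verify these positions against the IA32_MCi_STATUS description for your processor. The name cache_health is hypothetical.

```python
from typing import Optional

# Assumed IA32_MCi_STATUS field positions (verify against the MCi_STATUS description).
MCI_STATUS_VAL = 1 << 63  # register holds a valid error
MCI_STATUS_UC = 1 << 61   # error was uncorrected
TES_SHIFT = 53            # threshold-based error status occupies bits 54:53
TES_MASK = 0b11

TES_NAMES = {0b00: "no tracking", 0b01: "green", 0b10: "yellow", 0b11: "reserved"}


def cache_health(mci_status: int, mcg_tes_p: bool) -> Optional[str]:
    """Return the health indicator of a corrected error, or None if it does not apply."""
    if not mcg_tes_p:
        return None  # without MCG_TES_P, bits 54:53 are not architectural
    if not (mci_status & MCI_STATUS_VAL) or (mci_status & MCI_STATUS_UC):
        return None  # assumed: encoding is only meaningful for valid, corrected errors
    return TES_NAMES[(mci_status >> TES_SHIFT) & TES_MASK]


if __name__ == "__main__":
    green = MCI_STATUS_VAL | (0b01 << TES_SHIFT)
    yellow = MCI_STATUS_VAL | (0b10 << TES_SHIFT)
    print(cache_health(green, True))   # green
    print(cache_health(yellow, True))  # yellow
```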
Intel recommends that you rely on this mechanism for structures supported
by threshold-based error reporting.
The CPU/system/platform response to a yellow event should be less severe
than its response to an uncorrected error. An uncorrected error means that
a serious error has actually occurred, whereas the yellow condition is a
warning that the number of affected lines has exceeded the threshold but is
not, in itself, a serious event: the error was corrected and system state was
not compromised.
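The ordering described above, in which a yellow event warrants a milder response than an uncorrected error, can be expressed as a small escalation policy. This is a minimal Python sketch, assuming the caller has already confirmed that MCG_TES_P is 1 and assuming IA32_MCi_STATUS places VAL at bit 63, UC at bit 61, and the threshold-based status in bits 54:53 (yellow = 10b). The Response levels and choose_response are hypothetical names, not part of the architecture.

```python
from enum import Enum

# Assumed IA32_MCi_STATUS fields; the policy presumes MCG_TES_P = 1.
MCI_STATUS_VAL = 1 << 63
MCI_STATUS_UC = 1 << 61
TES_SHIFT = 53
TES_YELLOW = 0b10


class Response(Enum):
    """Hypothetical escalation levels, ordered from mildest to most severe."""
    IGNORE = 0
    LOG = 1
    SCHEDULE_SERVICE = 2
    CRITICAL = 3


def choose_response(mci_status: int) -> Response:
    """Map one bank's status value to a response, keeping yellow below uncorrected."""
    if not (mci_status & MCI_STATUS_VAL):
        return Response.IGNORE
    if mci_status & MCI_STATUS_UC:
        return Response.CRITICAL  # an error actually occurred; state may be compromised
    if ((mci_status >> TES_SHIFT) & 0b11) == TES_YELLOW:
        return Response.SCHEDULE_SERVICE  # corrected, but affected lines exceed threshold
    return Response.LOG  # corrected error with green (or untracked) status
```

The key design point is that the yellow branch is reached only after the UC check, so a corrected error can never be escalated to the same level as an uncorrected one.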
The green/yellow status indicator is not a foolproof early warning for an
uncorrected error resulting from the failure of two bits in the same ECC
block. Such a failure can occur and cause an uncorrected error before the
yellow threshold is reached. However, the chance of an uncorrected error
increases as the number of affected lines increases.
15.5 CORRECTED MACHINE CHECK ERROR INTERRUPT
Corrected machine-check error interrupt (CMCI) is an architectural
enhancement to the machine-check architecture. It provides capabilities