MACHINE-CHECK ARCHITECTURE
mechanism to indicate the frequency of exceptions. A multiprocessing operating
system stores the identity of the processor node incurring the exception
using a unique identifier, such as the processor’s APIC ID (see Section
10.9, “Handling Interrupts”).
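As a rough illustration of the bookkeeping described above, the following C sketch keeps a per-processor exception count keyed by APIC ID. Everything here is an assumption made for the example: the names `mc_log`, `mc_cpu_log`, `mc_log_record`, and `MC_LOG_MAX_CPUS` are hypothetical, the table size is arbitrary, and a real operating system would also need locking and persistent storage, which are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MC_LOG_MAX_CPUS 8   /* hypothetical table size, chosen for the example */

/* One entry per processor that has incurred a machine-check exception. */
struct mc_cpu_log {
    uint32_t apic_id;          /* unique identifier of the processor */
    uint32_t exception_count;  /* how often this processor has faulted */
    int      in_use;           /* nonzero once the slot holds a processor */
};

struct mc_log {
    struct mc_cpu_log cpu[MC_LOG_MAX_CPUS];
};

/*
 * Record one machine-check exception for the processor with the given
 * APIC ID. Returns the updated count for that processor, or 0 if the
 * table is full and the event could not be recorded.
 */
uint32_t mc_log_record(struct mc_log *log, uint32_t apic_id)
{
    assert(log != NULL);

    struct mc_cpu_log *free_slot = NULL;
    for (size_t i = 0; i < MC_LOG_MAX_CPUS; i++) {
        struct mc_cpu_log *e = &log->cpu[i];
        if (e->in_use && e->apic_id == apic_id) {
            e->exception_count++;
            return e->exception_count;
        }
        if (!e->in_use && free_slot == NULL) {
            free_slot = e;   /* remember the first unused slot */
        }
    }

    if (free_slot == NULL) {
        return 0;            /* no room left for a new processor */
    }
    free_slot->in_use = 1;
    free_slot->apic_id = apic_id;
    free_slot->exception_count = 1;
    return 1;
}
```

A zero-initialized `struct mc_log` is an empty log. The count returned by `mc_log_record` gives the caller a simple indication of how frequently a given processor is reporting exceptions.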
The basic algorithm given in Example 15-3 can be modified to provide more
robust recovery techniques. For example, software has the flexibility to
attempt recovery using information unavailable to the hardware. Specifically,
after logging, the machine-check exception handler can carefully analyze the
error-reporting registers when the error-logging routine reports an error that
does not allow execution to be restarted. These recovery techniques can use
the external-bus-related, model-specific information provided with the error
report to localize the source of the error within the system and determine the
appropriate recovery strategy.
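The analysis step described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the handler from Example 15-3. The bit positions are the architectural ones for IA32_MCG_STATUS and IA32_MCi_STATUS, while the type and function names (`mc_status_fields`, `mc_decode_status`, `mc_count_uncorrected`, `mc_restart_possible`) are hypothetical. The raw register values are assumed to have been captured already by the logging routine (for example, with RDMSR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_MCG_STATUS flags. */
#define MCG_STATUS_RIPV  (1ULL << 0)   /* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1)   /* error IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2)   /* machine check in progress */

/* IA32_MCi_STATUS flags. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* IA32_MCi_MISC valid */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* IA32_MCi_ADDR valid */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Hypothetical decoded view of one IA32_MCi_STATUS value. */
struct mc_status_fields {
    bool     valid, overflow, uncorrected, enabled;
    bool     misc_valid, addr_valid, context_corrupt;
    uint16_t mca_error_code;        /* bits 15:0 */
    uint16_t model_specific_code;   /* bits 31:16 */
};

/* Decode a raw status value; all fields stay zero if VAL is clear. */
struct mc_status_fields mc_decode_status(uint64_t status)
{
    struct mc_status_fields f = {0};
    if ((status & MCI_STATUS_VAL) == 0) {
        return f;                   /* nothing valid to analyze */
    }
    f.valid           = true;
    f.overflow        = (status & MCI_STATUS_OVER)  != 0;
    f.uncorrected     = (status & MCI_STATUS_UC)    != 0;
    f.enabled         = (status & MCI_STATUS_EN)    != 0;
    f.misc_valid      = (status & MCI_STATUS_MISCV) != 0;
    f.addr_valid      = (status & MCI_STATUS_ADDRV) != 0;
    f.context_corrupt = (status & MCI_STATUS_PCC)   != 0;
    f.mca_error_code      = (uint16_t)(status & 0xFFFF);
    f.model_specific_code = (uint16_t)((status >> 16) & 0xFFFF);
    return f;
}

/* Count banks that hold a valid, uncorrected error. */
size_t mc_count_uncorrected(const uint64_t *status, size_t nbanks)
{
    assert(status != NULL || nbanks == 0);
    size_t n = 0;
    for (size_t i = 0; i < nbanks; i++) {
        struct mc_status_fields f = mc_decode_status(status[i]);
        if (f.valid && f.uncorrected) {
            n++;
        }
    }
    return n;
}

/* True if IA32_MCG_STATUS reports that execution can be restarted. */
bool mc_restart_possible(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_RIPV) != 0;
}
```

When `mc_restart_possible` returns false, a handler built along these lines could use the decoded fields (in particular `addr_valid`, `misc_valid`, and `model_specific_code`) as the starting point for localizing the error. How those fields map to a recovery strategy is model-specific and is not shown here.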
15.10.4 Machine-Check Software Handler Guidelines for Error Recovery
15.10.4.1 Machine-Check Exception Handler for Error Recovery
When writing a machine-check exception (MCE) handler to support software
recovery from Uncorrected Recoverable (UCR) errors, consider the
following:
• When IA32_MCG_CAP [24] is zero, no recoverable errors are supported
and all machine-check exceptions are fatal. The logging of status and error
information is therefore a baseline implementation requirement.
• When IA32_MCG_CAP [24] is 1, certain uncorrected errors, called uncorrected
recoverable (UCR) errors, may be software recoverable. The handler can analyze
the reported error information, and in some cases attempt to recover from the
uncorrected error and continue execution.
• For processors with DisplayFamily_DisplayModel encoding of 06H_EH and above,
an MCA signal is broadcast to all logical processors in the system. Due to the
potentially shared machine check MSR resources among the logical processors
on the same package/core, the MCE handler may be required to synchronize with
the other processors that received a machine check error and serialize access to
the machine check registers when analyzing, logging and clearing the
information in the machine check registers.
• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the
error information in the register is valid. If this flag is clear, the registers in that
bank do not contain valid error information and should not be checked.
• The MCE handler is primarily responsible for processing uncorrected errors. The
UC flag in each IA32_MCi_STATUS register indicates whether the reported error