IBM H80 Series Personal Computer User Manual


 
Configuring and Deconfiguring Processors or Memory
All failures that crash the system with a machine check or check stop, even if
intermittent, are reported as a diagnostic callout for service repair. To prevent the
recurrence of intermittent problems and improve the availability of the system until a
scheduled maintenance window, processors and memory modules with a failure
history are marked "bad" to prevent their being configured on subsequent boots.
A processor or memory module is marked "bad" under the following circumstances:
• A processor or memory module fails built-in self test (BIST) or power-on self test
  (POST) testing during boot (as determined by the Service Processor).
• A processor or memory module causes a machine check or check stop during
  runtime, and the failure can be isolated specifically to that processor or memory
  module (as determined by the processor runtime diagnostics in the Service
  Processor).
• A processor or memory module reaches a threshold of recovered failures that
  results in a predictive callout (as determined by the processor runtime
  diagnostics in the Service Processor).
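The three conditions above can be read as a simple "any of" rule. The following Python sketch is only an illustration of that rule, not the Service Processor firmware: the class, field, and function names are hypothetical, and the threshold value is an assumption, since this guide does not state the actual number.

```python
from dataclasses import dataclass

# Assumed value for illustration only; the real threshold is not given here.
RECOVERED_ERROR_THRESHOLD = 3


@dataclass
class ComponentHistory:
    """Hypothetical failure record for one processor or memory module."""
    failed_bist_or_post: bool = False       # failed BIST or POST during boot
    isolated_runtime_failure: bool = False  # machine check/check stop isolated to this part
    recovered_errors: int = 0               # count of recovered (correctable) failures


def should_mark_bad(history, threshold=RECOVERED_ERROR_THRESHOLD):
    """Return True if any one of the three documented conditions holds."""
    return (
        history.failed_bist_or_post
        or history.isolated_runtime_failure
        or history.recovered_errors >= threshold
    )


# Usage: a module that has reached the assumed recovered-error threshold.
print(should_mark_bad(ComponentHistory(recovered_errors=3)))
```

Any single condition is enough to mark the part; the conditions are not cumulative.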
At boot time, the Service Processor does not configure processors or memory
modules that are marked "bad," in the same way that it deconfigures
them after BIST/POST failures.
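As a rough illustration of the boot-time behavior described above, the sketch below filters the installed parts against both the "bad" list and the current BIST/POST failures. The function name and the component labels (such as "cpu1") are hypothetical and do not correspond to any real Service Processor interface.

```python
def components_to_configure(installed, marked_bad, bist_post_failures):
    """Return the installed components that would be configured at boot.

    Parts marked "bad" from earlier failures are skipped in the same way
    as parts that fail BIST/POST during the current boot.
    """
    excluded = set(marked_bad) | set(bist_post_failures)
    return [part for part in installed if part not in excluded]


# Usage: cpu1 was marked "bad" earlier; dimm0 fails POST on this boot.
installed = ["cpu0", "cpu1", "dimm0", "dimm1"]
print(components_to_configure(installed, {"cpu1"}, {"dimm0"}))
```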
If a processor is deconfigured, it remains offline on subsequent reboots
until the faulty processor is replaced. The Repeat Gard function also gives
users the option of manually deconfiguring a processor or re-enabling a
previously deconfigured processor. For information on how to configure or
deconfigure a processor, see the Processor Configuration/Deconfiguration Menu on
page 46.
You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the
Processor Configuration/Deconfiguration Menu, which is a submenu under the
System Information Menu.

Run-Time CPU Deconfiguration (CPU Gard)
L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2
cache correctable errors are monitored by the processor runtime diagnostics (PRD)
code running in the Service Processor. When a predefined error threshold is met, an
error log with warning severity and threshold exceeded status is returned to AIX. At
the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will
70 RS/6000 Enterprise Server Model H80 Series User's Guide