4-16 Netra CP3260 Blade Server User’s Guide • April 2009
■ Any CPU failed
■ All logical memory banks failed
■ Flash RAM cyclical redundancy check (CRC) failure
■ Critical field-replaceable unit (FRU) PROM configuration data failure
■ Critical application-specific integrated circuit (ASIC) failure
4.5 Automatic System Recovery
Automatic system recovery (ASR) consists of self-test features and an
autoconfiguration capability that detect failed hardware components and
unconfigure them. When ASR is enabled, the server can resume operating after
certain nonfatal hardware faults or failures occur.
If ASR monitors a component and the server can operate without it, the server
automatically reboots when that component develops a fault or fails. This
capability prevents a faulty hardware component from stopping operation of the
entire system or causing the system to fail repeatedly.
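The rule in the preceding paragraph can be sketched as a small decision function. This is only an illustrative model, not firmware code: the Component fields, the asr_action function, and the returned strings are hypothetical names introduced here, and the sketch says nothing about what the server does when the rule does not apply.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model of a hardware component (not a firmware structure)."""
    name: str
    monitored: bool  # assumption: True if ASR watches this component
    required: bool   # assumption: True if the server cannot operate without it


def asr_action(component: Component, asr_enabled: bool) -> str:
    """Return an illustrative label for what happens when `component` faults."""
    if not asr_enabled:
        # ASR is off until it is explicitly activated.
        return "no-automatic-recovery"
    if component.monitored and not component.required:
        # The server can run without the component, so it reboots without it.
        return "reboot-without-component"
    return "no-automatic-recovery"


# Example with a made-up component name:
bank = Component(name="memory-bank-1", monitored=True, required=False)
print(asr_action(bank, asr_enabled=True))
```

The two conditions are kept separate on purpose: a component must be both monitored by ASR and dispensable before the automatic reboot applies.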
If a fault is detected during the power-on sequence, the faulty component is
disabled. If the system remains capable of functioning, the boot sequence continues.
To support this degraded boot capability, the OpenBoot firmware uses the IEEE 1275
client interface (by means of the device tree) to mark a device as either failed or
disabled by creating an appropriate status property in the corresponding device
tree node. The Solaris OS does not activate a driver for any subsystem marked in
this way.
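The marking scheme described above can be pictured with a toy device tree. This is a minimal sketch under stated assumptions: DeviceNode, mark_device, and drivers_to_activate are hypothetical names, the device paths are made up, and the literal strings "failed" and "disabled" simply mirror the wording of this section rather than the exact property values the firmware writes.

```python
from dataclasses import dataclass, field

# Assumption: these strings stand in for whatever values the firmware records.
UNUSABLE_STATES = ("failed", "disabled")


@dataclass
class DeviceNode:
    """Hypothetical stand-in for one device tree node."""
    path: str
    properties: dict = field(default_factory=dict)


def mark_device(node: DeviceNode, status: str) -> None:
    """Create a status property on the node, as the firmware is described doing."""
    if status not in UNUSABLE_STATES:
        raise ValueError(f"unsupported status: {status!r}")
    node.properties["status"] = status


def drivers_to_activate(nodes: list) -> list:
    """Return the paths of nodes the OS would still attach drivers to."""
    return [
        node.path
        for node in nodes
        if node.properties.get("status") not in UNUSABLE_STATES
    ]


# Made-up paths for illustration only.
tree = [DeviceNode("/hypothetical/net@0"), DeviceNode("/hypothetical/net@1")]
mark_device(tree[1], "failed")
print(drivers_to_activate(tree))  # prints ['/hypothetical/net@0']
```

Recording the fault as a property on the node, rather than removing the node, keeps the device visible in the tree while still letting the operating system skip it.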
As long as a failed component is electrically dormant (not causing random bus
errors or signal noise, for example), the system reboots automatically and resumes
operation while a service call is made.
After a failed or disabled device is replaced with a new one, the OpenBoot firmware
automatically updates the status of the device at the next reboot.
Note – ASR is not enabled until you activate it (see Section 4.5.1.1, “To Enable
Automatic System Recovery” on page 4-17).