Fujitsu M9000 Server User Manual


 
Chapter 2 System Overview and Troubleshooting 2-27
Predictive self-healing is an architecture and methodology for automatically diagnosing,
reporting, and handling software and hardware fault conditions. It reduces the time
required to debug a hardware or software problem and gives the administrator and
technical support detailed data about each fault.
2.6.1 Predictive Self-Healing Tools
In Oracle Solaris OS, the fault manager runs in the background. If a failure occurs, the
system software recognizes the error and attempts to determine what hardware is faulty. The
software also takes steps to prevent that component from being used until it has been
replaced. Among other activities, the software:
- Receives telemetry information about problems detected by the system software
- Diagnoses the problems
- Initiates proactive self-healing activities. For example, the fault manager can disable
  faulty components. A component disabled in this way is degraded: a field-replaceable
  unit (FRU), group of FRUs, or part of a FRU has been isolated because a fault was
  detected. The isolation is usually done to prevent possibly faulty components from
  affecting other system components. The part that is isolated is not always the faulty
  part alone; a normal part may be degraded to isolate the faulty part. If a function
  required for the operation of the system is degraded, a system failure may result.
- When possible, causes the faulty FRU to provide an LED indication of a fault, in
  addition to populating the system console messages with more details
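The activities listed above can be pictured as a small pipeline: receive telemetry, diagnose, isolate, and indicate. The following Python sketch is a toy model only and does not reflect how the Solaris fault manager is implemented. The names `Fru`, `diagnose`, `isolate`, `min_reports`, and `shared_with`, the part names, and the simple report-count rule are all hypothetical, chosen for illustration. It also shows how a normal part can end up degraded together with the faulty one.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Fru:
    """Hypothetical model of a field-replaceable unit and its parts."""
    name: str
    parts: List[str]
    degraded: Set[str] = field(default_factory=set)
    led_on: bool = False


def diagnose(reports: Iterable[str], min_reports: int = 3) -> Set[str]:
    """Return parts named in at least `min_reports` telemetry reports.

    The counting rule is an assumption for this sketch, not the real
    diagnosis logic of the fault manager.
    """
    counts: Dict[str, int] = {}
    for part in reports:
        counts[part] = counts.get(part, 0) + 1
    return {part for part, n in counts.items() if n >= min_reports}


def isolate(fru: Fru, faulty: Set[str], shared_with: Dict[str, List[str]]) -> Fru:
    """Degrade each faulty part, plus any normal parts that must go with it."""
    for part in faulty:
        if part in fru.parts:
            fru.degraded.add(part)
            # A normal part may be degraded in order to isolate the faulty one.
            fru.degraded.update(shared_with.get(part, ()))
    # Indicate the fault on the FRU (stands in for the LED indication).
    fru.led_on = bool(fru.degraded)
    return fru


if __name__ == "__main__":
    fru = Fru("EXAMPLE-FRU", ["cpu0", "cpu1", "cache0"])
    telemetry = ["cpu0", "cpu0", "cpu1", "cpu0"]
    faulty = diagnose(telemetry)
    isolate(fru, faulty, shared_with={"cpu0": ["cache0"]})
    print(sorted(fru.degraded), fru.led_on)
```

In this model, `cache0` is healthy but is degraded because it is assumed to depend on `cpu0`. This mirrors the point that the isolated part is not always the faulty part alone.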
TABLE 2-8 shows a typical message generated when a fault occurs. The message appears on
your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-8 indicates that the fault has already been diagnosed. Any
corrective action that the system can perform has already taken place. If your server is still
running, it continues to run.
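Because fault messages are recorded in the /var/adm/messages file, they can be picked out of the log with a short script. The following Python sketch is one possible approach, not a supported tool. It assumes that each fault message carries `SUNW-MSG-ID`, `SEVERITY`, and `EVENT-ID` fields, as Oracle Solaris fault messages usually do. The sample log text and the function name `find_faults` are hypothetical and are not taken from TABLE 2-8.

```python
import re

# Hypothetical sample modeled on the usual Solaris fault-message layout.
# It is NOT copied from TABLE 2-8; every value here is a placeholder.
SAMPLE_LOG = """\
Jan  1 00:00:00 host fmd: [ID 000000 daemon.error] SUNW-MSG-ID: EXAMPLE-0000-XX, TYPE: Fault, VER: 1, SEVERITY: Major
Jan  1 00:00:00 host EVENT-ID: 00000000-0000-0000-0000-000000000000
Jan  1 00:00:01 host sshd[123]: unrelated line
"""

MSG_ID = re.compile(r"SUNW-MSG-ID:\s*([^,\s]+)")
SEVERITY = re.compile(r"SEVERITY:\s*(\w+)")
EVENT_ID = re.compile(r"EVENT-ID:\s*([0-9a-fA-F-]+)")


def find_faults(text):
    """Return one dict per fault message found in the log text."""
    faults = []
    current = None
    for line in text.splitlines():
        msg = MSG_ID.search(line)
        if msg:
            # A new fault message starts on the line holding its message ID.
            sev = SEVERITY.search(line)
            current = {
                "msg_id": msg.group(1),
                "severity": sev.group(1) if sev else None,
                "event_id": None,
            }
            faults.append(current)
            continue
        event = EVENT_ID.search(line)
        if event and current is not None and current["event_id"] is None:
            # Attach the first event ID that follows the message ID.
            current["event_id"] = event.group(1)
    return faults


if __name__ == "__main__":
    for fault in find_faults(SAMPLE_LOG):
        print(fault["msg_id"], fault["severity"], fault["event_id"])
```

To scan a real log, read /var/adm/messages and pass its contents to `find_faults` in place of `SAMPLE_LOG`. Adjust the patterns if the messages on your server use a different layout.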