Chapter 2 Product Overview and Troubleshooting 2-27
For a more detailed description of Solaris OS Predictive Self-Healing, see the
following website:
http://www.sun.com/bigadmin/features/articles/selfheal.html
Predictive self-healing is an architecture and methodology for automatically
diagnosing, reporting, and handling software and hardware fault conditions. This
new technology lessens the time required to debug a hardware or software problem
and provides the administrator and technical support with detailed data about each
fault.
2.6.1 Predictive Self-Healing Tools
In Solaris OS, the fault manager runs in the background. If a failure occurs, the
system software recognizes the error and attempts to determine what hardware is
faulty. The software also takes steps to prevent that component from being used
until it has been replaced. Among other things, the fault manager:
■ Receives telemetry information about problems detected by the system software
■ Diagnoses the problems
■ Initiates proactive self-healing activities. For example, the fault manager can
disable faulty components.
Degradation is the state of a FRU, group of FRUs, or part of a FRU that has been
isolated because a fault was detected. The isolation is usually done to prevent
possibly faulty components from affecting other system components. The isolated
part is not always the faulty part alone; a normal part may also be degraded in
order to isolate the faulty part. If a function required for the operation of the
system is degraded, a system failure may result.
■ When possible, causes the faulty FRU to provide an LED indication of a fault in
addition to populating the system console messages with more details
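The activities listed above can be pictured with a small conceptual model. The following Python sketch is only an illustration of the receive, diagnose, and isolate sequence; it is not how the Solaris fault manager is implemented. The class name ToyFaultManager, the component names, and the rule that three error reports make a component faulty are all assumptions invented for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyFaultManager:
    """Conceptual model only; not the Solaris fault manager's real logic."""

    error_threshold: int = 3  # hypothetical diagnosis rule
    error_counts: Counter = field(default_factory=Counter)
    disabled: set = field(default_factory=set)
    console: list = field(default_factory=list)

    def receive(self, component: str) -> None:
        """Receive one error report (telemetry) for a component."""
        if component in self.disabled:
            return  # an isolated component is no longer in use
        self.error_counts[component] += 1
        if self.diagnose(component):
            self.isolate(component)

    def diagnose(self, component: str) -> bool:
        """Decide whether the component is faulty (toy threshold rule)."""
        return self.error_counts[component] >= self.error_threshold

    def isolate(self, component: str) -> None:
        """Disable the component and record a console message."""
        self.disabled.add(component)
        self.console.append(f"fault diagnosed: {component} disabled")


fm = ToyFaultManager()
for report in ["dimm0", "cpu1", "dimm0", "dimm0", "dimm0"]:
    fm.receive(report)

print(fm.disabled)  # {'dimm0'}
print(fm.console)   # ['fault diagnosed: dimm0 disabled']
```

In this model the third report for dimm0 triggers the diagnosis, the component is disabled, and the fourth report is ignored because the component is already isolated. The single report for cpu1 stays below the assumed threshold, so that component remains in use.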
TABLE 2-8 shows a typical message generated when a fault occurs. The message
appears on your console and is recorded in the /var/adm/messages file.
Note – The message in TABLE 2-8 indicates that the fault has already been diagnosed.
Any corrective action that the system can perform has already taken place. If your
server is still running, it continues to run.
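Because fault messages are recorded in the /var/adm/messages file, they can be located with a short script. The following Python sketch is one possible approach, assuming that fault manager entries carry a SUNW-MSG-ID: field. The function name find_fault_message_ids and the sample log lines (including the message ID EXAMPLE-0000-XX and the host name host1) are made up for illustration and are not output from a real system.

```python
import re

# Assumption: fault manager entries contain a "SUNW-MSG-ID:" field
# followed by the message identifier.
MSG_ID = re.compile(r"SUNW-MSG-ID:\s*(?P<msg_id>[A-Za-z0-9-]+)")


def find_fault_message_ids(lines):
    """Return the message IDs found in an iterable of log lines."""
    ids = []
    for line in lines:
        match = MSG_ID.search(line)
        if match:
            ids.append(match.group("msg_id"))
    return ids


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Jan  1 00:00:00 host1 fmd: [ID 000000 daemon.error] "
    "SUNW-MSG-ID: EXAMPLE-0000-XX, TYPE: Fault, VER: 1, SEVERITY: Major",
    "Jan  1 00:00:01 host1 sshd[100]: unrelated entry",
]

print(find_fault_message_ids(SAMPLE))  # ['EXAMPLE-0000-XX']

# On a live system, the same function could read the log file directly:
# with open("/var/adm/messages", errors="replace") as log:
#     print(find_fault_message_ids(log))
```

The function accepts any iterable of lines, so it works on an open file as well as on a list, and it reads the file one line at a time rather than loading it whole.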