Sun Microsystems T1000 Server User Manual


 
Chapter 2 Sun Fire T1000 Server Diagnostics 35
Example:
In this example, MB/CMP0/CH2/R0/D0 (DIMM 0 at J0701) is disabled. Until the
faulty component is replaced, the system can boot using memory that was not
disabled.
Note: You can use ASR commands to display and control disabled components.
See “Managing System Components with Automatic System Recovery Commands”
on page 40.
Using the Solaris Predictive Self-Healing Feature
The Solaris OS predictive self-healing technology enables the Sun Fire T1000 server to
diagnose problems while the Solaris OS is running, and mitigate many serious
problems before they occur.
The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time
and runs in the background to monitor the system. If a component generates an
error, the daemon handles the error by correlating the error with data from previous
errors and other related information to diagnose the problem. Once diagnosed, the
fault manager daemon assigns the problem a unique identifier (UUID) that
distinguishes the problem across any set of systems. When possible, the fault
manager daemon initiates steps to self-heal the failed component and take the
component offline. The daemon also logs the fault to the syslogd daemon and
provides a fault notification with a message ID (MSGID). You can use the message ID to
get additional information about the problem from Sun’s knowledge article
database.
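The flow described above (collect errors, correlate them with earlier errors, diagnose a fault, assign a UUID, take the component offline, and report a message ID) can be sketched as a toy model. This is not how fmd(1M) is implemented. The class name, the error threshold, and the placeholder message ID are all hypothetical and serve only to illustrate the sequence of steps.

```python
import uuid
from collections import defaultdict

# Hypothetical threshold: how many correlated errors trigger a diagnosis.
ERROR_THRESHOLD = 3

# Placeholder only; real message IDs come from the fault manager.
PLACEHOLDER_MSGID = "EXAMPLE-0000-00"


class ToyFaultManager:
    """Toy model of the error -> diagnosis -> UUID -> offline sequence."""

    def __init__(self):
        self.error_history = defaultdict(list)  # component -> earlier errors
        self.faults = {}                        # UUID -> diagnosed fault record
        self.offline = set()                    # components taken offline

    def report_error(self, component, detail):
        """Record an error; return a fault UUID once a fault is diagnosed."""
        if component in self.offline:
            return None  # component is already out of service
        self.error_history[component].append(detail)
        if len(self.error_history[component]) < ERROR_THRESHOLD:
            return None  # not enough correlated evidence yet
        fault_uuid = str(uuid.uuid4())
        self.faults[fault_uuid] = {
            "component": component,
            "msgid": PLACEHOLDER_MSGID,
            "evidence": list(self.error_history[component]),
        }
        self.offline.add(component)
        return fault_uuid


fm = ToyFaultManager()
results = [
    fm.report_error("MB/CMP0/CH2/R0/D0", f"correctable error {i}")
    for i in range(3)
]
print(results[-1] in fm.faults)  # the third error produces a diagnosed fault
```

In this sketch the first two errors only accumulate as history. The third one yields a fault record keyed by a UUID, and the component is marked offline so that later errors from it are ignored.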
The predictive self-healing technology covers the following Sun Fire T1000 server
components:
UltraSPARC T1 multicore processor
Memory
I/O bus
ok #.
sc> showfaults -v
ID Time FRU Fault
1 APR 24 12:47:27 MB/CMP0/CH2/R0/D0 MB/CMP0/CH2/R0/D0 deemed
faulty and disabled
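
If you capture showfaults -v output to a file or log, a short script can extract the disabled FRU names. The following Python sketch assumes the column layout seen in the sample above (ID, time, FRU, fault text, with the fault text possibly wrapped onto the next line). Other firmware versions may format the output differently. The function and class names are illustrative.

```python
import re
from dataclasses import dataclass


@dataclass
class FaultEntry:
    fault_id: int
    time: str
    fru: str
    description: str


# Assumed row layout: ID, "MON DD HH:MM:SS", FRU path, free-form fault text.
ROW = re.compile(
    r"^(\d+)\s+([A-Z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$"
)


def parse_showfaults(text):
    """Parse captured showfaults -v output into FaultEntry records."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the echoed command, and the column header.
        if not line or line.startswith("sc>") or line.startswith("ID "):
            continue
        match = ROW.match(line)
        if match:
            entries.append(
                FaultEntry(int(match.group(1)), match.group(2),
                           match.group(3), match.group(4))
            )
        elif entries:
            # Treat any other line as wrapped fault text from the previous row.
            entries[-1].description += " " + line
    return entries


SAMPLE = """sc> showfaults -v
ID Time FRU Fault
1 APR 24 12:47:27 MB/CMP0/CH2/R0/D0 MB/CMP0/CH2/R0/D0 deemed
faulty and disabled
"""

faults = parse_showfaults(SAMPLE)
for fault in faults:
    print(fault.fault_id, fault.fru, "->", fault.description)
```

The parser rejoins the wrapped fault text, so the single entry carries the complete description "MB/CMP0/CH2/R0/D0 deemed faulty and disabled".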