Fujitsu T2000 Server User Manual


 
3-2 SPARC Enterprise T2000 Server Service Manual April 2007
ALOM CMT firmware – This system firmware runs on the system controller. In
addition to providing the interface between the hardware and OS, ALOM CMT
also tracks and reports the health of key server components. ALOM CMT works
closely with POST and Solaris Predictive Self-Healing technology to keep the
system up and running even when there is a faulty component.
Power-on self-test (POST) – POST performs diagnostics on system components
upon system reset to ensure the integrity of those components. POST is
configurable and works with ALOM CMT to take faulty components offline if
needed.
Solaris OS Predictive Self-Healing (PSH) – This technology continuously
monitors the health of the CPU and memory, and works with ALOM CMT to take
a faulty component offline if needed. The Predictive Self-Healing technology
enables systems to accurately predict component failures and mitigate many
serious problems before they occur.
Log files and console messages – Provide the standard Solaris OS log files and
investigative commands that can be accessed and displayed on the device of your
choice.
SunVTS™ – An application that exercises the system, provides hardware
validation, and discloses possible faulty components with recommendations for
repair.
The LEDs, ALOM CMT, Solaris OS PSH, and many of the log files and console
messages are integrated. For example, when the Solaris software detects a fault,
it displays the fault, logs it, and passes information to ALOM CMT, where the fault
is also logged. Depending on the fault, one or more LEDs might also be lit.
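
The reporting chain described above can be pictured with a small model. This is only an illustrative sketch, not the actual Solaris or ALOM CMT implementation: the class names, the lights_led flag, and the component names (DIMM-A, DIMM-B) are hypothetical and chosen only to show the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    component: str             # hypothetical component name, not a real FRU ID
    message: str
    lights_led: bool = False   # assumption: only some fault types drive an LED


@dataclass
class ReportingChain:
    console: list = field(default_factory=list)      # messages shown by Solaris
    solaris_log: list = field(default_factory=list)  # Solaris OS log entries
    alom_log: list = field(default_factory=list)     # entries logged by ALOM CMT
    lit_leds: set = field(default_factory=set)       # components with a lit LED

    def report(self, fault: Fault) -> None:
        line = f"{fault.component}: {fault.message}"
        self.console.append(line)       # 1. Solaris displays the fault
        self.solaris_log.append(line)   # 2. Solaris logs the fault
        self.alom_log.append(line)      # 3. ALOM CMT receives and logs it
        if fault.lights_led:            # 4. depending on the fault, an LED lights
            self.lit_leds.add(fault.component)


chain = ReportingChain()
chain.report(Fault("DIMM-A", "uncorrectable memory error", lights_led=True))
chain.report(Fault("DIMM-B", "correctable memory error"))
```

In this model both faults appear in every log, but only the first one lights an LED. The same fault can therefore be traced from more than one diagnostic source.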
The flow chart in FIGURE 3-1, together with TABLE 3-1, describes an approach for using the server
diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use,
and the order in which you use them, depend on the nature of the problem you are
troubleshooting, so you might perform some actions and not others.
The flow chart assumes that you have already performed some troubleshooting, such
as verifying proper installation, visually inspecting cables and power, and
possibly resetting the server (refer to the SPARC Enterprise T2000 Server
Installation Guide and SPARC Enterprise T2000 Server Administration Guide for details).
FIGURE 3-1 is a flow chart of the diagnostics available to troubleshoot faulty
hardware.
TABLE 3-1 has more information about each diagnostic in this chapter.
Note – POST is configured with ALOM CMT configuration variables (TABLE 3-9). If
diag_level is set to max (diag_level=max), POST reports all detected FRUs,
including memory devices with errors correctable by Predictive Self-Healing (PSH).
Thus, not all memory devices detected by POST need to be replaced. See
Section 3.4.5, “Correctable Errors Detected by POST” on page 3-36.
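
The point of the note can be sketched as a simple triage step. This is a hypothetical model, not POST output or an ALOM CMT interface: the PostFinding record, its psh_correctable flag, and the FRU names are assumptions made for illustration. The actual criteria for deciding whether a memory device must be replaced are given in Section 3.4.5.

```python
from dataclasses import dataclass


@dataclass
class PostFinding:
    fru: str               # hypothetical FRU name
    is_memory: bool        # True if the FRU is a memory device
    psh_correctable: bool  # assumption: error is correctable by PSH


def replacement_candidates(findings):
    """Return the FRUs that are not memory devices with PSH-correctable errors."""
    return [
        f.fru
        for f in findings
        if not (f.is_memory and f.psh_correctable)
    ]


# Example findings as they might be reported with diag_level=max (invented data).
findings = [
    PostFinding("MEM-1", is_memory=True, psh_correctable=True),
    PostFinding("MEM-2", is_memory=True, psh_correctable=False),
    PostFinding("FAN-1", is_memory=False, psh_correctable=False),
]
candidates = replacement_candidates(findings)
```

Here MEM-1 is reported by POST but is not a replacement candidate, because its error is of the kind PSH can correct.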