10 Sun Fire T1000 Server Service Manual • January 2006
■ ALOM-CMT firmware – The system firmware that runs on the system
controller. In addition to providing the interface between the hardware and the
OS, ALOM tracks and reports the health of key server components. ALOM works
closely with POST and Solaris predictive self-healing technology to keep the
system up and running even when there is a faulty component.
■ Power-on self-test (POST) – Performs diagnostics on system components upon
system reset to ensure the integrity of those components. POST is configurable
and works with ALOM to take faulty components offline if needed and blacklist
them in the asr-db.
■ Solaris OS predictive self-healing (PSH) – Continuously monitors the health of
the CPU and memory, and works with ALOM to take a faulty component offline
if needed.
■ Log files and console messages – Provide the standard Solaris OS log files and
investigative commands that can be accessed and displayed on the device of your
choice.
■ SunVTS™ – An application you can run that exercises the system, provides
hardware validation, and discloses possible faulty components with
recommendations for repair.
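The way POST takes a faulty component offline and blacklists it can be pictured with a small model. The following Python sketch is illustrative only and is not the firmware's implementation: the AsrDb class, the run_post function, and the component names are all hypothetical, and the real asr-db format is not described in this section. It shows only the idea that a component failing its test is recorded as disabled and is skipped on later resets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AsrDb:
    """Hypothetical stand-in for the asr-db blacklist of disabled components."""

    disabled: dict[str, str] = field(default_factory=dict)  # component -> reason

    def blacklist(self, component: str, reason: str) -> None:
        self.disabled[component] = reason

    def is_enabled(self, component: str) -> bool:
        return component not in self.disabled


def run_post(components: dict[str, bool], asr_db: AsrDb) -> list[str]:
    """Test each enabled component, blacklist failures, return usable ones.

    `components` maps an invented component name to whether its simulated
    test passes.
    """
    usable = []
    for name, passes in components.items():
        if not asr_db.is_enabled(name):
            continue  # blacklisted on an earlier reset, so it stays offline
        if passes:
            usable.append(name)
        else:
            asr_db.blacklist(name, "failed POST")
    return usable


# Simulated hardware: one DIMM fails its test (names are made up).
asr_db = AsrDb()
hardware = {"cpu-core-0": True, "dimm-0": False, "dimm-1": True}
first_boot = run_post(hardware, asr_db)
second_boot = run_post(hardware, asr_db)
```

In this model the failing component is excluded on the first reset and remains excluded on the second, which mirrors how a blacklisted component lets the system keep running without it.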
The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages
are integrated. For example, when the Solaris PSH software detects a fault, it
displays the fault, logs it, and passes information to ALOM, where it is also
logged. Depending on the fault, one or more LEDs might be illuminated.
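The integration described above can be sketched as a simple event flow. This Python model is a simplified illustration under stated assumptions, not the actual Solaris or ALOM software: the Alom and SolarisPsh classes, the handle_fault method, the fault strings, and the LED label are all hypothetical. It shows a single detected fault being displayed, logged by the OS, passed to the system controller, logged there, and optionally lighting an LED.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alom:
    """Toy stand-in for the system controller's fault log and LED control."""

    fault_log: list[str] = field(default_factory=list)
    lit_leds: set[str] = field(default_factory=set)

    def report(self, fault: str, led: Optional[str]) -> None:
        self.fault_log.append(fault)  # the controller logs every reported fault
        if led is not None:
            self.lit_leds.add(led)    # only some faults light an LED


@dataclass
class SolarisPsh:
    """Toy stand-in for the OS-side fault handling."""

    alom: Alom
    console: list[str] = field(default_factory=list)
    os_log: list[str] = field(default_factory=list)

    def handle_fault(self, fault: str, led: Optional[str] = None) -> None:
        self.console.append(fault)    # display the fault
        self.os_log.append(fault)     # log it on the OS side
        self.alom.report(fault, led)  # pass the information to the controller


alom = Alom()
psh = SolarisPsh(alom)
psh.handle_fault("memory fault on dimm-0", led="service-required")
psh.handle_fault("correctable error trend on cpu-core-0")
```

After both calls, each fault appears on the console, in the OS log, and in the controller log, while only the first one lights an LED. This is why the same fault can usually be confirmed from more than one diagnostic source.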
The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for
using the server's diagnostics that is likely to identify a faulty
field-replaceable unit (FRU). The diagnostics you use, and the order in which
you use them, depend on the nature of the problem you are troubleshooting, so
you might not follow this flow step by step.
The flowchart assumes that you have already performed some rudimentary
troubleshooting, such as verifying proper installation, visually inspecting
cables and power, and possibly resetting the server. (For details, refer to the
Sun Fire T1000 Server Installation Guide and Sun Fire T1000 Server
Administration Guide.)
Use this flowchart to understand what diagnostics are available to troubleshoot
faulty hardware, and use TABLE 2-1 to find more information about each diagnostic
in this chapter.
For many faults, service can be deferred, either because the faulty component has
been asr'd out, the fault is being corrected, or the fault is predictive