IBM 750 Server User Manual


 
IBM United States Hardware Announcement
110-009
IBM is a registered trademark of International Business Machines Corporation
program. For the vast majority of faults, a good FFDC design means that the root
cause can also be detected automatically without servicer intervention.
First Failure Data Capture (FFDC) information, error data analysis, and fault isolation
are necessary to implement the advanced serviceability techniques that enable
efficient service of the systems and to help determine the failing items.
In the rare absence of FFDC and error data analysis, diagnostics are required to
re-create the failure and determine the failing items.
Diagnostics
General diagnostic objectives are to detect and identify problems such that they can
be resolved quickly. Elements of IBM's diagnostics strategy include:
Provide a common error code format equivalent to a system reference code, system reference number, checkpoint, or firmware error code.
Provide fault detection and problem isolation procedures. Support remote connection ability to be used by the IBM Remote Support Center or IBM Designated Service.
Provide interactive intelligence within the diagnostics with detailed online failure information while connected to IBM's back-end system.
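The idea of a common error code format can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ErrorRecord type, the CodeKind names, the short prefixes, the sample code strings, and the location labels are hypothetical and do not reflect IBM's actual code formats. It only shows how codes from different sources might be carried in one uniform record so that later isolation steps can treat them alike.

```python
from dataclasses import dataclass
from enum import Enum


class CodeKind(Enum):
    """Code sources named in the text; the enum and its prefixes are hypothetical."""
    SYSTEM_REFERENCE_CODE = "SRC"
    SYSTEM_REFERENCE_NUMBER = "SRN"
    CHECKPOINT = "CHK"
    FIRMWARE_ERROR_CODE = "FW"


@dataclass(frozen=True)
class ErrorRecord:
    kind: CodeKind
    code: str       # opaque code text as reported by the source (placeholder values below)
    location: str   # hypothetical label for the suspect component

    def common_format(self) -> str:
        # One uniform rendering regardless of which source produced the code.
        return f"{self.kind.value}:{self.code.upper()}@{self.location}"


records = [
    ErrorRecord(CodeKind.SYSTEM_REFERENCE_CODE, "abc12345", "slot-1"),
    ErrorRecord(CodeKind.CHECKPOINT, "c700", "planar"),
]
formatted = [r.common_format() for r in records]
```

With a single record shape, a servicer-facing tool or a remote support session only needs to understand one layout, whichever subsystem raised the error.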
Automatic diagnostics
Because of the FFDC technology designed into IBM Servers, it is not necessary to
run re-create diagnostics after a failure or to require user intervention. Solid and
intermittent errors are designed to be correctly detected and isolated at the time the
failure occurs. Runtime and boot-time diagnostics fall into this category.
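The principle behind capturing data at the first failure, rather than re-creating the failure later, can be sketched as follows. This is a simplified illustration under stated assumptions, not IBM's implementation: the FirstFailureCapture class, the check functions, and the state dictionary are all hypothetical names introduced here.

```python
import time
from typing import Callable, Optional


class FirstFailureCapture:
    """Records context for the first failing check only (hypothetical helper)."""

    def __init__(self) -> None:
        self.first_failure: Optional[dict] = None

    def run(self, name: str, check: Callable[[], None], state: dict) -> bool:
        """Run one check; on the first failure, snapshot the surrounding state."""
        try:
            check()
            return True
        except Exception as exc:
            if self.first_failure is None:
                self.first_failure = {
                    "component": name,
                    "error": repr(exc),
                    "state": dict(state),  # copy, so later changes do not alter the record
                    "time": time.time(),
                }
            return False


def failing_check() -> None:
    # Stand-in for a hardware checker that detects a fault.
    raise RuntimeError("parity error")


capture = FirstFailureCapture()
state = {"fan_rpm": 4200}
ok_first = capture.run("fan", lambda: None, state)
ok_second = capture.run("memory", failing_check, state)
ok_third = capture.run("adapter", failing_check, state)
```

Only the first failing component is recorded, together with the state at that moment. Later failures, which may be side effects of the first, do not overwrite it, so analysis can start from the original fault without re-running the workload.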
Stand-alone diagnostics
As the name implies, stand-alone or user-initiated diagnostics require user
intervention. The user must perform manual steps, including:
Running compact disc-based diagnostics
Keying in commands
Interactively selecting steps from a list of choices
Concurrent maintenance
The system will continue to support concurrent maintenance of power, cooling, PCI
adapters, DASD, DVD, and firmware updates (when possible). Whether a firmware
release can be updated concurrently is identified in the readme information file
released with the firmware.
Service labels
Service providers use these labels to assist them in performing maintenance actions.
Service labels are found in various formats and positions, and are intended to
transmit readily available information to the servicer during the repair process.
Following are some of these service labels and their purpose:
Location diagrams
Location diagrams are strategically located on the system hardware and convey
information about the placement of hardware components. Location diagrams
may include location codes, drawings of physical locations, concurrent maintenance
status, or other data pertinent to a repair. Location diagrams are especially useful
when multiple components are installed, such as DIMMs, CPUs, processor books,
fans, adapter cards, LEDs, and power supplies.