56 RS/6000 43P 7043 Models 150 and 260 Handbook
3.1.2 Reliability, Availability, and Serviceability (RAS)
The following features provide the reliability, availability, and
serviceability of the IBM RS/6000 Model 150.
3.1.2.1 Reliability, Fault Tolerance, and Data Integrity
The reliability of the Model 150 system starts with reliable components,
devices, and subsystems. During the design and development process,
subsystems go through rigorous verification and integration testing
processes. During system manufacturing, systems go through a testing
process to ensure the highest product quality level.
The Model 150 system memory offers ECC (Error Checking and Correcting)
fault-tolerant features. ECC corrects environment-induced single-bit
intermittent memory failures as well as single-bit hard failures. With ECC, the
majority of memory failures will not impact system operation. ECC also
provides double-bit memory error detection, which protects data integrity in
the event of a double-bit memory failure. The system bus and PCI buses
are designed with parity error detection.
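The single-bit correction and double-bit detection described above can be illustrated with a small extended Hamming code. The sketch below is only a model of the principle, assuming a 4-bit data word protected by three Hamming check bits plus one overall parity bit. The function names encode and decode are hypothetical, and the word size is chosen for readability; it does not describe the actual ECC layout of the Model 150 memory subsystem.

```python
def encode(data):
    """Encode a 4-bit value as an 8-bit codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1, 2, and 4 hold the
    Hamming check bits; indexes 3, 5, 6, and 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(data >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (data, status) for an 8-bit codeword.

    status is "ok", "corrected" (one bit was flipped and repaired), or
    "uncorrectable" (two bits were flipped; data is None).
    """
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position            # XOR of the set bit positions
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself was hit).
        code = code.copy()
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits are wrong.
        return None, "uncorrectable"

    data = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return data, status


word = encode(0b1011)
one_bit = word.copy()
one_bit[5] ^= 1                             # simulate a single-bit failure
two_bit = one_bit.copy()
two_bit[6] ^= 1                             # simulate a second failing bit
print(decode(word))                         # (11, 'ok')
print(decode(one_bit))                      # (11, 'corrected')
print(decode(two_bit))                      # (None, 'uncorrectable')
```

A plain parity bit, as on the system and PCI buses, corresponds to the overall parity bit alone: it can signal that an odd number of bits changed but cannot locate the failing bit.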
Disk mirroring and disk controller duplexing capabilities are provided by the
AIX operating system.
The journaled file system (JFS) of the AIX operating system maintains file
system consistency and prevents data loss when the system is abnormally halted
due to a power failure.
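The idea behind journaling can be sketched with a toy write-ahead log. The example below is a generic illustration, not the JFS on-disk format or its recovery code: the class JournaledStore, its method names, and the record layout are all hypothetical. Updates are appended to a journal first, and recovery applies only the updates that belong to a committed transaction, so a halt in the middle of a change leaves the structures in their last consistent state.

```python
class JournaledStore:
    """Toy store with a write-ahead journal (hypothetical, not JFS)."""

    def __init__(self):
        self.journal = []   # assumed to survive an abnormal halt
        self.state = {}     # file system structures, rebuilt by replay

    def log_update(self, txid, key, value):
        # Record the intended change before touching the structures.
        self.journal.append(("update", txid, key, value))

    def log_commit(self, txid):
        # A commit record marks every update of txid as complete.
        self.journal.append(("commit", txid))

    def replay(self):
        """Rebuild the state from committed transactions only."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        state = {}
        for rec in self.journal:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                state[key] = value
        self.state = state
        return state


store = JournaledStore()
store.log_update(1, "inode 7 size", 4096)
store.log_update(1, "inode 7 blocks", [120])
store.log_commit(1)
store.log_update(2, "inode 9 size", 8192)   # power fails before the commit
recovered = store.replay()
print(recovered)
```

After the replay, transaction 1 is fully present and the half-finished transaction 2 is ignored, which is the consistency property the paragraph above describes.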
An available RAID hardware feature external to the system provides data
integrity and fault tolerance in the event of a disk failure.
3.1.2.2 Fault Monitoring Functions
Following are the functions used to monitor faults during the boot process.
• POST (Power-On Self-Test) checks the processor, L2 cache, memory,
and associated hardware required for proper booting of the
operating system every time the system is powered on. If a noncritical
error is detected, or if the errors occur in resources that can be
removed from the system configuration, the booting process proceeds
to completion. The errors are logged in the system nonvolatile RAM (NVRAM).
• Disk drive fault tracking is a facility that can alert the system administrator
to an impending disk failure before it affects customer operation.
• AIX log facility, where hardware and software failures are recorded and
analyzed (by the Error Log Analysis routine) to warn the system
administrator of the causes of system problems. This also