Chapter 2 System Features and Capabilities 2-9
■ Periodically performs memory patrol to detect memory soft errors and
stuck-at faults, even in memory areas not normally used (Memory patrol).
Memory patrol keeps faulty areas from being used and thereby prevents
system failures.
■ Continuously checks the status of each component to detect signs of an
imminent fault, such as conditions that could bring the system down, and
thereby prevents system failures (Status checking of components).
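The memory patrol behavior described above can be sketched as a small software model. This is a minimal illustration under stated assumptions, not the hardware implementation: the class SimulatedMemory, its methods, and patrol() are hypothetical; ECC decoding is abstracted as an oracle that already knows the intended data; and faulty locations are retired one word at a time. A soft error disappears once the corrected data is written back, while a stuck-at fault survives the rewrite and the word is taken out of use.

```python
class SimulatedMemory:
    """Hypothetical toy memory model, for illustration only.

    `intended` plays the role of the data ECC would reconstruct, so ECC
    decoding itself is abstracted away; `stored` is what the cells hold.
    """

    def __init__(self, size):
        self.intended = [0] * size
        self.stored = [0] * size
        self.stuck = {}  # addr -> (mask, forced_bits) for stuck-at faults

    def _apply_stuck(self, addr, value):
        mask, forced = self.stuck.get(addr, (0, 0))
        return (value & ~mask) | (forced & mask)

    def write(self, addr, value):
        self.intended[addr] = value
        self.stored[addr] = self._apply_stuck(addr, value)

    def read_with_ecc(self, addr):
        """Return (corrected_value, error_detected)."""
        return self.intended[addr], self.stored[addr] != self.intended[addr]

    def inject_soft_error(self, addr, bit):
        self.stored[addr] ^= 1 << bit  # transient flip; gone once rewritten

    def inject_stuck_at(self, addr, mask, forced):
        self.stuck[addr] = (mask, forced)
        self.stored[addr] = self._apply_stuck(addr, self.stored[addr])


def patrol(mem, retired):
    """One patrol pass over every address, including ones no program uses.

    Returns the addresses whose soft errors were scrubbed. Addresses whose
    error survives a rewrite are treated as stuck-at faults and added to
    `retired` so they are no longer used.
    """
    scrubbed = []
    for addr in range(len(mem.stored)):
        if addr in retired:
            continue
        value, error = mem.read_with_ecc(addr)
        if not error:
            continue
        mem.write(addr, value)                # scrub: write corrected data back
        _, still_bad = mem.read_with_ecc(addr)
        if still_bad:
            retired.add(addr)                 # persistent fault: stop using it
        else:
            scrubbed.append(addr)
    return scrubbed


mem = SimulatedMemory(16)
for a in range(8):                            # only the first half is "in use"
    mem.write(a, 0xA0 + a)
mem.inject_soft_error(3, bit=2)               # transient flip in a used word
mem.inject_stuck_at(12, mask=0b1, forced=0b1) # bit 0 stuck at 1 in an unused word
retired = set()
scrubbed = patrol(mem, retired)
```

Here the stuck-at fault at address 12 is found even though no program has written to that word, which is the point of patrolling memory areas that are not normally used.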
2.4.2 Availability
Availability is the proportion of time during which the midrange server is
accessible and usable. The operating ratio is used as its index.
Faults cannot be eliminated completely. To provide high availability, the system
must incorporate mechanisms that allow it to continue operating even when a
failure occurs in hardware (components and devices), in basic software such as
the operating system, or in business application software.
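To make the operating ratio concrete, the following sketch computes it from uptime and downtime. The MTBF/(MTBF + MTTR) form and the redundancy formula 1 - (1 - a)^n are standard reliability approximations assumed here, not definitions given in this manual; the redundancy formula further assumes independent failures and that any one unit can carry the load. All function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def operating_ratio(uptime_hours, downtime_hours):
    """Fraction of time the server is accessible and usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def steady_state_availability(mtbf_hours, mttr_hours):
    """Common textbook approximation: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(unit_availability, units):
    """Availability of `units` redundant units when any one suffices.

    Assumes failures are independent (an idealization).
    """
    return 1 - (1 - unit_availability) ** units


def annual_downtime_minutes(availability):
    """Expected minutes of downtime per 365-day year."""
    return (1 - availability) * HOURS_PER_YEAR * 60


ratio = operating_ratio(uptime_hours=8750, downtime_hours=10)
pair = redundant_availability(0.99, units=2)
minutes = annual_downtime_minutes(pair)
```

For example, two independent units that are each 99% available give about 99.99% combined, or roughly 53 minutes of downtime per year. Lowering MTTR, for instance through automatic reboot and faster startup, raises availability even when MTBF is unchanged.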
The midrange servers provide high availability through the features listed
below. A cluster configuration can raise availability further.
■ Supporting redundant configurations and active replacement of power supplies
and fans.
■ Supporting redundant configurations, mirroring, and active replacement of disks.
■ Extending the range of automatic correction of temporary faults in memory,
system buses, and LSI internal data.
■ Supporting enhanced retry and degradation functions for detected faults.
■ Shortening downtime through automatic system reboot.
■ Shortening the time taken for system startup.
■ Collecting fault information through the XSCF, and supporting preventive
maintenance with different types of warnings.
■ Supporting the extended error checking and correction (ECC) function in the
memory subsystem. Extended ECC is an error-correcting code that can correct
the 4-bit (nibble) error produced when an entire DRAM chip fails. This feature
works for DIMMs built from x4 I/O DRAM chips.
■ Supporting the memory mirroring function. If a DIMM stuck fault occurs on
one memory bus, data processing continues normally through the other memory
bus, which prevents a system failure.
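The nibble correction performed by Extended ECC can be illustrated with a symbol-oriented code over GF(16), in which each 4-bit symbol stands for the data that one x4 DRAM chip contributes to a word, so a whole-chip failure corrupts exactly one symbol. This is a hypothetical sketch: the code (a two-check-symbol Reed-Solomon-style code with primitive polynomial x^4 + x + 1), the word layout, and the names encode() and correct() are assumptions, not the ECC actually used in the memory subsystem.

```python
# GF(16) arithmetic with primitive polynomial x^4 + x + 1 (an assumption;
# the code used by the actual hardware is not documented here).
PRIM_POLY = 0b10011
EXP = [0] * 30   # EXP[i] = alpha^i, doubled so products need no modulo
LOG = [0] * 16   # LOG[alpha^i] = i for nonzero elements
_x = 1
for _i in range(15):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x10:
        _x ^= PRIM_POLY
for _i in range(15, 30):
    EXP[_i] = EXP[_i - 15]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 15]


def encode(nibbles):
    """Append two check nibbles so that sum(c_i) == sum(c_i * alpha^i) == 0."""
    k = len(nibbles)
    if k > 13:
        raise ValueError("at most 13 data nibbles fit in a 15-nibble codeword")
    s0 = s1 = 0
    for i, d in enumerate(nibbles):
        s0 ^= d
        s1 ^= gf_mul(d, EXP[i])
    ak, ak1 = EXP[k], EXP[k + 1]
    p = gf_div(s1 ^ gf_mul(s0, ak1), ak ^ ak1)   # solves both check equations
    q = s0 ^ p
    return list(nibbles) + [p, q]


def correct(codeword):
    """Correct any error confined to one nibble; return (fixed, position).

    Errors spanning two or more nibbles are not guaranteed to be detected.
    """
    s0 = s1 = 0
    for i, c in enumerate(codeword):
        s0 ^= c
        s1 ^= gf_mul(c, EXP[i])
    if s0 == 0 and s1 == 0:
        return list(codeword), None
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable: error spans more than one nibble")
    pos = LOG[gf_div(s1, s0)]          # s1 / s0 == alpha^pos for one bad nibble
    if pos >= len(codeword):
        raise ValueError("uncorrectable: error spans more than one nibble")
    fixed = list(codeword)
    fixed[pos] ^= s0                   # s0 is the error pattern itself
    return fixed, pos


data = [0x3, 0xA, 0x7, 0x0, 0xF, 0x1, 0x9, 0x4]   # 32 data bits as 8 nibbles
stored = encode(data)
failed_chip = 5                                    # hypothetical chip index
corrupted = list(stored)
corrupted[failed_chip] ^= 0b1011                   # several bits of one chip wrong
fixed, pos = correct(corrupted)
```

The design choice is to correct symbols rather than individual bits: a bitwise SEC-DED code corrects only one wrong bit per word, whereas a symbol code of this kind corrects any error pattern confined to a single 4-bit symbol, which matches the failure of one x4 chip.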
Since the memory patrol facility is implemented in hardware, it is not affected by the
software processing workload.