IBM P5 570 Server User Manual


 
Chapter 3. Capacity on Demand, RAS, and manageability 59
If the output shows CPU Guard as disabled, enter the following command to enable it:
chdev -l sys0 -a cpuguard='enable'
Cache or cache-line deallocation is aimed at performing dynamic reconfiguration to bypass
potentially failing components. This capability is provided for both L2 and L3 caches. Dynamic
run-time deconfiguration is provided if a threshold of L1 or L2 recovered errors is exceeded.
In the case of an L3 cache run-time array single-bit solid error, spare chip resources are used
to perform an L3 cache line delete on the failing line.
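The two cache mechanisms above can be modeled as a minimal sketch. It assumes a hypothetical threshold of three recovered errors (the text says only that "a threshold" exists), and the names `CacheUnit`, `L3Cache`, and `RECOVERED_ERROR_THRESHOLD` are illustrative, not firmware interfaces. The point is the contrast: an L2 cache is bypassed as a whole once its threshold is exceeded, while an L3 solid single-bit error removes only the failing line.

```python
from dataclasses import dataclass, field

# Hypothetical value: the text says only that "a threshold" of recovered errors exists.
RECOVERED_ERROR_THRESHOLD = 3


@dataclass
class CacheUnit:
    """A cache (for example L2) that is deconfigured at run time after too many recovered errors."""
    name: str
    recovered_errors: int = 0
    deconfigured: bool = False

    def record_recovered_error(self, threshold: int = RECOVERED_ERROR_THRESHOLD) -> bool:
        """Count one recovered error; return True only on the call that deconfigures the cache."""
        if self.deconfigured:
            return False  # already bypassed; nothing more to do
        self.recovered_errors += 1
        if self.recovered_errors > threshold:  # threshold *exceeded*, not merely reached
            self.deconfigured = True
            return True
        return False


@dataclass
class L3Cache:
    """An L3 cache that deletes individual failing lines instead of going offline."""
    deleted_lines: set[int] = field(default_factory=set)

    def on_solid_single_bit_error(self, line: int) -> None:
        # Only the failing line is removed from use; the rest of the cache keeps serving.
        self.deleted_lines.add(line)

    def is_usable(self, line: int) -> bool:
        return line not in self.deleted_lines


l2 = CacheUnit("L2")
events = [l2.record_recovered_error() for _ in range(4)]
print(events)  # [False, False, False, True]

l3 = L3Cache()
l3.on_solid_single_bit_error(0x1A0)
print(l3.is_usable(0x1A0), l3.is_usable(0x1A1))  # False True
```

Treating "exceeded" strictly (greater than, not greater than or equal) is a reading of the wording above; the real firmware comparison is not documented here.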
PCI hot-plug slot fault tracking helps prevent slot errors from causing a system machine check
interrupt and a subsequent reboot. This provides superior fault isolation, because the error
affects only the single adapter. Run-time errors on the PCI bus that are caused by failing
adapters trigger a recovery action; if recovery is unsuccessful, the PCI device is gracefully
shut down. Parity errors on the PCI bus itself cause a bus retry and, if they remain uncorrected,
the bus and any I/O adapters or devices on that bus are deconfigured.
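The escalation path described above (adapter error, then recovery or single-adapter shutdown; bus parity error, then retry or deconfiguration of the whole bus) can be sketched as follows. This is a toy model, not AIX or firmware code: the class `PciBus`, the retry budget `max_retries`, and the AIX-style device names `ent0`, `fcs0`, and `scsi0` are hypothetical.

```python
from typing import Callable


class PciBus:
    """Toy model of one PCI bus and the adapters in its slots (names are hypothetical)."""

    def __init__(self, adapters: list[str], max_retries: int = 3) -> None:
        self.configured = set(adapters)  # adapters currently in service
        self.bus_configured = True
        self.max_retries = max_retries   # hypothetical retry budget for bus parity errors

    def handle_adapter_error(self, adapter: str, recover: Callable[[str], bool]) -> str:
        # Run-time error caused by one failing adapter: attempt recovery first.
        if recover(adapter):
            return "recovered"
        # Recovery failed: gracefully shut down only that adapter; the others keep running.
        self.configured.discard(adapter)
        return "adapter shut down"

    def handle_bus_parity_error(self, retry: Callable[[], bool]) -> str:
        # Parity error on the bus itself: retry the bus operation.
        for _ in range(self.max_retries):
            if retry():
                return "corrected by retry"
        # Still uncorrected: deconfigure the bus and every adapter on it.
        self.bus_configured = False
        self.configured.clear()
        return "bus deconfigured"


bus = PciBus(["ent0", "fcs0", "scsi0"])
print(bus.handle_adapter_error("fcs0", recover=lambda name: False))  # adapter shut down
print(sorted(bus.configured))                                       # ['ent0', 'scsi0']
print(bus.handle_bus_parity_error(retry=lambda: False))              # bus deconfigured
```

Passing the recovery and retry actions in as callables keeps the escalation policy separate from the hardware-specific work, which makes each branch easy to exercise.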
The p5-570 supports PCI Extended Error Handling (EEH) when the PCI-X adapter supports it. In
the past, PCI bus parity errors caused a global machine check interrupt, which eventually
required a system reboot in order to continue. In the p5-570 system, the interaction of
hardware, system firmware, and AIX has been designed to allow transparent recovery of
intermittent PCI bus parity errors and graceful transition to the I/O device available state in
the case of a permanent parity error in the PCI bus.
EEH-enabled adapters respond to a special data packet generated by the affected PCI slot
hardware by calling system firmware, which examines the affected bus and allows the device
driver to reset it; operation then continues without a system reboot.
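The EEH sequence described above (the slot signals an error, the driver calls firmware, firmware examines the bus, and the driver resets the slot) can be sketched as a small simulation. It is an illustrative model under stated assumptions, not the AIX or firmware interface. The all-ones read value `ERROR_SENTINEL` stands in for the "special data packet". `Firmware`, `EehAwareDriver`, and `BusState` are hypothetical names. Treating a permanent error as taking only that device out of service is an assumption that matches the graceful-shutdown behavior described earlier.

```python
from enum import Enum

# Assumption: a slot that has detected an error returns all-ones on reads; this
# stands in for the "special data packet" the text describes.
ERROR_SENTINEL = 0xFFFFFFFF


class BusState(Enum):
    OK = "ok"
    TRANSIENT = "intermittent parity error"
    PERMANENT = "permanent parity error"


class Firmware:
    """Hypothetical stand-in for system firmware servicing one PCI slot."""

    def __init__(self, state: BusState) -> None:
        self.state = state

    def examine_bus(self) -> BusState:
        return self.state

    def reset_slot(self) -> None:
        # Resetting the slot clears an intermittent error; a permanent one remains.
        if self.state is BusState.TRANSIENT:
            self.state = BusState.OK


class EehAwareDriver:
    """Hypothetical device driver for an EEH-enabled adapter."""

    def __init__(self, firmware: Firmware) -> None:
        self.firmware = firmware
        self.resets = 0
        self.online = True

    def handle_read(self, value: int) -> str:
        if value != ERROR_SENTINEL:
            return "data"
        # Possible error signal from the slot: ask firmware to examine the bus.
        state = self.firmware.examine_bus()
        if state is BusState.OK:
            return "data"  # all-ones was legitimate data, not an error
        if state is BusState.PERMANENT:
            self.online = False  # take only this device out of service; no system reboot
            return "device offline"
        # Intermittent error: firmware lets the driver reset the slot, and I/O continues.
        self.firmware.reset_slot()
        self.resets += 1
        return "recovered after reset"


driver = EehAwareDriver(Firmware(BusState.TRANSIENT))
print(driver.handle_read(ERROR_SENTINEL))  # recovered after reset
print(driver.handle_read(0x12345678))     # data
```

In both error cases the failure stays local to one slot, which is the property that distinguishes EEH from the older global machine check behavior.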
Persistent deallocation functions include:
- Processor
- Memory
- Deconfigure or bypass failing I/O adapters
- L3 cache
Following a hardware error that has been flagged by the service processor, the subsequent
reboot of the system invokes extended diagnostics. If a processor or L3 cache has been
marked for deconfiguration by persistent processor deallocation, the boot process will attempt
to proceed to completion with the faulty device automatically deconfigured. Failing I/O
adapters will be deconfigured or bypassed during the boot process.
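The persistence described above can be illustrated with a minimal sketch. The key idea is that the deconfiguration mark outlives the reboot, so the next boot skips the component automatically. `ServiceProcessor`, `boot`, and the component names are hypothetical. Extended diagnostics are reduced to a flag that is set whenever a mark exists, which simplifies the real behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceProcessor:
    """Hypothetical model: deconfiguration marks survive a reboot."""
    marked: set[str] = field(default_factory=set)

    def flag_hardware_error(self, component: str) -> None:
        self.marked.add(component)


def boot(installed: list[str], sp: ServiceProcessor) -> tuple[list[str], bool]:
    """Return (components brought online, whether extended diagnostics ran)."""
    extended_diagnostics = bool(sp.marked)  # a flagged error triggers extended diagnostics
    # Boot proceeds to completion, automatically skipping every marked component.
    online = [c for c in installed if c not in sp.marked]
    return online, extended_diagnostics


sp = ServiceProcessor()
sp.flag_hardware_error("proc2")
sp.flag_hardware_error("l3cache1")
online, diag = boot(["proc0", "proc1", "proc2", "l3cache0", "l3cache1", "ent0"], sp)
print(online, diag)  # ['proc0', 'proc1', 'l3cache0', 'ent0'] True
```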
3.2.8 Serviceability
Increasing service productivity keeps the system up and running for longer. The p5-570
improves service productivity by providing the following functions.
Error indication and LED indicators
The p5-570 is designed to be installed by an IBM service representative. Most hardware
features added after the installation are designated as customer setup. To help the customer
and the IBM service representative, the p5-570 provides internal LED diagnostics that identify
parts that require service. Indication of an error is provided through a series of light attention signals,
Note: The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable software error, software hang, hardware failure, or
environmentally induced failure (such as loss of power supply).