58 p5-570 Technical Overview and Introduction
(dynamic bit-steering). Memory scrubbing reads the contents of memory during idle time and
passes the data through the ECC logic, which checks for and corrects any single-bit errors
that have accumulated. Scrubbing is a hardware function of the memory controller chip and
does not affect normal system memory performance.
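The scrub pass can be illustrated with a small software model. This is a minimal sketch, not the p5-570 implementation: it assumes a Hamming(7,4) single-error-correcting code so the example stays short, whereas real memory controllers use wider ECC (such as Chipkill) in hardware. The function names are hypothetical.

```python
# Illustrative model only: Hamming(7,4) stands in for the wider hardware ECC.

def hamming74_encode(nibble: int) -> list[int]:
    """Encode a 4-bit value as a 7-bit codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    # Layout by position: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]

def hamming74_syndrome(cw: list[int]) -> int:
    """Return the 1-based position of a single-bit error, or 0 if none."""
    c = [0] + cw  # shift to 1-based indexing
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    return s1 | (s2 << 1) | (s4 << 2)

def hamming74_decode(cw: list[int]) -> int:
    """Extract the 4 data bits from a (correct) codeword."""
    return cw[2] | (cw[4] << 1) | (cw[5] << 2) | (cw[6] << 3)

def scrub(memory: list[list[int]]) -> int:
    """One scrub pass: read each word, correct a single-bit error, write back.
    Returns the number of words that needed correction."""
    corrected = 0
    for addr, cw in enumerate(memory):
        pos = hamming74_syndrome(cw)
        if pos:
            fixed = list(cw)
            fixed[pos - 1] ^= 1   # flip the faulty bit
            memory[addr] = fixed  # write the corrected word back
            corrected += 1
    return corrected

memory = [hamming74_encode(v) for v in (0x3, 0xA, 0xF)]
memory[1][5] ^= 1  # simulate a soft error accumulating in one word
print(scrub(memory))                              # 1
print([hamming74_decode(cw) for cw in memory])    # [3, 10, 15]
```

Because the error is repaired during idle time, it cannot combine with a second error in the same word later, which is why scrubbing helps keep correctable errors from becoming uncorrectable ones.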
3.2.5 N+1 redundancy
Redundant parts allow the p5-570 to remain operational with full resources if one
component in a redundant group fails:
Redundant spare memory bits in L1, L2, L3, and main memory
Redundant fans
Redundant power supplies
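The N+1 principle can be expressed as a simple capacity check. This is a hypothetical sketch: the class name and the fan counts are illustrative assumptions, not the actual p5-570 configuration. A group of N+1 units keeps full capacity as long as at least N units still work.

```python
from dataclasses import dataclass

@dataclass
class RedundantGroup:
    """A group of identical parts, such as fans or power supplies (illustrative)."""
    name: str
    required: int   # N: units needed to carry the full load
    installed: int  # units present; N+1 in an N+1 design
    failed: int = 0

    def full_resources(self) -> bool:
        # Full capacity is available while at least N units are working.
        return self.installed - self.failed >= self.required

fans = RedundantGroup("fans", required=3, installed=4)  # hypothetical counts
fans.failed = 1
print(fans.full_resources())  # True: the spare unit absorbs one failure
fans.failed = 2
print(fans.full_resources())  # False: working units now below N
```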
3.2.6 Fault masking
If corrections and retries succeed and do not exceed threshold limits, the system remains
operational with full resources and no client or IBM customer engineer intervention is
required:
CEC bus retry and recovery
PCI-X bus recovery
ECC Chipkill soft error
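The interaction between fault masking and threshold limits can be modeled as a per-resource counter. This is a minimal sketch under assumptions: the class name, the threshold value, and the resource label are hypothetical, and the real thresholds are maintained by system firmware. Each error that a retry or ECC correction has already fixed is counted; while the count stays within the threshold the error is simply masked, and beyond it the resource becomes a candidate for deallocation.

```python
from collections import defaultdict

class ErrorThresholdTracker:
    """Counts recovered errors per resource (hypothetical model)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def record_recovered_error(self, resource: str) -> str:
        """Record one error already fixed by a retry or ECC correction.
        Returns 'masked' while within the threshold, else 'deallocate'."""
        self.counts[resource] += 1
        if self.counts[resource] > self.threshold:
            return "deallocate"  # threshold exceeded: escalate
        return "masked"          # full resources, no intervention needed

tracker = ErrorThresholdTracker(threshold=2)  # hypothetical threshold
print([tracker.record_recovered_error("PCI-X bus 0") for _ in range(3)])
# ['masked', 'masked', 'deallocate']
```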
3.2.7 Resource deallocation
If recoverable errors exceed threshold limits, the affected resources can be deallocated while
the system remains operational, so maintenance can be deferred to a convenient time.
Dynamic or persistent deallocation
Dynamic deallocation of potentially failing components is non-disruptive: the system continues
to run. Persistent deallocation occurs when a failed component is detected; the component is
then deactivated at the next reboot.
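The difference in timing between the two modes can be sketched with a small resource table. This is a simplified, hypothetical model: the class, the resource names, and the idea of a single "guarded" set are assumptions for illustration, not the firmware's actual data structures. Dynamic deallocation removes a part immediately; persistent deallocation records the part so that it is left out when the system next boots.

```python
from enum import Enum

class Deallocation(Enum):
    DYNAMIC = "dynamic"        # removed now, system keeps running
    PERSISTENT = "persistent"  # recorded now, deactivated at next reboot

class ResourceTable:
    """Hypothetical model of configured resources."""

    def __init__(self, resources):
        self.active = set(resources)
        self.guarded = set()  # persistent records applied at boot time

    def deallocate(self, resource: str, mode: Deallocation) -> None:
        if mode is Deallocation.DYNAMIC:
            self.active.discard(resource)  # takes effect immediately
        else:
            self.guarded.add(resource)     # takes effect at next reboot

    def reboot(self) -> None:
        # Parts with a persistent record are not configured at boot.
        self.active -= self.guarded

table = ResourceTable(["proc0", "proc1", "pci_slot3"])  # hypothetical names
table.deallocate("proc1", Deallocation.DYNAMIC)
table.deallocate("pci_slot3", Deallocation.PERSISTENT)
print(sorted(table.active))  # ['pci_slot3', 'proc0']
table.reboot()
print(sorted(table.active))  # ['proc0']
```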
Dynamic deallocation functions include:
Processor
L3 cache line delete
Partial L2 cache deallocation
PCI-X bus and slots
For dynamic processor deallocation, the service processor performs a predictive failure
analysis based on the recoverable processor errors that have been recorded. If these
transient errors exceed a defined threshold, the event is logged and the processor is
deallocated from the system while the operating system continues to run. This feature,
named CPU Guard, enables maintenance to be deferred until a suitable time. Processor
deallocation can occur only if there are enough functional processors (at least two).
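The CPU Guard decision described above can be summarized as a small policy function. This is a hypothetical sketch: the function name, its parameters, and the "log-only" outcome for a single-processor system are assumptions; the text states only that errors above the threshold are logged and that deallocation requires at least two functional processors.

```python
def cpu_guard_decision(recoverable_errors: int, threshold: int, online_cpus: int) -> str:
    """Hypothetical sketch of the CPU Guard policy described in the text."""
    if recoverable_errors <= threshold:
        return "keep"                 # within threshold: no action
    if online_cpus < 2:
        return "log-only"             # assumption: last processor is never removed
    return "log-and-deallocate"       # OS continues on the remaining processors

print(cpu_guard_decision(recoverable_errors=1, threshold=3, online_cpus=4))  # keep
print(cpu_guard_decision(recoverable_errors=5, threshold=3, online_cpus=4))  # log-and-deallocate
print(cpu_guard_decision(recoverable_errors=5, threshold=3, online_cpus=1))  # log-only
```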
To verify whether CPU Guard has been enabled, run the following command:
lsattr -El sys0 | grep cpuguard
If CPU Guard is enabled, the output will be similar to:
cpuguard enable CPU Guard True
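When the check is scripted, the command output can be parsed instead of inspected by eye. This is a minimal sketch that assumes the `lsattr -El sys0` output uses whitespace-separated columns with the attribute name first and its current value second, as in the sample line; any value other than `enable` is treated as not enabled. The function name is hypothetical. On an AIX system the text could be obtained with `subprocess.run(["lsattr", "-El", "sys0"], capture_output=True, text=True).stdout`.

```python
def cpuguard_enabled(lsattr_output: str) -> bool:
    """Return True if the cpuguard attribute in `lsattr -El sys0` output is 'enable'.
    Assumes columns: attribute, value, description..., user-settable flag."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "cpuguard":
            return fields[1] == "enable"
    return False  # attribute not present in the output

sample = "cpuguard enable CPU Guard True"
print(cpuguard_enabled(sample))  # True
```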