20 IBM eServer xSeries 440 Planning and Installation Guide
Figure 1-11 Memory ProteXion
If memory scrubbing detects a chip failure on a DIMM, the memory controller
can reroute data around the failed chip through the spare bits, much like a
hot-spare drive in a RAID array. It does this automatically, without issuing
a Predictive Failure Analysis (PFA) or Light Path Diagnostics alert to the
administrator. If a second failure then occurs on that DIMM, PFA and Light
Path Diagnostics alerts are issued for it as normal.
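The sparing behavior described above can be sketched as a small Python model. It assumes, as the text describes, that the spare bits can absorb one chip failure per DIMM, and it takes the 64/6/2 bit budget of the 72-bit DIMM word from Figure 1-11. The names `DimmState` and `handle_chip_failure` are hypothetical; this illustrates the policy only and is not IBM firmware.

```python
from dataclasses import dataclass, field

# Bit budget of one 72-bit DIMM word, taken from Figure 1-11.
DATA_BITS, ECC_BITS, SPARE_BITS = 64, 6, 2
assert DATA_BITS + ECC_BITS + SPARE_BITS == 72


@dataclass
class DimmState:
    """Hypothetical per-DIMM bookkeeping (not IBM firmware)."""
    name: str
    # Assumption: the spare bits can cover exactly one failed chip per DIMM.
    spare_in_use: bool = False
    alerts: list[str] = field(default_factory=list)


def handle_chip_failure(dimm: DimmState, chip: int) -> str:
    """Absorb the first chip failure silently; alert on any later one."""
    if not dimm.spare_in_use:
        # Reroute the failed chip's data through the spare bits; no alert is raised.
        dimm.spare_in_use = True
        return f"chip {chip} steered to spare bits"
    # Spare capacity already used: raise PFA and Light Path alerts as normal.
    message = f"PFA/Light Path alert: {dimm.name} chip {chip}"
    dimm.alerts.append(message)
    return message


dimm = DimmState("DIMM 3")
print(handle_chip_failure(dimm, 5))  # chip 5 steered to spare bits
print(handle_chip_failure(dimm, 9))  # PFA/Light Path alert: DIMM 3 chip 9
```

A single boolean is enough here because the text only distinguishes the first failure on a DIMM from the next one.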
Memory scrubbing
Memory scrubbing is an automatic daily test of all system memory. It detects
and reports developing memory errors before they can cause a server outage.
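A scrub pass reads every memory location, checks it against its error-correcting code, writes back corrected data, and reports what it cannot fix. The Python sketch below shows that loop using a toy extended Hamming (8,4) SEC-DED code, chosen only because it is small enough to read. It is an assumption for illustration, not the 6-bit ECC scheme used on the DIMMs described here, and the functions `encode`, `check`, and `scrub` are hypothetical.

```python
def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword."""
    word = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4)
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2  # make the total number of 1 bits even
    return word


def check(word: list[int]) -> tuple[str, int]:
    """Classify a codeword as ok, correctable (with bit position), or uncorrectable."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    odd_parity = sum(word) % 2 == 1
    if syndrome == 0 and not odd_parity:
        return "ok", -1
    if odd_parity:
        # Treated as one flipped bit: the syndrome is its position
        # (0 means the overall parity bit itself flipped).
        return "correctable", syndrome
    # Even parity with a nonzero syndrome: two bits flipped, which SEC-DED cannot fix.
    return "uncorrectable", -1


def scrub(memory: list[list[int]]) -> dict[str, list[int]]:
    """One scrub pass: check every word, fix single-bit errors in place, report the rest."""
    report: dict[str, list[int]] = {"corrected": [], "uncorrectable": []}
    for addr, word in enumerate(memory):
        status, pos = check(word)
        if status == "correctable":
            word[pos] ^= 1  # write the corrected bit back
            report["corrected"].append(addr)
        elif status == "uncorrectable":
            report["uncorrectable"].append(addr)
    return report


memory = [encode(n) for n in range(16)]
memory[2][5] ^= 1  # single-bit error
memory[7][1] ^= 1  # first bit of a double-bit error
memory[7][6] ^= 1  # second bit of the same double-bit error
print(scrub(memory))  # {'corrected': [2], 'uncorrectable': [7]}
```

The single-bit error is repaired in place, while the double-bit error is only reported. This mirrors the recoverable and unrecoverable cases that the scrubbing logic distinguishes.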
Memory scrubbing and Memory ProteXion work together, but neither requires
memory mirroring (as described below) to be enabled in order to work
properly.
When a bit error is detected, memory scrubbing determines whether the error
is recoverable. If it is, Memory ProteXion is enabled and the data that was
stored in the damaged locations is rewritten to a new location. The error is
then reported so that preventive maintenance can be performed. As long as
enough good locations remain for the server to operate properly, no further
action is taken other than recording the error in the error logs.
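The recoverable-error path can be modeled as a remap table. Corrected data is rewritten to a spare location, the error is logged, and reads of the old address are redirected. This Python sketch is a hypothetical model: `RemapTable`, `handle_recoverable`, and the pool of spare locations are assumptions, and it does not describe how the memory controller actually tracks locations.

```python
class RemapTable:
    """Hypothetical model: damaged locations are redirected to spare locations."""

    def __init__(self, spare_locations: list[int]) -> None:
        self.free = list(spare_locations)  # pool of known-good spare locations
        self.remap: dict[int, int] = {}    # damaged address -> new location
        self.error_log: list[str] = []

    def resolve(self, addr: int) -> int:
        """Return the location that now holds the data for addr."""
        return self.remap.get(addr, addr)

    def handle_recoverable(self, addr: int, data: int, memory: dict[int, int]) -> bool:
        """Rewrite corrected data to a new location and log the error.

        Returns False once no good locations remain, that is, when logging
        alone is no longer enough.
        """
        if not self.free:
            self.error_log.append(f"no spare location left for {addr:#x}")
            return False
        new_addr = self.free.pop(0)
        memory[new_addr] = data
        self.remap[addr] = new_addr
        self.error_log.append(f"recoverable error at {addr:#x}, data moved to {new_addr:#x}")
        return True


memory = {0x10: 0xAB}
table = RemapTable(spare_locations=[0x100])
table.handle_recoverable(0x10, 0xAB, memory)  # True: data now lives at 0x100
table.handle_recoverable(0x20, 0xCD, memory)  # False: the spare pool is exhausted
print(table.resolve(0x10), table.error_log)
```

The boolean return value marks the point where the "enough good locations" condition stops holding. The text does not say what happens next, so the model only logs it.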
If the error is not recoverable, memory scrubbing sends an error message to
Light Path Diagnostics, which turns on the appropriate LEDs to guide you to
the defective DIMM. If memory mirroring is enabled, the mirrored copy of the
data on the damaged DIMM is used until the system is powered down and the
DIMM is replaced.
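The unrecoverable-error path, with and without mirroring, can be sketched as follows. `MirroredMemory` is a hypothetical Python model. Its `led_on` list stands in for the Light Path Diagnostics LEDs, and raising `RuntimeError` stands in for whatever the system does when no mirrored copy exists, which the text does not describe.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MirroredMemory:
    """Hypothetical model of a DIMM's data and its optional mirror (not IBM firmware)."""
    primary: dict[int, int]
    mirror: Optional[dict[int, int]] = None  # None when memory mirroring is disabled
    failed: set[int] = field(default_factory=set)
    led_on: list[str] = field(default_factory=list)  # stand-in for Light Path LEDs

    def report_uncorrectable(self, addr: int, dimm: str) -> None:
        """Record an unrecoverable error and light the LED for the defective DIMM."""
        self.failed.add(addr)
        if dimm not in self.led_on:
            self.led_on.append(dimm)

    def read(self, addr: int) -> int:
        if addr not in self.failed:
            return self.primary[addr]
        if self.mirror is not None:
            # Serve the mirrored copy until the system is powered down and the DIMM replaced.
            return self.mirror[addr]
        # Assumption: without a mirror, this model simply has no good copy to return.
        raise RuntimeError(f"uncorrectable memory error at {addr:#x}")


mirrored = MirroredMemory(primary={0x40: 7}, mirror={0x40: 7})
mirrored.report_uncorrectable(0x40, "DIMM 2")
print(mirrored.read(0x40), mirrored.led_on)  # 7 ['DIMM 2']
```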
[Figure 1-11 detail: a 72-bit DIMM word divided into 64 bits of data, 6 bits of ECC, and 2 spare bits.]