IBM 440 Server User Manual


 
IBM eServer xSeries 440 Planning and Installation Guide
data in the damaged DIMM is used until the system is powered down and the
DIMM replaced.
Certain restrictions exist with respect to placement and size of memory
DIMMs when memory mirroring is enabled. These are discussed in "Memory
mirroring" on page 67.
Chipkill memory
Chipkill is integrated into the XA-32 chipset and does not require special
Chipkill DIMMs. Chipkill corrects multiple single-bit errors to keep a DIMM
from failing. When Chipkill is combined with Memory ProteXion and Active
Memory, the x440 provides very high reliability in the memory subsystem.
Chipkill memory is approximately 100 times more effective than ECC
technology, providing correction for up to four bits per DIMM (eight bits per
memory controller), whether on a single chip or multiple chips.
If a memory chip error does occur, Chipkill is designed to automatically take
the inoperative memory chip offline while the server keeps running. The
memory controller provides memory protection similar in concept to disk array
striping with parity, writing the memory bits across multiple memory chips on
the DIMM. The controller is able to reconstruct the missing bit from the failed
chip and continue working as usual.
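The striping-with-parity analogy can be illustrated with a minimal Python
sketch. It is not the Chipkill algorithm itself, which this guide does not
detail. It assumes a single XOR parity bit stored on one extra chip, and the
function names write_striped and reconstruct are hypothetical.

```python
from functools import reduce
from operator import xor


def write_striped(data_bits):
    """Spread data bits across chips and append one XOR parity bit.

    Each list position stands for one hypothetical memory chip.
    """
    parity = reduce(xor, data_bits, 0)
    return list(data_bits) + [parity]


def reconstruct(stripe, failed_chip):
    """Rebuild the bit held by the failed chip from the surviving chips."""
    survivors = [bit for i, bit in enumerate(stripe) if i != failed_chip]
    # XOR of all surviving bits (data plus parity) equals the missing bit.
    return reduce(xor, survivors, 0)


stripe = write_striped([1, 0, 1, 1])
recovered = reconstruct(stripe, failed_chip=2)
```

In this toy model, losing any single chip still leaves enough information to
recover its bit, which is the idea the disk array comparison conveys.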
Chipkill support is provided in the memory controller and implemented using
standard ECC DIMMs, so it is transparent to the operating system.
In addition, to maintain the highest levels of system availability, if a memory error
is detected during POST or memory configuration, the server can automatically
disable the failing memory bank and continue operating with reduced memory
capacity. After the problem is corrected, you can manually re-enable the memory
bank from the Setup menu in BIOS.
Memory mirroring, Chipkill, and Memory ProteXion provide multiple levels of
redundancy to the memory subsystem. Combining Chipkill with Memory
ProteXion allows the x440 to tolerate up to two memory chip failures per memory
port (8 DIMMs). An eight-way x440 with its four memory ports could therefore
sustain up to eight memory chip failures. Memory mirroring provides additional
protection by allowing operations to continue after memory module failures.
1. The first failure detected by the Chipkill algorithm on each port does not
generate a Light Path Diagnostics error, since Memory ProteXion recovers
from the problem automatically.
2. Each memory port could then sustain a second chip failure without shutting
down.
3. Provided that memory mirroring is enabled, the third chip failure on that port
would send the alert and take the DIMM offline, but keep the system running
out of the redundant memory bank.
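
The sequence above can be summarized as a small Python model of one memory
port. This is an illustrative sketch only: the function name, the outcome
strings, and the constant names are assumptions made for the example. A third
failure without memory mirroring is reported as unspecified because the text
does not describe that case.

```python
# Figures taken from the text: two tolerated chip failures per memory port,
# four memory ports on an eight-way x440.
FAILURES_TOLERATED_PER_PORT = 2
MEMORY_PORTS_EIGHT_WAY = 4


def next_failure_outcome(failures_so_far, mirroring_enabled):
    """Describe what happens on the next chip failure on a single port."""
    failure_number = failures_so_far + 1
    if failure_number == 1:
        # Memory ProteXion recovers automatically, so no alert is raised.
        return "recovered, no Light Path Diagnostics error"
    if failure_number == 2:
        return "sustained, system keeps running"
    if failure_number == 3 and mirroring_enabled:
        return "alert sent, DIMM offline, running from redundant bank"
    return "unspecified in the text"


total_tolerated = FAILURES_TOLERATED_PER_PORT * MEMORY_PORTS_EIGHT_WAY
outcomes = [next_failure_outcome(n, mirroring_enabled=True) for n in range(3)]
```

Here total_tolerated works out to the eight chip failures quoted for an
eight-way system, and outcomes lists the three steps in order for a port with
memory mirroring enabled.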