When the system enables online sparing, the first ranked DIMM pair of 1A/8A is set aside as the spare ranks,
reducing available memory. If a DIMM rank on either of the SMI buses exceeds its correctable ECC threshold, the
system copies the contents of the failing DIMM ranks to the spare DIMM ranks. Once the copy is complete, all
memory accesses to the previously failing DIMM ranks go to the spare DIMM ranks.
During normal operation, there is no performance penalty for rank sparing. When a rank exceeds its correctable
error threshold, the performance impact lasts only for the time it takes to copy the data from the failing rank to
the spare rank.
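The sparing sequence described above (count correctable errors, copy the failing rank once the threshold is exceeded, then redirect accesses) can be sketched as a toy model. This is an illustration only, not the server's firmware logic: the class names, the threshold value, and the rank labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rank:
    name: str
    data: dict = field(default_factory=dict)  # address -> stored value
    correctable_errors: int = 0


class SparingController:
    """Toy model of online rank sparing (hypothetical, not HP firmware)."""

    def __init__(self, active, spare, threshold):
        self.active = {r.name: r for r in active}
        self.spare = spare          # rank held in reserve
        self.threshold = threshold  # assumed correctable-ECC error limit
        self.remap = {}             # failing rank name -> spare rank

    def _target(self, rank_name):
        # Accesses to a spared-out rank are redirected to the spare rank.
        return self.remap.get(rank_name, self.active[rank_name])

    def write(self, rank_name, addr, value):
        self._target(rank_name).data[addr] = value

    def read(self, rank_name, addr):
        return self._target(rank_name).data[addr]

    def report_correctable_error(self, rank_name):
        """Count one corrected error; return True if sparing was triggered."""
        rank = self.active[rank_name]
        rank.correctable_errors += 1
        exceeded = rank.correctable_errors > self.threshold
        if exceeded and self.spare is not None and rank_name not in self.remap:
            # Copy the failing rank's contents, then redirect all accesses.
            self.spare.data = dict(rank.data)
            self.remap[rank_name] = self.spare
            self.spare = None       # the single spare is now in use
            return True
        return False
```

In this model the only extra work happens inside `report_correctable_error` at the moment of the copy, which mirrors the point that sparing costs nothing during normal operation.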
Lockstep memory mode
As Figure 4 and Table 1 below illustrate, each memory controller has two scalable memory interfaces
(SMI). Each SMI connects to one scalable memory buffer (SMB) and couples with two DDR3 buses. Each DDR3 bus
runs in lockstep with its DIMM pair of the other SMB. Running DIMM pairs in lockstep provides for wider error
detection and correction coverage. Through lockstep operation, the memory subsystem of the DL580 G7 server can
tolerate a x4 or a x8 DRAM device failure.
Lockstep memory mode uses two SMI links to produce a higher level of fault tolerance than the normal 64+8-bit
ECC. In lockstep mode, two channels operate as a single channel to form 16 redundant bits for each 128 data bits.
Each write and read operation moves a 144-bit data word. A 64-byte cache line splits across 2 DDR3 buses with a
burst length of 4. The split lines provide 2x 8-bit error detection and 8-bit error correction for a single x4 or x8
DRAM within a DIMM pair.
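The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that each transfer in the burst carries one 144-bit word across both DDR3 buses; the constant names are illustrative.

```python
DATA_BITS_PER_CHANNEL = 64   # normal ECC: 64 data bits per channel
CHECK_BITS_PER_CHANNEL = 8   # plus 8 check bits per channel
LOCKSTEP_CHANNELS = 2        # two channels operate as one in lockstep
BURST_LENGTH = 4
CACHE_LINE_BYTES = 64

data_bits = DATA_BITS_PER_CHANNEL * LOCKSTEP_CHANNELS    # 128 data bits
check_bits = CHECK_BITS_PER_CHANNEL * LOCKSTEP_CHANNELS  # 16 redundant bits
word_bits = data_bits + check_bits                       # 144-bit word

# Data moved by one burst versus the size of one cache line, in bits.
bits_per_burst = data_bits * BURST_LENGTH
cache_line_bits = CACHE_LINE_BYTES * 8

print(f"word: {word_bits} bits ({data_bits} data + {check_bits} check)")
print(f"burst of {BURST_LENGTH}: {bits_per_burst} data bits, "
      f"cache line: {cache_line_bits} bits")
```

Four 128-bit data transfers add up to exactly one 64-byte cache line, which is why a burst length of 4 is sufficient when two buses run in lockstep.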
Using DIMM Isolation, the DL580 G7 detects and corrects errors associated with failing DIMMs that have crossed a
correctable error threshold. It detects and corrects DIMM errors caused by single x4 or x8 DRAM device failure. The
DL580 G7 identifies the individual DIMM associated with correctable errors. It also identifies the DIMM pair
associated with a failed DRAM device (detected and corrected) or with non-correctable errors (detected and
uncorrected).
Mirrored memory
Memory Sparing cannot correct errors that evade correction by the ECC or SDDC. By providing added redundancy
in the memory subsystem, Memory Mirroring delivers the greatest protection against memory failure beyond ECC,
SDDC, and Memory Sparing.
In the mirrored mode, each lockstep DIMM pair in a memory controller has a mirrored DIMM pair on the other
memory cartridge.
After the DL580 G7 detects an uncorrectable memory error from a DIMM pair of a memory cartridge, the server
avoids a system crash by reading the mirrored DIMM pairs from the other memory cartridge. In this case, the system
management disables the failed DIMM. Later memory reads and writes will occur only on the mirrored DIMM pairs.
Note that in Memory Mirroring mode, usable memory capacity is half of the available memory, and the perceived
memory bandwidth is about half of the available memory bandwidth.
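The mirrored read path and the capacity cost can be sketched as a small model. It is an illustration under stated assumptions, not the server's actual implementation: the class names, the `failed` flag used to simulate an uncorrectable error, and the helper `usable_capacity_gb` are all hypothetical.

```python
class UncorrectableError(Exception):
    """Raised by a DIMM pair when ECC cannot correct the data."""


class DimmPair:
    def __init__(self, name):
        self.name = name
        self.cells = {}
        self.failed = False  # set to True to simulate an uncorrectable fault

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        if self.failed:
            raise UncorrectableError(self.name)
        return self.cells[addr]


class MirroredPair:
    """Every write goes to both copies; reads fall back to the mirror."""

    def __init__(self, primary, mirror):
        self.copies = [primary, mirror]  # both copies enabled at start

    def write(self, addr, value):
        for pair in self.copies:
            pair.write(addr, value)

    def read(self, addr):
        try:
            return self.copies[0].read(addr)
        except UncorrectableError:
            if len(self.copies) == 1:
                raise               # no mirrored copy left to fall back on
            self.copies.pop(0)      # disable the failed DIMM pair
            return self.copies[0].read(addr)


def usable_capacity_gb(available_gb):
    # Each location is stored twice, so only half the memory is usable.
    return available_gb / 2
```

After the first failed read, `copies` holds only the surviving pair, so later reads and writes touch the mirror alone, as the text describes.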
For population guidelines, see the DL580 G7 User Guide at
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02267159/c02267159.pdf.
Figure 4 displays the server’s memory expansion architecture.