IBM SG24-4576-00 Server User Manual


 
Figure 5. ECC Memory Operation
As the data is read from memory, the ECC circuit again performs a scan and
compares the resulting pattern to the pattern which was stored in the check bits.
If a single-bit error has occurred (the most common form of error), the scan will
always detect it, automatically correct it and record its occurrence. In this case,
system operation will not be affected.
The scan also detects all double-bit errors, which are much less common, but
it cannot correct them. In this case, the ECC unit records the error in NVRAM
and then halts the system to avoid data corruption. The data in NVRAM can
then be used to isolate the defective component.
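The correct-one, detect-two behavior described above can be illustrated with
a minimal sketch. It assumes an extended Hamming code, which is one common way
to build such a scheme; the text does not say which algorithm the IBM ECC
circuit actually uses. The function names encode and scan, the bit layout, and
the 8-bit data word are hypothetical choices for illustration only.

```python
def encode(data_bits):
    """Return an extended Hamming codeword for a list of 0/1 data bits.

    Assumed layout (illustrative only): index 0 holds an overall parity bit,
    indices that are powers of two hold check bits, the rest hold data.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(data)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the parity of all positions containing bit p even.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2          # overall parity over the whole word
    return code


def scan(code):
    """Re-check a codeword as it is read back; return (status, codeword)."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return "ok", list(code)
    if overall == 1 and syndrome <= n:
        fixed = list(code)
        fixed[syndrome] ^= 1         # the syndrome names the flipped position
        return "corrected", fixed
    # Even overall parity with a non-zero syndrome: two bits have flipped.
    return "uncorrectable", list(code)


word = encode([1, 0, 1, 1, 0, 0, 1, 0])

one_error = list(word)
one_error[6] ^= 1                    # simulate a single-bit memory error
print(scan(one_error)[0])

two_errors = list(word)
two_errors[3] ^= 1                   # simulate a double-bit memory error
two_errors[9] ^= 1
print(scan(two_errors)[0])
```

In this sketch a status of "corrected" corresponds to the case where system
operation is not affected, and "uncorrectable" corresponds to the case where
the error is logged and the system is halted.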
To implement an ECC memory system, you need an ECC memory controller and
ECC SIMMs. ECC SIMMs differ from standard memory SIMMs in that they have
additional storage space to hold the check bits.
The IBM PC Servers 500 and 720 have ECC circuitry and provide support for ECC
memory SIMMs to give protection against memory errors.
1.4.3 Error Correcting Code-Parity Memory (ECC-P)
Previous IBM servers such as the IBM Server 85 were able to use standard
memory to implement what is known as ECC-P. ECC-P takes advantage of the
fact that a 64-bit word needs 8 bits of parity (one parity bit per byte of data)
in order to detect single-bit errors. Since an ECC algorithm can also protect 64
bits of data with 8 check bits, IBM designed a memory controller which
implements the ECC algorithm using the standard memory SIMMs.
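The arithmetic behind this coincidence can be checked with a short sketch. It
assumes a Hamming-style code with one extra overall parity bit for double-bit
detection, which is a common construction but is not named in the text; the
function names below are hypothetical.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """Add one overall parity bit so double-bit errors are also detected."""
    return sec_check_bits(data_bits) + 1


def parity_bits(data_bits):
    """Byte parity: one parity bit per 8 bits of data."""
    return data_bits // 8


for width in (32, 64):
    print(width, parity_bits(width), secded_check_bits(width))
# prints:
# 32 4 7
# 64 8 8
```

Under these assumptions, a 64-bit word needs the same 8 extra bits for byte
parity as for the check bits, whereas a 32-bit word would need 7 check bits
against only 4 parity bits.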
10 NetWare Integration Guide