IBM SG24-4576-00 Server User Manual


 
1.4 Memory Error Detection and Correction
IBM PC servers implement four different memory systems:
Standard (parity) memory
Error Correcting Code-Parity (ECC-P) memory
Error Correcting Code (ECC) memory
ECC on SIMMs (EOS) memory
1.4.1 Standard (Parity) Memory
Parity memory is standard IBM memory with 32 bits of data space and 4 bits of
parity information (one check bit per byte of data). The 4 parity bits can
indicate that an error has occurred, but they do not carry enough information to
identify which bit is in error. In the event of a parity error, the system
generates a non-maskable interrupt (NMI), which halts the system. Double-bit
errors within the same byte go undetected with parity memory.
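The behavior described above can be illustrated with a minimal Python sketch.
It assumes even parity and uses hypothetical helper names (byte_parity_bits,
parity_error); the actual parity polarity and circuitry used in the hardware
are not stated here. It shows that a single flipped bit is detected, while two
flipped bits in the same byte cancel out and go unnoticed.

```python
def byte_parity_bits(word: int) -> int:
    """Return 4 parity bits (one per byte) for a 32-bit word.

    Even parity is an assumption made for this illustration.
    """
    bits = 0
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        parity = bin(byte).count("1") & 1  # 1 if the byte holds an odd number of 1s
        bits |= parity << i
    return bits


def parity_error(word: int, stored_parity: int) -> bool:
    """True if the parity recomputed on read differs from the stored parity."""
    return byte_parity_bits(word) != stored_parity


word = 0x12345678
stored = byte_parity_bits(word)            # parity saved when the word is written

single_bit_error = word ^ 0x00000001       # one bit flipped
double_bit_error = word ^ 0x00000003       # two bits flipped in the same byte

print(parity_error(word, stored))              # False: data intact
print(parity_error(single_bit_error, stored))  # True: detected, but not locatable
print(parity_error(double_bit_error, stored))  # False: error goes unnoticed
```

Note that parity only reports which byte failed, not which bit, so the data
cannot be repaired.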
Standard memory is implemented in the PC Servers 300 and 320 as well as in
the majority of the IBM desktops (for example, the IBM PC 300, IBM PC 700, and
PC Power Series 800).
1.4.2 Error Correcting Code (ECC)
The requirements for system memory in PC servers have increased dramatically
over the past few years. Reasons include the availability of 32-bit
operating systems and the caching of hard disk data on file servers.
As system memory increases, the possibility of memory errors increases.
Thus, protection against system memory failures becomes increasingly
important. Traditionally, systems that implement only parity memory halt on
single-bit errors and may fail to detect double-bit errors entirely. Clearly, as
memory increases, better techniques are required.
To combat this problem, the IBM PC servers employ schemes to detect and
correct memory errors. These schemes are called Error Correcting Code
(sometimes Error Checking and Correcting, but more commonly just ECC). ECC
can detect and correct single-bit errors, detect double-bit errors, and detect
some triple-bit errors.
ECC works like parity by generating extra check bits with the data as it is stored
in memory. However, while parity uses only 1 check bit per byte of data, ECC
uses 7 check bits for a 32-bit word and 8 check bits for a 64-bit word. These
extra check bits, along with a special hardware algorithm, allow single-bit
errors to be detected and corrected in real time as the data is read from memory.
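The check-bit counts quoted above match what a single-error-correcting,
double-error-detecting (SECDED) Hamming code requires. The sketch below assumes
that construction for illustration only; the text does not name the exact code
used by the hardware, and secded_check_bits is a hypothetical helper.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a SECDED Hamming code for a given data width."""
    r = 0
    # Single-error correction needs the smallest r with 2**r >= data_bits + r + 1,
    # so that every bit position (and "no error") gets a distinct syndrome.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One additional overall parity bit adds double-error detection.
    return r + 1


for width in (32, 64):
    parity_bits = width // 8  # plain parity: one check bit per byte
    print(width, "data bits:", secded_check_bits(width), "ECC check bits,",
          parity_bits, "parity bits")
# 32 data bits: 7 ECC check bits, 4 parity bits
# 64 data bits: 8 ECC check bits, 8 parity bits
```

Under this assumption, a 64-bit word needs no more check bits for ECC than it
would for byte parity.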
Figure 5 on page 10 shows how the ECC circuits operate. The data is scanned
as it is written to memory. This scan generates a unique 7-bit pattern which
represents the data stored. This pattern is then stored in the 7-bit check space.
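As a rough illustration of how a 7-bit check pattern can be generated on a write
and used on a read, the following Python sketch implements an extended Hamming
code over a 32-bit word. This is an assumption chosen for clarity, not a
description of the actual IBM circuit; the bit layout and the names encode and
decode are hypothetical.

```python
# Assumed layout of the 39-bit stored word (32 data bits + 7 check bits):
#   bit 0                   overall parity bit
#   bits 1, 2, 4, 8, 16, 32 Hamming check bits
#   remaining bits 3..38    the 32 data bits
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32]
DATA_POSITIONS = [p for p in range(1, 39) if p not in CHECK_POSITIONS]


def encode(data: int) -> int:
    """Scan a 32-bit word and return the 39-bit word to store."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << pos
    for c in CHECK_POSITIONS:
        # Check bit c gives even parity over all positions whose index has bit c set.
        parity = 0
        for pos in range(1, 39):
            if pos & c and (code >> pos) & 1:
                parity ^= 1
        code |= parity << c
    code |= bin(code).count("1") & 1  # overall parity makes the total count even
    return code


def decode(code: int):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 39):
        if (code >> pos) & 1:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall_odd = bin(code).count("1") & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= 38:
        code ^= 1 << syndrome  # the syndrome is the position of the flipped bit
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome means a double-bit error;
        # a syndrome outside the word reveals some triple-bit errors.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            data |= 1 << i
    return data, status


word = 0xDEADBEEF
stored = encode(word)
print(decode(stored))                          # data returned unchanged
print(decode(stored ^ (1 << 11)))              # single-bit error is corrected
print(decode(stored ^ (1 << 5) ^ (1 << 20)))   # double-bit error is only detected
```

Unlike parity, the recomputed pattern identifies the failing bit position, so a
single-bit error can be repaired instead of halting the system.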