Intel 460GX Computer Hardware User Manual


Intel® 460GX Chipset Software Developers Manual
Data Integrity and Error Handling
6.2 Memory ECC Routing
The ECC used in DRAM is the same code used by the Itanium processor: it requires 8 check bits to
cover 64 bits of data. On the system bus, this code corrects all single-bit errors and detects all
double-bit errors.
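As a worked check, the standard Hamming bound gives the minimum number of check bits for single-error correction plus double-error detection on 64 data bits. This is general coding-theory arithmetic, not a description of the Itanium code's internals. With $m$ data bits and $r$ check bits, single-error correction requires:

```latex
2^{r} \ge m + r + 1, \qquad m = 64:
\quad r = 6 \Rightarrow 2^{6} = 64 < 71, \qquad r = 7 \Rightarrow 2^{7} = 128 \ge 72
```

Double-bit detection costs one more check bit, so $r = 7 + 1 = 8$ and each ECC word is $64 + 8 = 72$ bits wide.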
The system designer has the option of wiring the board so that:
- with x4 DRAMs, all errors within a single chip, including multi-bit errors, are corrected; and
- with x8 DRAMs, all errors within a single chip are detected.
This is done by wiring the board so that each x4 DRAM contributes exactly one bit to each of the
4 ECC words of a half-line. Because a half-line is 256 bits and each ECC word covers 64 bits, there
are 4 ECC words per half-line. A failure of one x4 chip therefore corrupts at most one bit per ECC
word, which the code corrects. For x8 chips, the bits are sliced across the 4 words so that at most
2 bits from any one chip fall in one ECC word, and the resulting double-bit error is detected. The
ECC used on the processor also detects all 4-bit nibble errors.
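The routing property can be checked with a short Python sketch. The concrete assignment of chip bits to ECC-word positions in the hypothetical `route` function is only an example: the manual states the property, not a particular board wiring. The sketch confirms that a whole-chip failure touches each 72-bit ECC word in at most 1 bit for x4 parts and at most 2 bits for x8 parts.

```python
from __future__ import annotations

# Hypothetical wiring that satisfies the rule in the text; the manual does not
# prescribe this exact bit assignment.
WORDS_PER_HALF_LINE = 256 // 64   # 4 ECC words per 256-bit half-line
BITS_PER_WORD = 64 + 8            # 64 data bits + 8 check bits


def route(chip_width: int) -> dict[tuple[int, int], tuple[int, int]]:
    """Map (chip, chip_bit) -> (ecc_word, bit_in_word) for x4 or x8 DRAMs."""
    chips = WORDS_PER_HALF_LINE * BITS_PER_WORD // chip_width  # 72 x4 or 36 x8
    bits_per_word_per_chip = chip_width // WORDS_PER_HALF_LINE  # 1 (x4) or 2 (x8)
    mapping = {}
    for chip in range(chips):
        for chip_bit in range(chip_width):
            word = chip_bit % WORDS_PER_HALF_LINE  # spread chip bits over all 4 words
            slot = chip * bits_per_word_per_chip + chip_bit // WORDS_PER_HALF_LINE
            mapping[(chip, chip_bit)] = (word, slot)
    return mapping


def worst_bits_per_word(chip_width: int) -> int:
    """Most bits a single failed chip can corrupt within one ECC word."""
    counts: dict[tuple[int, int], int] = {}
    for (chip, _), (word, _) in route(chip_width).items():
        counts[(chip, word)] = counts.get((chip, word), 0) + 1
    return max(counts.values())


print(worst_bits_per_word(4))  # x4: one bit per word, correctable
print(worst_bits_per_word(8))  # x8: two bits per word, detectable
```

With this mapping every one of the 288 data/check positions in a half-line is used exactly once, and the worst case per ECC word is 1 bit for x4 and 2 bits for x8.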
6.3 Data Poisoning
When uncorrectable data is received, it is passed on to the next interface as poisoned. The data
may have come from memory or from the system bus with an uncorrectable ECC error. All data
passes through the data buffer in the SDC; as uncorrectable data is placed in the buffer, it is
marked as having been received bad. When that data is read out of the buffer and sent on, the
parity or ECC generated for it is deliberately forced bad. Data is checked on chunk boundaries,
where a chunk is 64 bits of data.
Data sent to the system bus or to DRAM has 2 ECC bits corrupted for each failed chunk. These are
bits 0 and 1 of the ECC bits, or bits 63 and 71 if looking at the entire 72 bits of data/ECC. Because
the code detects all double-bit errors, the receiver flags the chunk as uncorrectable rather than
correcting it. For data passed to the private data bus, all the calculated parity bits associated with
the failing chunk are inverted, so the private data bus receives bad parity.
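A minimal Python sketch of the poisoning path, assuming each buffer entry holds one 64-bit chunk plus a poisoned flag. `toy_ecc` and `toy_parity` are hypothetical placeholders, not the real Itanium ECC or the real private-bus parity layout (one parity bit per byte is assumed here). Only the forcing step, flipping ECC bits 0 and 1 or inverting every parity bit of the chunk, follows the text.

```python
from __future__ import annotations

from dataclasses import dataclass

CHUNK_BITS = 64             # data is checked and poisoned per 64-bit chunk
POISON_ECC_MASK = 0b11      # ECC check bits 0 and 1
POISON_PARITY_MASK = 0xFF   # all 8 parity bits of a chunk (assumes 1 bit per byte)


@dataclass
class BufferedChunk:
    data: int        # 64-bit chunk held in the SDC data buffer
    poisoned: bool   # set if the chunk arrived with an uncorrectable error


def toy_ecc(data: int) -> int:
    """Hypothetical 8-bit check value (XOR of bytes); NOT the real ECC."""
    ecc = 0
    for i in range(CHUNK_BITS // 8):
        ecc ^= (data >> (8 * i)) & 0xFF
    return ecc


def toy_parity(data: int) -> int:
    """Hypothetical even parity, one bit per byte, packed into 8 bits."""
    parity = 0
    for i in range(CHUNK_BITS // 8):
        byte = (data >> (8 * i)) & 0xFF
        parity |= (bin(byte).count("1") & 1) << i
    return parity


def send_to_bus_or_dram(chunk: BufferedChunk) -> tuple[int, int]:
    """Return (data, ecc); a poisoned chunk gets two check bits flipped."""
    ecc = toy_ecc(chunk.data)
    if chunk.poisoned:
        ecc ^= POISON_ECC_MASK
    return chunk.data, ecc


def send_to_private_bus(chunk: BufferedChunk) -> tuple[int, int]:
    """Return (data, parity); a poisoned chunk gets every parity bit inverted."""
    parity = toy_parity(chunk.data)
    if chunk.poisoned:
        parity ^= POISON_PARITY_MASK
    return chunk.data, parity
```

The data itself is forwarded unchanged; only the check bits carry the poison marking.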
6.4 Usage of First-error and Next-error
The first instance of an error is latched in the first-error status register (FERR); the first error
does NOT set the corresponding bit in the next-error register (NERR). When an error is detected, it
is latched into FERR if no other FERR bit is set. If any FERR bit is already set, the corresponding
bit in NERR is set instead.
Because the system needs to know whether one error or many have occurred, setting FERR does not
set NERR. Any further error of any type, including a second occurrence of the first error, sets
NERR. Software can read both registers: if FERR is set but NERR is not, only one error has occurred
in the system; if both are set, multiple errors have occurred.
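The FERR/NERR rule can be modelled as below. This is a sketch with hypothetical error-type bit values; real FERR and NERR registers define their own bit assignments.

```python
# Hypothetical error-type bits; each real FERR/NERR pair defines its own layout.
MEM_UNCORR = 1 << 0
BUS_PARITY = 1 << 1
PROTOCOL = 1 << 2


class ErrorLog:
    """Model of one first-error (FERR) / next-error (NERR) register pair."""

    def __init__(self) -> None:
        self.ferr = 0
        self.nerr = 0

    def log(self, error: int) -> None:
        if self.ferr == 0:
            self.ferr |= error  # first error: latched in FERR only
        else:
            self.nerr |= error  # any later error, even a repeat of the first

    def classify(self) -> str:
        if self.ferr == 0:
            return "no errors"
        return "multiple errors" if self.nerr else "single error"
```

For example, one `log(BUS_PARITY)` call leaves `classify()` returning `"single error"`; a second `log(BUS_PARITY)` sets NERR, and `classify()` then returns `"multiple errors"`.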
For the first error, as much information as possible is captured. The data, address and command
information is captured if available. This allows isolation of errors and possible recovery.
One exception is when 2 errors occur in the same cycle: then 2 bits may be set in FERR, although
this should be rare. The other exception is FERR_SAC. A single-bit correctable ECC error from
DRAM sets bit SCME, and this bit does not block other bits in FERR_SAC from being set. Software
can therefore poll periodically for single-bit errors without preventing other errors from being
logged. Apart from these two cases, no FERR should ever have more than one bit set.
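The two exceptions can be added to the same kind of model, again as a hedged sketch. Only the SCME name comes from the text. The other bit values, the rule that SCME is simply ORed into FERR_SAC, and the hypothetical `poll_and_clear_scme` helper are assumptions made for illustration, since this section does not say how a repeated SCME is reported in NERR or how software clears the bit.

```python
SCME = 1 << 0          # from the text: single-bit correctable DRAM ECC error
MEM_UNCORR = 1 << 1    # hypothetical bit
BUS_PARITY = 1 << 2    # hypothetical bit
BLOCKING_BITS = MEM_UNCORR | BUS_PARITY  # every modelled FERR_SAC bit except SCME


class SacErrorLog:
    """Model of FERR_SAC/NERR_SAC with the SCME and same-cycle exceptions."""

    def __init__(self) -> None:
        self.ferr = 0
        self.nerr = 0

    def log_cycle(self, errors: int) -> None:
        """Record all errors detected in a single cycle."""
        if errors & SCME:
            self.ferr |= SCME  # assumption: SCME always lands in FERR_SAC
        others = errors & BLOCKING_BITS
        if not others:
            return
        if self.ferr & BLOCKING_BITS:
            self.nerr |= others  # a first error is already latched
        else:
            self.ferr |= others  # same-cycle errors may set two FERR bits

    def poll_and_clear_scme(self) -> bool:
        """Hypothetical software poll: report and clear a pending SCME."""
        pending = bool(self.ferr & SCME)
        self.ferr &= ~SCME
        return pending
```

Because SCME is excluded from `BLOCKING_BITS`, a pending correctable error never stops the next uncorrectable error from being latched as the first error.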