Intel® 460GX Chipset Software Developer’s Manual 6-1
6 Data Integrity and Error Handling
6.1 Integrity
This chapter describes the errors the chipset can detect and how it handles them. Error handling requires catching the error,
containing it, notifying the system, and then recovering or restarting the system. Different platforms have
different requirements for error handling. A server is most interested in containment. It wants bad
data to be stopped before it reaches the network or the disk. On the other hand, workstations with
graphics may be less interested in containment. If the screen blips for one frame and the blip is
gone in the next frame, the error is transient, and may not even be noticed.
The 460GX chipset will attempt to accommodate both philosophies. It will allow certain errors to
be masked off, or will turn them into simple interrupts instead of fatal errors. Fatal errors are those
that require a reboot, for example those that assert BINIT#. Some errors will always be fatal, such as protocol errors or
when the chipset has lost synchronization of queues or events. The user (OEM, O.S.) can decide
the behavior for data errors. These may be considered as fatal, for maximum containment, or they
may simply be reported as an interrupt while the system continues as best it can. If the data is
moving to graphics, then an error may go unnoticed. It is also possible that bad data written to memory
is never used, and therefore never shows up as an error to any user.
Errors are not individually maskable. In general there are only two modes: aggressive and
non-aggressive. In aggressive mode, every error (parity, protocol, or queue management) is
considered fatal and leads to BINIT#. In non-aggressive mode, many errors are reported as
interrupts and do not cause BINIT#. Even in non-aggressive mode, when the chipset hits an error
that leaves it unable to handle a transaction, or the chips appear to be out of sync with one another, it
asserts BINIT#.
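The two modes can be pictured as a small decision function. The following is only a sketch of the policy described above, not chipset code: the type names, the enumerators, and the function error_action are hypothetical, and the three error classes are a simplification of the categories named in the text.

```c
/* Hypothetical names throughout; none of these identifiers come from the manual. */
typedef enum {
    ERR_DATA_PARITY,   /* data error (parity or ECC) */
    ERR_PROTOCOL,      /* bus protocol violation */
    ERR_QUEUE_SYNC     /* chips lost synchronization of queues or events */
} error_class_t;

typedef enum { MODE_AGGRESSIVE, MODE_NON_AGGRESSIVE } error_mode_t;

typedef enum { ACTION_INTERRUPT, ACTION_BINIT } error_action_t;

/* Decide how an error is reported under the selected mode. */
error_action_t error_action(error_mode_t mode, error_class_t err)
{
    /* Aggressive mode: every error is fatal. */
    if (mode == MODE_AGGRESSIVE)
        return ACTION_BINIT;

    /* Non-aggressive mode: data errors become interrupts, while
     * protocol and synchronization errors remain fatal. */
    switch (err) {
    case ERR_DATA_PARITY:
        return ACTION_INTERRUPT;
    case ERR_PROTOCOL:
    case ERR_QUEUE_SYNC:
    default:
        return ACTION_BINIT;
    }
}
```

In this model only the data-error class changes behavior between the two modes, which mirrors the statement that errors are not individually maskable.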
The chipset will report errors where the data is used, rather than where the error is generated. Both the processor and the
chipset may ‘poison’ data. If the processor has an internal cache error, it may write out the data
with bad ECC. If the chipset has bad parity on I/O data, it will corrupt the data as it is passed along.
In both cases the data will be put in memory with bad ECC. If it isn’t used, then no error is
reported. If it is used, then the error is found at that point.
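The report-at-use behavior can be illustrated with a toy software model. This is a sketch under stated assumptions: the structure mem_line_t and the functions mem_write and mem_read are hypothetical, and a boolean flag stands in for the deliberately bad ECC that the hardware stores alongside the data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one memory line; 'poisoned' stands in for bad ECC. */
typedef struct {
    uint64_t data;
    bool     poisoned;
} mem_line_t;

/* Producer side: the data is always stored. If the source saw an error,
 * the line is marked poisoned, but nothing is reported yet. */
void mem_write(mem_line_t *line, uint64_t data, bool source_error)
{
    line->data = data;
    line->poisoned = source_error;
}

/* Consumer side: the error surfaces only here. Returns true and copies
 * the data out if the line is good; returns false if it is poisoned. */
bool mem_read(const mem_line_t *line, uint64_t *out)
{
    if (line->poisoned)
        return false;
    *out = line->data;
    return true;
}
```

A poisoned line that is overwritten with good data before anyone reads it never produces an error in this model, which corresponds to the case where bad data is never used.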
The 460GX chipset will isolate the error reporting as close to the error itself as possible. In some
cases the error can be isolated to a failing DRAM or PCI card; in others, only to a PCI bus or Expander port.
6.1.1 System Bus
• The 460GX chipset provides ECC generation on data delivered to the system bus, and ECC
checking of data accepted from the system bus. Single-bit errors are corrected; on a multi-bit error,
the data is written to the DRAMs with bad ECC (poisoned data) or to I/O with bad parity.
• Parity bits are generated and checked independently for the system bus address lines, the
system bus request group, and the system bus response group. Errors typically result in the
assertion of BINIT#.
• A variety of system bus protocol errors are also detected, and will result in assertion of
BINIT#.
• The first instance of a bus error is logged with the address and error type. Additional status
flags indicate that subsequent errors occurred.
• For I/O accesses, good ECC is always generated for data with no parity errors. For data with
bad parity, the data is poisoned with bad ECC as it’s returned to the processor.
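The first-error logging described in the list above follows a common pattern: capture details once, then only set a sticky flag. The sketch below shows that pattern in software. It is an assumption-laden illustration, not the chipset's register layout: bus_error_log_t, its fields, and bus_error_record are hypothetical, and the several status flags mentioned in the text are collapsed into one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log structure; the real register layout is not shown here. */
typedef struct {
    bool     valid;       /* a first error has been captured */
    bool     subsequent;  /* at least one later error occurred */
    uint64_t address;     /* address of the first error */
    unsigned type;        /* error type of the first error */
} bus_error_log_t;

/* Record an error: the first one captures address and type; later ones
 * only set the sticky 'subsequent' flag and leave the capture intact. */
void bus_error_record(bus_error_log_t *log, uint64_t address, unsigned type)
{
    if (!log->valid) {
        log->valid = true;
        log->address = address;
        log->type = type;
    } else {
        log->subsequent = true;
    }
}
```

Keeping the first capture intact preserves the details of the error most likely to be the root cause, while the sticky flag still tells software that later errors followed.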