Intel SE7520JR2 Computer Hardware User Manual


 
Functional Architecture Intel® Server Board SE7520JR2
Revision 1.0
C78844-002
Uncorrectable memory errors are critical errors that may cause the system to fail. The BIOS
normally detects these errors and logs them as IPMI SEL events under every management model,
except in the case described below.
A critical hardware error (an uncorrectable memory or bus error) can prevent the BIOS from
running, reporting the error, and restarting the system. In the Professional and Advanced
management models, the Sahalee BMC monitors the SMI signal; an SMI that stays asserted for an
extended period indicates that the BIOS cannot run. In this case, the Sahalee BMC logs an SMI
Timeout event and probes for errors. If it finds a memory error, it logs data against the IPMI
sensor type 0Ch (Memory) sensor; if it finds a bus error, it logs against the IPMI sensor type 13h
(Critical Interrupt) sensor. Both events can include additional data in event data bytes 2 and 3,
depending on the exact nature of the error and what the chipset reports to the Sahalee BMC.
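The SMI timeout behavior described above can be pictured as a watchdog that samples the SMI level, fires once when the signal has stayed asserted past a threshold, and then records whatever the error probe finds. This is a minimal sketch under stated assumptions: the threshold `SMI_TIMEOUT_MS`, the `probe` callback, and the `LoggedEvent` record are hypothetical, since the manual gives neither the timeout value nor the BMC's internal interfaces. Only the sensor type codes 0Ch (Memory) and 13h (Critical Interrupt) come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# IPMI sensor type codes named in the manual.
SENSOR_TYPE_BY_KIND = {"memory": 0x0C, "bus": 0x13}

# Hypothetical threshold: the manual only says "an extended period".
SMI_TIMEOUT_MS = 5000


@dataclass
class LoggedEvent:
    """Hypothetical stand-in for an IPMI SEL entry."""
    description: str
    sensor_type: Optional[int] = None
    data2: Optional[int] = None  # event data byte 2 (may be unused)
    data3: Optional[int] = None  # event data byte 3 (may be unused)


# Hypothetical probe result: None if no error is found, otherwise
# (kind, data2, data3) where kind is "memory" or "bus".
ProbeResult = Optional[Tuple[str, Optional[int], Optional[int]]]


@dataclass
class SmiWatchdog:
    probe: Callable[[], ProbeResult]
    timeout_ms: int = SMI_TIMEOUT_MS
    log: List[LoggedEvent] = field(default_factory=list)
    _asserted_since: Optional[int] = field(default=None, init=False)
    _fired: bool = field(default=False, init=False)

    def sample(self, smi_asserted: bool, now_ms: int) -> None:
        """Feed one sample of the SMI signal level taken at time now_ms."""
        if not smi_asserted:
            # SMI released: the BIOS handler ran, so re-arm the watchdog.
            self._asserted_since = None
            self._fired = False
            return
        if self._asserted_since is None:
            self._asserted_since = now_ms
        stuck_for = now_ms - self._asserted_since
        if not self._fired and stuck_for >= self.timeout_ms:
            self._fired = True  # log only once per stuck assertion
            self.log.append(LoggedEvent("SMI Timeout"))
            self._record_probe(self.probe())

    def _record_probe(self, result: ProbeResult) -> None:
        if result is None:
            return  # no error found; only the SMI Timeout event is logged
        kind, data2, data3 = result
        sensor = SENSOR_TYPE_BY_KIND[kind]  # 0Ch for memory, 13h for bus
        self.log.append(LoggedEvent(f"{kind} error", sensor, data2, data3))
```

For example, `SmiWatchdog(probe=lambda: ("memory", 0x12, 0x34), timeout_ms=100)` fed `sample(True, 0)` and then `sample(True, 100)` logs an SMI Timeout event followed by a memory event against sensor type 0Ch. Logging only once per stuck assertion keeps a hung SMI from flooding the log.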
3.3.6 Memory RASUM Features
The Intel E7520 MCH supports several memory RASUM (Reliability, Availability, Serviceability,
Usability, and Manageability) features. These features include the Intel® x4 Single Device Data
Correction (x4 SDDC) for memory error detection and correction, Memory Scrubbing, Retry on
Correctable Errors, Integrated Memory Initialization, DIMM Sparing, and Memory Mirroring. The
following sections describe how each is supported.
Note: The memory RASUM features described below operate regardless of the platform management
model. However, without an Intel® Management Module installed, the system has limited memory
monitoring and logging capabilities. With standard Onboard Platform Instrumentation, a RASUM
feature can be initiated without any notification that the action has occurred.
3.3.6.1 DRAM ECC – Intel® x4 Single Device Data Correction (x4 SDDC)
The DRAM interface uses two different ECC algorithms. The first is a standard SEC/DED ECC
across a 64-bit data quantity. The second is a distributed, 144-bit S4EC-D4ED mechanism, which
provides x4 SDDC protection for DIMMs that use x4 devices. Bits from x4 parts are interleaved so
that each bit from a particular part is represented in a different ECC word. DIMMs that use x8
devices can use the same algorithm but do not get x4 SDDC protection, because this method can
correct at most four bits. The algorithm does, however, give x8 parts better protection than a
standard SEC/DED implementation. With two memory channels, either ECC method can be used with
equal performance; single-channel mode supports only standard SEC/DED, except in the mirroring
fail-down case described next.
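The first method, SEC/DED across 64 data bits, can be illustrated with a textbook extended Hamming (72,64) code: seven positional Hamming check bits plus one overall parity bit correct any single-bit error and detect any double-bit error. This is a sketch assuming the classic positional layout; the E7520 MCH's actual check-bit equations are not given in this manual and may differ, and the 144-bit S4EC-D4ED code is not modeled here.

```python
from typing import List, Optional, Tuple

DATA_BITS = 64
CODE_BITS = 72  # 64 data bits + 7 Hamming check bits + 1 overall parity bit


def _is_power_of_two(n: int) -> bool:
    return n & (n - 1) == 0


# Positions 1..71: powers of two hold Hamming check bits, all others hold data.
# Position 0 holds the overall parity bit that upgrades SEC to SEC/DED.
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if not _is_power_of_two(p)]
assert len(DATA_POSITIONS) == DATA_BITS


def _syndrome(bits: List[int]) -> int:
    """XOR of the positions of all set bits in 1..71 (0 for a valid codeword)."""
    s = 0
    for pos in range(1, CODE_BITS):
        if bits[pos]:
            s ^= pos
    return s


def encode(data: int) -> List[int]:
    """Encode a 64-bit integer into a list of 72 bits."""
    bits = [0] * CODE_BITS
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    s = _syndrome(bits)
    for k in range(7):  # set check bits so the syndrome becomes zero
        bits[1 << k] = (s >> k) & 1
    bits[0] = sum(bits) % 2  # make the total number of 1 bits even
    return bits


def decode(bits: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected" | "uncorrectable", data or None)."""
    bits = list(bits)
    s = _syndrome(bits)
    parity_ok = sum(bits) % 2 == 0
    if not parity_ok:
        # Odd number of flipped bits: treat it as a single-bit error at
        # position s (s == 0 means the overall parity bit itself flipped).
        if s >= CODE_BITS:
            return "uncorrectable", None
        bits[s] ^= 1
        status = "corrected"
    elif s != 0:
        # Even parity but nonzero syndrome: double-bit error, detect only.
        return "uncorrectable", None
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return status, data
```

Encoding `0x0123456789ABCDEF`, flipping any one of the 72 bits, and decoding returns `("corrected", 0x0123456789ABCDEF)`; flipping two bits returns `("uncorrectable", None)`, which is the kind of error the scrub engine and BIOS would report as uncorrectable.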
When memory mirroring is enabled, x4 SDDC ECC is supported in single-channel mode after the
second channel has been disabled during a fail-down phase. Outside of mirroring fail-down, x4
SDDC ECC is not supported in single-channel operation, because it significantly degrades
performance in that configuration.
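The availability rules in this section can be restated as two small predicates. This is a hypothetical restatement for illustration only: the function names and parameters are not BIOS, BMC, or MCH interfaces, and the sketch models only one or two active channels and x4 or x8 DRAM devices.

```python
def s4ec_d4ed_available(active_channels: int, mirroring_fail_down: bool) -> bool:
    """Whether the 144-bit S4EC-D4ED (x4 SDDC) ECC method may be used."""
    if active_channels == 2:
        return True  # dual channel: either ECC method, equal performance
    if active_channels == 1:
        # Single channel: only after mirroring fail-down disabled the other channel.
        return mirroring_fail_down
    raise ValueError("this sketch models one or two active memory channels")


def x4_sddc_protected(active_channels: int, mirroring_fail_down: bool,
                      device_width: int) -> bool:
    """Full single-device correction applies only to DIMMs built from x4 devices."""
    if device_width not in (4, 8):
        raise ValueError("this sketch models x4 and x8 DRAM devices only")
    # x8 DIMMs may still run the S4EC-D4ED algorithm, but without x4 SDDC protection.
    return s4ec_d4ed_available(active_channels, mirroring_fail_down) and device_width == 4
```

Keeping the two questions separate mirrors the text: whether the 144-bit method runs at all depends on the channel configuration, while whether it delivers x4 SDDC protection also depends on the DRAM device width.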
3.3.6.2 Integrated Memory Scrub Engine
The Intel E7520 MCH includes an integrated engine that proactively walks the populated memory
space, seeking out soft errors in the memory subsystem. When it finds a single-bit correctable
error, this hardware detects, logs, and corrects the data, unless it detects an incoming write to
the same memory address. For any uncorrectable errors detected, the scrub