Support User Manuals

Intel SE7320SP2 Computer Hardware User Manual

Open as PDF

of 175

Intel® Server Board SE7320SP2 & Intel Server Board SE7525GP2 TPS Functional Architecture

Revision 2.0

44

for the initial error detection. This has the added benefit of making uncorrectable errors in

DRAM fully correctable unless the same location in both primary and mirror happens to be

corrupt (statistically very unlikely). This RASUM feature may be enabled and disabled via

configuration.

3.5.5.4 Integrated Memory Initialization Engine

The Intel E7320 and Intel E7525 MCHs provide hardware managed ECC auto-initialization of all

populated DRAM space under software control. Once internal configuration has been updated

to reflect the types and sizes of populated DIMM devices, the MCH will traverse the populated

address space initializing all locations with good ECC. This not only speeds up the mandatory

memory initialization step, but also frees the processor to pursue other machine initialization

and configuration tasks.

Additional features have been added to the initialization engine to support high speed

population and verification of a programmable memory range with one of four known data

patterns (0/F, A/5, 3/C, and 6/9). This function facilitates a limited, very high speed memory test,

as well as provides a BIOS accessible memory zeroing capability for use by the operating

system.

3.5.5.5 DIMM Sparing Function

To provide a more fault tolerant system, the Intel E7320 MCH and Intel E7525 MCH include

specialized hardware to support fail-over to a spare DIMM device in the event that a primary

DIMM in use exceeds a specified threshold of runtime errors. One of the DIMMs installed per

channel will not be used, but kept in reserve. In the event of significant failures in a particular

DIMM, it and its corresponding partner in the other channel (if applicable), will, over time, have

its data copied over to the spare DIMM(s) held in reserve. When all the data has been copied,

the reserve DIMM(s) will be put into service and the failing DIMM will be removed from service.

Only one sparing cycle is supported. If this feature is not enabled, then all DIMMs will be visible

in normal address space.

Note: DIMM Sparing feature requires that the spare DIMM be at least the size of the largest

primary DIMM in use.

Hardware additions for this feature include the implementation of tracking register per DIMM to

maintain a history of error occurrence, and a programmable register to hold the fail-over error

threshold level. The operational model is straightforward: set the fail-over threshold register to a

non-zero value to enable the feature, and if the count of errors on any DIMM exceeds that value,

fail-over will commence. The tracking registers themselves are implemented as “leaky buckets,”

such that they do not contain an absolute cumulative count of all errors since power-on; rather,

they contain an aggregate count of the number of errors received over a running time period.

The “drip rate” of the bucket is selectable by software, so it is possible to set the threshold to a

value that will never be reached by a “healthy” memory subsystem experiencing the rate of

errors expected for the size and type of memory devices in use.

The fail-over mechanism is slightly more complex. Once fail-over has been initiated the MCH

must execute every write twice; once to the primary DIMM, and once to the spare. (This

requires that the spare DIMM be at least the size of the largest primary DIMM in use.) The MCH

will also begin tracking the progress of its built-in memory scrub engine. Once the scrub engine

has covered every location in the primary DIMM, the duplicate write function will have copied

every data location to the spare. At that point, the MCH can switch the spare into primary use,

and take the failing DIMM off-line.

previous next