HP (Hewlett-Packard) A9834-9001B Server User Manual

Chapter 1

Overview

CPUs and Memories

Memory Error Protection

All of the CC cache lines are protected in memory by an error correction code (ECC). The sx2000 memory ECC

scheme is significantly different from the sx1000 memory ECC scheme. An ECC code word is contained in

each pair of 144-bit chunks. The memory data path (MDP) block is responsible for checking for and, if

necessary, correcting any correctable errors.

DRAM Erasure

A common cause of a correctable memory error is a DRAM failure, and the ability to correct this type of

memory failure in hardware is sometimes known as chip kill. Address or control bit failure is another common

cause. Chip kill ECC schemes have added hardware logic that allows them to detect and correct more than a

single data bit error when the hardware is programmed to do so. A common implementation of traditional

chip kill is to scatter data bits from each DRAM component across multiple ECC codewords, such that only

one bit from each DRAM is used per ECC codeword.
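
The scattering idea can be illustrated with a small Python sketch. The geometry here (four DRAMs contributing four bits each) and the function names are assumptions chosen for readability, not the sx2000 layout. The sketch only shows that when codeword i takes bit i from every DRAM, the complete failure of one DRAM costs each codeword a single bit, which a single-bit-correcting code can repair.

```python
# Hypothetical geometry for illustration only; not the sx2000 layout.
NUM_DRAMS = 4
BITS_PER_DRAM = 4


def scatter(dram_outputs):
    """Build codewords so that codeword i takes bit i from every DRAM."""
    return [[chip[i] for chip in dram_outputs] for i in range(BITS_PER_DRAM)]


def count_bit_errors(good_words, bad_words):
    """Count differing bits in each codeword pair."""
    return [
        sum(a != b for a, b in zip(good, bad))
        for good, bad in zip(good_words, bad_words)
    ]


good_chips = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]

# Model a complete failure of DRAM 2: every bit it returns is wrong.
failed_chips = [list(chip) for chip in good_chips]
failed_chips[2] = [1 - bit for bit in good_chips[2]]

errors_per_codeword = count_bit_errors(scatter(good_chips), scatter(failed_chips))
print(errors_per_codeword)  # one bit error in each of the four codewords
```

Without scattering, all four bad bits would land in the same codeword and exceed what a single-bit-correcting code can handle.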

Double chip kill is an extension to memory chip kill that enables the system to correct multiple ECC errors in

an ECC code word. HP Labs developed the ECC algorithm, and the first implementation of this technology is in

platforms using the sx2000 chipset. Double chip kill is also known as DRAM erasure.

DRAM erasure is invoked when the number of correctable memory errors exceeds a threshold and can be

invoked on a memory subsystem, bus, rank or bank. PDC tracks the errors that are seen on a memory

subsystem, bus, rank and bank in addition to the error information it tracks in the PDT.

PDC Functional Changes

There are three primary threads of control in the processor-dependent code (PDC): the bootstrap, the error

code, and the PDC procedures. The bootstrap is the primary thread of control until the OS is launched. The

boot console handler (BCH) acts as a user interface for the bootstrap, but can also be used to diagnose

problems with the system by HP support.

The PDC procedures are the primary thread of control once the OS has launched. Once the OS has launched,

the PDC code is only active when the OS calls a PDC procedure or there is an error that causes the error code

to be called.

If a correctable memory error occurs during run time, the new chipset logs the error and corrects it in memory

(reactive scrubbing). Diagnostics periodically read memory module states to read the error logs. When this

PDC call is made, system firmware updates the PDT, and deletes entries older than 24 hours in the structure

that counts how many errors have occurred for each memory subsystem, bus, rank or bank. When the counts

exceed the thresholds, PDC will invoke DRAM erasure on the appropriate memory subsystem, bus, rank or

bank. Invoking DRAM erasure does not interrupt the operation of the OS.
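
The bookkeeping described above can be sketched in Python as a counter with a 24-hour window. This is a minimal model under stated assumptions, not the PDC implementation: the class name, the `(scope, id)` keys, and the threshold value are all hypothetical, and the manual does not publish actual threshold numbers. Only the 24-hour expiry and the "count exceeds threshold" trigger come from the text.

```python
from collections import defaultdict

WINDOW_SECONDS = 24 * 60 * 60  # entries older than 24 hours are dropped (from the text)
THRESHOLD = 3                  # hypothetical value; the manual gives no numbers


class ErrorTracker:
    """Toy model of per-scope correctable-error counting."""

    def __init__(self):
        self.timestamps = defaultdict(list)  # (scope, id) -> error times in seconds
        self.erased = set()                  # scopes where erasure is already active

    def record(self, key, when):
        """Log one correctable error against a scope such as ("rank", 0)."""
        self.timestamps[key].append(when)

    def poll(self, now):
        """Model the periodic diagnostic read: expire old entries, check thresholds."""
        newly_erased = []
        for key, times in self.timestamps.items():
            # Keep only errors seen within the last 24 hours.
            times[:] = [t for t in times if now - t <= WINDOW_SECONDS]
            if len(times) > THRESHOLD and key not in self.erased:
                self.erased.add(key)
                newly_erased.append(key)
        return newly_erased


tracker = ErrorTracker()
for second in (0, 10, 20, 30):
    tracker.record(("rank", 0), second)
print(tracker.poll(now=40))  # four errors inside the window exceed the threshold
```

Errors spaced more than 24 hours apart never accumulate in this model, because each poll discards the stale entries before comparing the count to the threshold.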

When PDC invokes DRAM erasure, the information returned by reading memory module states indicates the

scope of the invocation and provides information to allow diagnostics to determine why it was invoked. PDC

also sends IPMI events indicating that DRAM erasure is in use. When PDC invokes DRAM erasure, the

correctable errors that caused DRAM erasure are removed from the PDT. Because invoking DRAM erasure

increases the latency of memory accesses and reduces the ability of ECC to detect multi-bit errors, it is

important to notify the customer that the memory subsystem needs to be serviced. HP recommends that the

memory subsystem be serviced within a month of invoking DRAM erasure on a customer machine.

The thresholds for invoking DRAM erasure are incremental so that PDC invokes DRAM erasure on the

smallest part of the memory subsystem necessary to protect the system against another bit being in error.
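
One way to picture incremental thresholds is the Python sketch below. It is an illustration under assumptions, not the PDC algorithm: the threshold values, the `(bus, rank, bank)` location tuples, and the function names are hypothetical. The sketch checks scopes from smallest to largest, so errors concentrated in one bank trigger erasure on that bank alone, while errors spread across several banks trip only the larger rank-level threshold.

```python
from collections import Counter

# Hypothetical thresholds that grow with scope size; the manual gives no values.
THRESHOLDS = [("bank", 2), ("rank", 4), ("bus", 8), ("subsystem", 16)]


def scope_key(scope, location):
    """Reduce a (bus, rank, bank) error location to its identifier at one scope."""
    bus, rank, bank = location
    return {
        "bank": (bus, rank, bank),
        "rank": (bus, rank),
        "bus": (bus,),
        "subsystem": (),
    }[scope]


def smallest_scope_to_erase(error_locations):
    """Return (scope, ids) for the smallest scope whose count exceeds its threshold."""
    for scope, threshold in THRESHOLDS:
        counts = Counter(scope_key(scope, loc) for loc in error_locations)
        over = sorted(key for key, n in counts.items() if n > threshold)
        if over:
            return scope, over
    return None


concentrated = [(0, 0, 1)] * 3
spread = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 2)]
print(smallest_scope_to_erase(concentrated))  # bank-level erasure
print(smallest_scope_to_erase(spread))        # rank-level erasure
```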
