HP (Hewlett-Packard) A9834-9001B Server User Manual


Chapter 1

Overview

Server Errors

To support high availability (HA), the new chipset includes functionality for error detection, correction, and recovery. Errors in the new chipset are divided into the following categories:

- Protection domain access

- Hardware correctable

- Global shared memory

- Hardware uncorrectable

- Fatal

- Blocking timeout

- Deadlock recovery errors

These categories are listed in order of increasing severity, ranging from protection domain (PD) access errors, which are caused by software or hardware running in another PD, to deadlock recovery errors, which indicate a serious hardware failure that requires a cell reset to recover. The term "software" refers to privileged code, such as PDC or the OS, but not to user code. The sx2000 chipset supports the PD concept, in which user and software errors in one PD cannot affect another PD.
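The severity ordering above maps naturally onto an ordered enumeration. The following Python sketch is purely illustrative: the class name ErrorCategory, the member names, and the numeric values are hypothetical and do not correspond to any documented sx2000 register encoding. It only captures the relative order stated in this overview, so that the most severe of several logged errors can be selected with max().

```python
from enum import IntEnum


class ErrorCategory(IntEnum):
    """sx2000 error categories in increasing severity (hypothetical encoding)."""
    PD_ACCESS = 1             # caused by software or hardware in another PD
    HW_CORRECTABLE = 2        # e.g. a single-bit ECC error, corrected by hardware
    GLOBAL_SHARED_MEMORY = 3
    HW_UNCORRECTABLE = 4
    FATAL = 5
    BLOCKING_TIMEOUT = 6
    DEADLOCK_RECOVERY = 7     # serious hardware failure; cell reset required


def most_severe(errors):
    """Return the most severe category from a non-empty collection of errors."""
    return max(errors)


logged = [ErrorCategory.HW_CORRECTABLE, ErrorCategory.PD_ACCESS, ErrorCategory.FATAL]
print(most_severe(logged).name)   # FATAL
```

Using IntEnum rather than a plain Enum is a deliberate choice here: IntEnum members compare as integers, so the declaration order doubles as the severity order.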

Protection Domain Access Errors

PD access errors are caused by disallowed transactions that originate outside the PD. Packets from outside the coherency set should not affect the interface, and some packets from within the coherency set but outside the PD are handled as PD access errors. These errors typically result from a software error or from faulty hardware in another PD; they do not indicate a hardware failure in the reporting cell.

An example of a PD access error is an interrupt from a cell outside the PD that is not part of the interrupt protection set. For these errors, the sx2000 chipset typically drops the transaction or converts it to a harmless transaction, and then logs the error. No error is signaled. PD access errors do not, by themselves, cause the block to enter No_shared mode or fatal error mode.
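The handling of PD access errors can be modeled as a filter at the PD boundary. The Python sketch below is a hypothetical software model, not chipset firmware: the Packet and ProtectionDomain classes, the "interrupt" kind string, and the choice to drop (rather than convert) the offending transaction are all assumptions made for illustration. It shows the two properties described above: the offending interrupt is logged, and nothing is signaled to the caller beyond the transaction not being delivered.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    source_cell: int   # cell number of the sender (hypothetical field)
    kind: str          # e.g. "interrupt"; hypothetical encoding


@dataclass
class ProtectionDomain:
    cells: set                      # cells that belong to this PD
    interrupt_protection_set: set   # outside cells still allowed to send interrupts
    error_log: list = field(default_factory=list)

    def accept(self, packet):
        """Deliver an allowed packet, or log a PD access error and drop it.

        Dropping instead of converting to a harmless transaction is an
        assumption of this sketch. No exception is raised: the only trace
        of the error is the log entry.
        """
        from_outside = packet.source_cell not in self.cells
        if (packet.kind == "interrupt" and from_outside
                and packet.source_cell not in self.interrupt_protection_set):
            self.error_log.append(("PD access", packet.source_cell))
            return None
        return packet


pd = ProtectionDomain(cells={0, 1}, interrupt_protection_set={2})
delivered = pd.accept(Packet(source_cell=5, kind="interrupt"))
print(delivered, pd.error_log)   # None [('PD access', 5)]
```

Interrupts from cells inside the PD, or from outside cells listed in the interrupt protection set, pass through unchanged and leave the log untouched.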

Hardware Correctable Errors

Hardware correctable errors are errors that the hardware itself corrects. A typical example is a single-bit ECC error. For these errors, the sx2000 chipset corrects and logs the error. Software receives no direct notification that an error has occurred (no LPMC is generated); to detect such errors, firmware or software must read the error logs.
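Single-bit ECC correction can be illustrated with a Hamming(7,4) code, which corrects any one flipped bit in a 7-bit codeword. This Python sketch is a teaching example only: the sx2000 ECC code is not described in this section (production memory ECC usually uses a wider single-error-correct, double-error-detect code), and CorrectedErrorLog is a hypothetical stand-in for the chipset error logs. As with the hardware, the correction is recorded only in the log, so a caller learns of it only by reading that log.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


class CorrectedErrorLog:
    """Hypothetical stand-in for a chipset error log."""
    def __init__(self):
        self.entries = []


def hamming74_decode(code, log):
    """Correct at most one flipped bit, log any correction, return the data bits.

    Nothing is raised or returned to signal the correction; the log entry
    is the only record, mirroring the "no LPMC" behavior.
    """
    code = list(code)
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position
    if syndrome:   # nonzero: 1-based position of the flipped bit (single error assumed)
        code[syndrome - 1] ^= 1
        log.entries.append(("single-bit ECC corrected", syndrome))
    return [code[2], code[4], code[5], code[6]]


log = CorrectedErrorLog()
word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                          # flip the bit at position 5
print(hamming74_decode(word, log))    # [1, 0, 1, 1]
print(log.entries)                    # [('single-bit ECC corrected', 5)]
```

This code would miscorrect a double-bit error, which is why practical memory ECC adds an extra overall parity bit so that double-bit errors are detected rather than silently "corrected".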

Global Shared Memory Errors

Global shared memory (GSM) is a high-performance mechanism for communication between separate PDs using GNI memory, without exposing your PD to hardware or software failures in the other PD. Each PD supports eight sharing ranges. Each range is readable and writable within the PD, and can be programmed to be either read-only or readable and writable by other PDs. Ranges of memory, called sharing windows,
