HP (Hewlett-Packard) A9834-9001B Server User Manual


 
Chapter 1
Overview
Server Errors
are opened between PDs when it is established that the PDs are up and communication between them is
open. When there is a failure in GSM, the goal is to close the sharing windows between those two cells but not
to affect sharing windows to other cells.
There are two methods to detect GSM errors. The first method is a software-only method, in which software
wraps data with a CRC code and sequence number. Software checks this for each buffer transferred. The
second method has some hardware assistance: the hardware sets some CSR bits whenever a GSM error
occurs. Software checks the CSR bits before using the data.
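A minimal sketch of the software-only method follows. The frame layout is an assumption made for illustration (a 4-byte sequence number, the payload, then a CRC-32 over both); this manual does not specify the actual format, and the function names are hypothetical.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not the sx2000 format):
# 4-byte big-endian sequence number, payload, 4-byte CRC-32 over both.
HEADER = struct.Struct(">I")
TRAILER = struct.Struct(">I")


class GsmTransferError(Exception):
    """Raised when a received buffer fails the CRC or sequence check."""


def wrap(seq: int, payload: bytes) -> bytes:
    """Sender side: prepend the sequence number and append a CRC-32."""
    body = HEADER.pack(seq) + payload
    return body + TRAILER.pack(zlib.crc32(body))


def unwrap(expected_seq: int, frame: bytes) -> bytes:
    """Receiver side: verify CRC and sequence number before using the data."""
    if len(frame) < HEADER.size + TRAILER.size:
        raise GsmTransferError("frame too short")
    body, trailer = frame[:-TRAILER.size], frame[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        raise GsmTransferError("CRC mismatch")
    (seq,) = HEADER.unpack(body[:HEADER.size])
    if seq != expected_seq:
        raise GsmTransferError(f"expected sequence {expected_seq}, got {seq}")
    return body[HEADER.size:]
```

The CRC catches corruption within a buffer, while the sequence number catches a lost, repeated, or stale buffer whose contents are otherwise intact. The receiver runs both checks on every buffer before it uses the data.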
Hardware Uncorrectable Errors
Hardware uncorrectable errors are detected by the hardware and signaled to software, which is then able to
recover from them. For some of these errors, the hardware must behave differently to enable software recovery.
Fatal Errors
Fatal errors are unrecoverable errors that usually indicate a loss of data. The system prevents committing
corrupt data to disk or network, and logs information about the error to aid diagnosis. Software cannot
recover once a system fatal error has been detected. The goal of the sx2000 chipset and
PDC is to bring all interfaces in this PD into fatal error (FE) mode, signal an HPMC, and guarantee a clear
path to fetch PDC. PDC then saves the error logs, cleans up the error logs, and calls the OS HPMC handler.
The OS then makes a memory dump and reboots.
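The ordering of the fatal error steps can be summarized in a short sketch. This is a toy model, not firmware: every class, field, and string below is hypothetical, and "cleans up the error logs" is assumed here to mean that the logs are cleared after they are saved.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionDomain:
    """Toy stand-in for one PD; every field name here is hypothetical."""
    interfaces: dict                      # interface name -> mode string
    error_logs: list = field(default_factory=list)
    saved_logs: list = field(default_factory=list)
    trace: list = field(default_factory=list)


def handle_fatal_error(pd: ProtectionDomain) -> list:
    """Walk the fatal error steps in the order the text gives them."""
    # Chipset and PDC: put every interface in this PD into FE mode.
    for name in pd.interfaces:
        pd.interfaces[name] = "FE"
    pd.trace.append("interfaces in FE mode")

    # Signal an HPMC; the CPU then fetches PDC over a clear path.
    pd.trace.append("HPMC signaled")

    # PDC: save the error logs, then clear them (assumed meaning of "cleans up").
    pd.saved_logs.extend(pd.error_logs)
    pd.error_logs.clear()
    pd.trace.append("PDC saved and cleared error logs")

    # PDC hands off to the OS HPMC handler, which dumps memory and reboots.
    pd.trace.append("OS HPMC handler: memory dump")
    pd.trace.append("OS HPMC handler: reboot")
    return pd.trace
```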
Blocking Timeout Fatal Errors
Blocking timeout errors occur when an interface detects that a required resource is blocked. Timeout errors
that occur when a specific transaction does not complete (TID timeouts) are not considered blocking timeout
errors. When a blocking timeout error has occurred, the interface tries to prevent queues in other interfaces,
cells, and PDs from backing up by throwing away transactions destined for the blocked resource and
returning flow control credits.
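The drop-and-return-credit behavior can be sketched as follows. This is a simplified software model under stated assumptions: the queue structure, the per-source credit counter, and all names are hypothetical, and the real behavior is implemented in chipset hardware.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Transaction:
    source: str   # sender that spent a flow control credit on this transaction
    dest: str     # resource the transaction is destined for


@dataclass
class Interface:
    """Toy model of an interface queue with credit-based flow control."""
    queue: deque = field(default_factory=deque)
    credits_returned: dict = field(default_factory=dict)
    blocked: set = field(default_factory=set)

    def mark_blocked(self, resource: str) -> None:
        """Record a blocking timeout and flush traffic already queued for it."""
        self.blocked.add(resource)
        kept = deque()
        for txn in self.queue:
            if txn.dest == resource:
                self._return_credit(txn.source)   # discard, but free the sender
            else:
                kept.append(txn)
        self.queue = kept

    def accept(self, txn: Transaction) -> bool:
        """Queue a transaction, or discard it if its destination is blocked."""
        if txn.dest in self.blocked:
            self._return_credit(txn.source)
            return False
        self.queue.append(txn)
        return True

    def _return_credit(self, source: str) -> None:
        self.credits_returned[source] = self.credits_returned.get(source, 0) + 1
```

Returning a credit for each discarded transaction is what keeps the senders from stalling: their queues keep draining even though the blocked resource never responds, so traffic to other destinations continues to flow.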
Deadlock Recovery Reset Errors
Deadlock errors are unrecoverable errors that indicate that the chipset is in a deadlock state and must be
reset to enable the CPU to fetch PDC code. Deadlock errors are caused by a defective chipset or CPU (or a
functional bug).
NOTE: After the sx2000 chipset is reset, all GSM sharing regions are disabled, thus providing error
containment and preventing any corruption from spreading to other PDs.
Error Logging
Hardware error handling can be broken into four phases: detection, transaction handling, logging, and state
behavior.