NEC 5800/1000 Personal Computer User Manual


 
Mainframe-class RAS Features
Enhanced error detection of the high-speed interconnect
Intricate error handling through multi-bit error detection
and resending of errored data
As higher-speed interconnects are implemented to increase
system performance, the probability that interference noise
will cause errors along these interconnects also rises. One
method of handling such interconnect errors is to disable the
affected interconnect and operate in a degraded mode.
In addition to the above method, the Express5800/1000 series
servers implement a methodology prevalent in supercomputers,
whereby intricate multi-bit error detection is carried out and
the affected data is resent when an error is detected. This allows
the Express5800/1000 series servers to handle the intermittent
errors that occur along the high-speed interconnects without
impacting system performance.
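
The detect-and-resend idea can be illustrated with a minimal software sketch. This is not the server's actual hardware protocol: CRC-32 stands in for the unspecified multi-bit detection code, and the names `send_frame`, `check_frame`, `transfer`, and `make_noisy_link` are hypothetical helpers introduced only for this example.

```python
import zlib


def send_frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes, big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the payload."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received


def transfer(payload: bytes, link, max_retries: int = 3):
    """Send the payload over `link`, resending whenever an error is detected.

    Returns (payload, attempts). Raises IOError if every attempt fails.
    """
    for attempt in range(1, max_retries + 2):
        frame = link(send_frame(payload))
        if check_frame(frame):
            return frame[:-4], attempt
    raise IOError("link failed after retries")


def make_noisy_link(corrupt_first_n: int):
    """Hypothetical link that flips two bits in the first n frames it carries."""
    state = {"sent": 0}

    def link(frame: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first_n:
            damaged = bytearray(frame)
            damaged[0] ^= 0b00000011  # inject a multi-bit error
            return bytes(damaged)
        return frame

    return link


if __name__ == "__main__":
    data, attempts = transfer(b"cache line", make_noisy_link(2))
    print(data, attempts)
```

In this sketch an intermittent error costs only a resend, whereas the degraded-mode approach would take the link out of service.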
Two independent power sources
Avoid system shutdown due to failures of the power distribution units
The previous 32-processor and 16-processor models supported
two independent power supplies, whereas the 8-processor
model did not. This feature is now available on the new 8-processor
system (1080Rf), so that the system can continue operating even
in the event of a failure within the power distribution unit.
Autonomic reporting of error logs with pinpoint prognosis
of failed components
Realization of a mainframe-class platform serviceability
The Express5800/1000 series servers are equipped with a service
processor that performs server management and platform error
handling. The service processor can be considered the core
component that supports the RAS features of the system. One
feature of the service processor is its ability to analyze the
detailed logs collected by the chipset in the event of an error
(BID: built-in diagnosis). The BID diagnoses the location of the
error and pinpoints the required FRU (Field Replaceable Unit),
so that the time required to replace the component and recover the
system can be minimized.
In the event of a failure, the Express5800/1000 series servers
can also automatically send detailed error logs to maintenance
personnel, enabling us to further reduce the time
required to resolve a system error. Furthermore, to minimize
the possibility of a critical error, the diagnostics engine can
proactively predict errors rather than only react to them.
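
One way to picture proactive prediction is a threshold check on how often corrected retries occur. The sketch below is an assumption-laden illustration, not the diagnostics engine's actual algorithm: the class `RetryTrendMonitor`, its sliding time window, and the threshold values are all hypothetical.

```python
from collections import deque


class RetryTrendMonitor:
    """Flag a component when retries within a time window exceed a threshold."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold            # hypothetical tuning value
        self.window_seconds = window_seconds  # hypothetical tuning value
        self._events = deque()                # timestamps of recent retries

    def record_retry(self, timestamp: float) -> bool:
        """Record one corrected retry; return True if the threshold is exceeded."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop retries that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    monitor = RetryTrendMonitor(threshold=3, window_seconds=60.0)
    for t in (0.0, 10.0, 20.0, 30.0):
        print(t, monitor.record_retry(t))
```

A rising retry rate on one link is recoverable on its own, but crossing the threshold can be treated as a signal to schedule preventive maintenance before a hard failure occurs.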
Implementation of an Uninterruptible Power Supply (UPS) can
further increase availability. The two independent power source
feature is standard on the 1320Xf and is available as an
optional feature for the 1160Xf and 1080Rf.
[Figure: automatic error log reporting. Components shown: Customer
Environment (hardware, chipset, Service Processor, Diagnostics
Agent, Manager), Maintenance Group, and Development Group. Log
mail is sent over the Internet as an encrypted message.]

Customer Environment: A detailed hardware error log, including
transaction history, is collected. The Diagnostics Agent diagnoses
the retry tendency and confirms whether the threshold was exceeded.
The error information is sent via email.

Maintenance Group: The error information summary is analyzed to
determine the cause of the failure. The development team may
be contacted for assistance. Resulting actions: preventive
maintenance, failed component replacement.

Development Group: If required, the detail log is analyzed
further by the development groups.
[Figure: diagnostics of the error detection circuits. Panel title:
"Without Check Features". Elements shown: Logic Circuits, ECC,
Error Detection Circuits, Circuit Check, Error Reporting. Labels
shown: 1 bit Error, Failure, Bad Data, Unable to detect error,
Error Detected.]

Bad data resulting from a simple error, such as a single-bit
error, cannot be blocked if a failure exists within the
error detection circuits themselves.
Diagnostics of the error detection
circuits at every system boot
ensure data integrity.
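
The value of checking the checker can be sketched in software. The example below is only an analogy under stated assumptions: a single even-parity bit stands in for the ECC logic, and `healthy_checker`, `stuck_checker`, and `boot_self_test` are hypothetical names, not part of the product.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit for a non-negative integer word."""
    return bin(word).count("1") % 2


def healthy_checker(word: int, stored_parity: int) -> bool:
    """Return True when an error is detected (parity mismatch)."""
    return parity_bit(word) != stored_parity


def stuck_checker(word: int, stored_parity: int) -> bool:
    """Model of a failed detection circuit: it never reports an error."""
    return False


def boot_self_test(checker, width: int = 8) -> bool:
    """Verify a checker by injecting known single-bit errors.

    The checker must accept clean data and flag a flipped bit at
    every position; otherwise the self-test fails.
    """
    pattern = 0b10100101
    good_parity = parity_bit(pattern)
    if checker(pattern, good_parity):  # clean data must not be flagged
        return False
    for bit in range(width):
        corrupted = pattern ^ (1 << bit)  # inject a single-bit error
        if not checker(corrupted, good_parity):
            return False
    return True


if __name__ == "__main__":
    print(boot_self_test(healthy_checker))
    print(boot_self_test(stuck_checker))
```

A checker that silently stopped reporting would pass bad data unnoticed during normal operation; injecting known errors at boot exposes that failure before real data depends on it.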