IBM DS8000 Computer Drive User Manual


Chapter 4. RAS

generate Early Power-Off Warning (EPOW) events. Critical events (for example, a Class 5 AC power loss) cause the hardware to signal the affected components directly, without operating system or firmware involvement, to prevent any data loss. Non-critical environmental events are logged and reported using Event Scan. The operating system cannot program or access the temperature thresholds through the SP.
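
The split between critical and non-critical EPOW events can be modeled as a simple routing function. This is a hypothetical software sketch only: on the real server, critical events are handled by hardware with no OS or firmware step, and the names `Severity`, `EnvEvent`, and `handle_epow`, as well as the component labels, are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"


@dataclass
class EnvEvent:
    source: str          # hypothetical component label, e.g. "ac_power"
    severity: Severity
    detail: str


def handle_epow(event, hw_signals, event_log):
    """Route one EPOW event (hypothetical model of the behavior in the text).

    Critical events signal the affected component immediately, with no OS
    or firmware step; non-critical events are only logged so that a later
    scan (Event Scan in the text) can report them.
    """
    if event.severity is Severity.CRITICAL:
        hw_signals.append(event.source)
    else:
        event_log.append(f"{event.source}: {event.detail}")


signals, log = [], []
handle_epow(EnvEvent("ac_power", Severity.CRITICAL, "Class 5 AC power loss"), signals, log)
handle_epow(EnvEvent("fan_2", Severity.NON_CRITICAL, "fan running slow"), signals, log)
print(signals)  # ['ac_power']
print(log)      # ['fan_2: fan running slow']
```

The two lists keep the paths visibly separate: nothing on the critical path waits for logging.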

Temperature monitoring is also performed. If the ambient temperature rises above a preset operating range, the rotation speed of the cooling fans can be increased. Temperature monitoring also warns the internal microcode of potential environment-related problems. An orderly system shutdown occurs when the operating temperature exceeds a critical level.

Voltage monitoring issues a warning and initiates an orderly system shutdown when the voltage is outside the operational specification.
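
The temperature and voltage responses described above amount to a tiered threshold policy. The sketch below assumes invented limits (`TEMP_OPERATING_MAX_C`, `TEMP_WARNING_C`, `TEMP_CRITICAL_C`, and a 12 V rail with a ±5% tolerance); the section does not give the real values, and placing the microcode warning as a separate tier between fan speed-up and shutdown is also an assumption.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"
    RAISE_FAN_SPEED = "raise fan speed"
    WARN = "warn"
    ORDERLY_SHUTDOWN = "orderly shutdown"


# Hypothetical limits for illustration only; the real thresholds are set by
# the system firmware and are not given in this section.
TEMP_OPERATING_MAX_C = 35.0
TEMP_WARNING_C = 40.0
TEMP_CRITICAL_C = 45.0
VOLTAGE_NOMINAL_V = 12.0
VOLTAGE_TOLERANCE = 0.05   # +/-5 percent of nominal


def temperature_action(ambient_c):
    """Return the strongest response whose threshold the temperature exceeds."""
    if ambient_c > TEMP_CRITICAL_C:
        return Action.ORDERLY_SHUTDOWN
    if ambient_c > TEMP_WARNING_C:
        return Action.WARN
    if ambient_c > TEMP_OPERATING_MAX_C:
        return Action.RAISE_FAN_SPEED
    return Action.NORMAL


def voltage_actions(volts):
    """Out-of-specification voltage triggers a warning and an orderly shutdown."""
    low = VOLTAGE_NOMINAL_V * (1 - VOLTAGE_TOLERANCE)
    high = VOLTAGE_NOMINAL_V * (1 + VOLTAGE_TOLERANCE)
    if low <= volts <= high:
        return [Action.NORMAL]
    return [Action.WARN, Action.ORDERLY_SHUTDOWN]


for t in (30.0, 37.0, 42.0, 50.0):
    print(f"{t} C -> {temperature_action(t).value}")
print([a.value for a in voltage_actions(10.5)])
```

Checking the critical limit first means a single reading never produces a weaker response than it warrants.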

Self-healing

For a system to be self-healing, it must be able to recover from a failing component by first detecting and isolating the failed component. It should then be able to take the component offline, fix or isolate it, and reintroduce the fixed or replaced component into service without any application disruption. Examples include:

• Bit steering to redundant memory in the event of a failed memory module, to keep the server operational
• Bit scattering, allowing error correction and continued operation in the presence of a complete chip failure (Chipkill™ recovery)
• Single-bit error correction using ECC, without reaching error thresholds, for main memory and the L2 and L3 caches
• L3 cache line deletes extended from 2 to 10 for additional self-healing
• ECC extended to inter-chip connections on the fabric and processor bus
• Memory scrubbing to help prevent soft-error memory faults
• Dynamic processor deallocation
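
The detect, isolate, take offline, repair or replace, and reintroduce sequence described above can be expressed as a small state machine that rejects out-of-order steps. This is a hypothetical sketch: the state names, the `Component` class, and the `DIMM_A` label are invented for illustration, and the server's actual recovery logic is not described in this section.

```python
from enum import Enum, auto


class State(Enum):
    IN_SERVICE = auto()
    FAILING = auto()     # error detected on the component
    ISOLATED = auto()    # fault isolated to this component
    OFFLINE = auto()     # removed from use; work continues elsewhere
    REPAIRED = auto()    # fixed in place or replaced


# Allowed transitions, following the order given in the text.
TRANSITIONS = {
    State.IN_SERVICE: {State.FAILING},
    State.FAILING: {State.ISOLATED},
    State.ISOLATED: {State.OFFLINE},
    State.OFFLINE: {State.REPAIRED},
    State.REPAIRED: {State.IN_SERVICE},
}


class Component:
    def __init__(self, name):
        self.name = name
        self.state = State.IN_SERVICE
        self.history = [self.state]

    def advance(self, new_state):
        """Move to new_state, refusing any step the sequence does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


dimm = Component("DIMM_A")
for step in (State.FAILING, State.ISOLATED, State.OFFLINE, State.REPAIRED, State.IN_SERVICE):
    dimm.advance(step)
print([s.name for s in dimm.history])
```

Encoding the order as data makes it impossible, for example, to reintroduce a component that was never repaired.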

Memory reliability, fault tolerance, and integrity

The p5 570 uses Error Checking and Correcting (ECC) circuitry for system memory to correct single-bit memory failures and to detect double-bit memory failures. Detection of double-bit memory failures helps maintain data integrity. Furthermore, the memory chips are organized so that the failure of any specific memory module affects only a single bit within a four-bit ECC word (bit-scattering), allowing error correction and continued operation in the presence of a complete chip failure (Chipkill recovery).
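
Single-bit correction, double-bit detection, and bit-scattering can be illustrated with a toy model. It is a sketch under stated assumptions: it swaps in an extended Hamming(8,4) SEC-DED code (4 data bits in an 8-bit codeword) instead of the wider code the p5 570 hardware actually uses, it does not reproduce the hardware's word layout, it stores four codewords on eight hypothetical 4-bit chips, and it models a failed chip as one that inverts every bit it holds. In the scattered layout, chip k holds bit k of every codeword, so a whole-chip failure leaves at most one bad bit per codeword.

```python
WORDS, BITS, CHIP_WIDTH = 4, 8, 4           # 4 codewords of 8 bits each
N_CHIPS = WORDS * BITS // CHIP_WIDTH        # 8 hypothetical 4-bit chips


def encode(nibble):
    """Extended Hamming(8,4) SEC-DED encoder: 4 data bits -> 8-bit codeword."""
    c = [0] * BITS
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]    # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]    # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]    # covers positions 4, 5, 6, 7
    c[0] = sum(c[1:]) % 2        # overall parity bit makes the word even
    return c


def decode(c):
    """Return (nibble, status), status in {'ok', 'corrected', 'uncorrectable'}."""
    c = list(c)
    syndrome = 0
    for pos in range(1, BITS):
        if c[pos]:
            syndrome ^= pos
    odd = sum(c) % 2
    if syndrome == 0 and odd == 0:
        status = "ok"
    elif odd:
        c[syndrome] ^= 1         # single-bit error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return None, "uncorrectable"   # double-bit error: detected, not fixed
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3), status


def locate(word, bit, scattered):
    """Physical (chip, slot) holding one codeword bit."""
    if scattered:
        return bit, word         # chip k holds bit k of every codeword
    flat = word * BITS + bit
    return flat // CHIP_WIDTH, flat % CHIP_WIDTH   # 4 consecutive bits of one word


def chip_failure(data, failed_chip, scattered):
    """Store data, invert every bit on one chip, then decode each codeword."""
    chips = [[0] * CHIP_WIDTH for _ in range(N_CHIPS)]
    for w, nibble in enumerate(data):
        for b, bit in enumerate(encode(nibble)):
            chip, slot = locate(w, b, scattered)
            chips[chip][slot] = bit
    chips[failed_chip] = [bit ^ 1 for bit in chips[failed_chip]]
    results = []
    for w in range(WORDS):
        word = []
        for b in range(BITS):
            chip, slot = locate(w, b, scattered)
            word.append(chips[chip][slot])
        results.append(decode(word))
    return results


data = [0x3, 0xA, 0xF, 0x0]
print(chip_failure(data, failed_chip=5, scattered=True))
print(chip_failure(data, failed_chip=0, scattered=False))
```

With the scattered layout, failing chip 5 returns every word marked `corrected` with its original value. With the contiguous layout, failing chip 0 returns word 0 as `(2, 'ok')`: four flipped bits in one codeword can form another valid codeword, so the error is neither corrected nor detected. That contrast is the reason for scattering bits across chips.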

The memory DIMMs also use memory scrubbing and thresholding to determine when memory modules within each bank of memory should be used in place of modules that have exceeded their error-count threshold (dynamic bit-steering). Memory scrubbing reads the contents of memory during idle time and passes the data through the ECC logic, correcting any single-bit errors that have accumulated. Scrubbing is a hardware function of the memory controller chip and does not affect normal system memory performance.
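
Scrubbing, error-count thresholding, and steering to a spare can be sketched together in software, although on the p5 570 this work is done in hardware by the memory controller. Assumptions: a Hamming(7,4) code stands in for the real ECC, each code bit position maps to a hypothetical physical bit lane, there is exactly one spare lane, and `threshold`, `ScrubbedBank`, and the fault-injection dictionary `stuck` are invented for illustration.

```python
SPARE_LANE = 8


def syndrome(word):
    """Hamming(7,4) syndrome: XOR of the positions (1..7) holding a 1."""
    s = 0
    for pos in range(1, 8):
        if word[pos]:
            s ^= pos
    return s


def encode(nibble):
    """Hamming(7,4) codeword as a list indexed 0..7 (index 0 unused)."""
    word = [0] * 8
    for pos, i in zip((3, 5, 6, 7), range(4)):
        word[pos] = (nibble >> i) & 1
    s = syndrome(word)
    word[1], word[2], word[4] = s & 1, (s >> 1) & 1, (s >> 2) & 1  # syndrome -> 0
    return word


class ScrubbedBank:
    """Toy bank: each word uses lanes 1..7 plus one spare lane (8)."""

    def __init__(self, nibbles, threshold):
        self.cells = [encode(n) + [0] for n in nibbles]    # physical bits per word
        self.lane_of = {pos: pos for pos in range(1, 8)}  # code position -> lane
        self.stuck = {}          # injected hard faults: lane -> stuck-at value
        self.errors = [0] * 9    # corrected errors counted per physical lane
        self.threshold = threshold
        self.spare_free = True

    def _read(self, w):
        word = [0] * 8
        for pos, lane in self.lane_of.items():
            word[pos] = self.stuck.get(lane, self.cells[w][lane])
        return word

    def _write(self, w, word):
        for pos, lane in self.lane_of.items():
            self.cells[w][lane] = word[pos]

    def _corrected(self, w):
        word = self._read(w)
        s = syndrome(word)
        if s:
            word[s] ^= 1         # the syndrome names the flipped position
        return word, s

    def scrub(self):
        """One idle-time pass: correct, write back, count, then maybe steer."""
        for w in range(len(self.cells)):
            word, s = self._corrected(w)
            if s:
                self.errors[self.lane_of[s]] += 1
                self._write(w, word)     # write-back clears soft errors
        for pos, lane in list(self.lane_of.items()):
            if self.spare_free and self.errors[lane] > self.threshold:
                good = [self._corrected(w)[0] for w in range(len(self.cells))]
                self.lane_of[pos] = SPARE_LANE   # dynamic bit-steering
                self.spare_free = False
                for w, word in enumerate(good):
                    self._write(w, word)

    def read(self, w):
        word, _ = self._corrected(w)
        return word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)


bank = ScrubbedBank([0xF] * 4, threshold=5)
bank.stuck[5] = 0                # hypothetical hard fault: lane 5 stuck at 0
bank.scrub()
bank.scrub()
print(bank.lane_of[5], bank.errors[5])     # 8 8
print([bank.read(w) for w in range(4)])    # [15, 15, 15, 15]
```

Each pass corrects the four words affected by the stuck lane. After the second pass the lane's count (8) exceeds the threshold of 5, so position 5 is steered to the spare lane and later passes find no further errors there. A transient flip in a healthy lane is simply corrected and written back.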

N+1 redundancy

The following redundant parts allow the p5 570 to remain operational with full resources:
• Redundant spare memory bits in the L1, L2, and L3 caches and in main memory
• Redundant fans
• Redundant power supplies
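
N+1 redundancy means that a load needing N units of a part has N+1 installed, so any single unit can fail without loss of capacity. Under the simplifying assumption that units fail independently and each is available with probability a, the chance that at least N of N+1 units are working is a^(N+1) + (N+1) * a^N * (1 - a). The sketch below computes this; the choice of 2 required power supplies and 99% per-unit availability is hypothetical, since this section does not give unit counts or failure rates.

```python
from math import comb


def prob_at_least(n_required, n_installed, unit_availability):
    """P(at least n_required of n_installed independent units are working)."""
    a = unit_availability
    return sum(
        comb(n_installed, k) * a**k * (1 - a) ** (n_installed - k)
        for k in range(n_required, n_installed + 1)
    )


def full_resources(n_required, n_installed, n_failed):
    """Capacity is unchanged while the working units still cover the load."""
    return n_installed - n_failed >= n_required


# Hypothetical example: the load needs 2 supplies, each available 99% of the time.
print(f"N   (2 installed): {prob_at_least(2, 2, 0.99):.6f}")
print(f"N+1 (3 installed): {prob_at_least(2, 3, 0.99):.6f}")
print(full_resources(2, 3, 1), full_resources(2, 3, 2))  # True False
```

With these assumed numbers, adding the one extra unit raises the probability of full capacity from about 0.9801 to about 0.9997. Real failures are rarely fully independent (shared power feeds or cooling), so this is an upper-bound style estimate.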
