IBM pSeries690 Personal Computer User Manual

Configuring and Deconfiguring Processors or Memory

All failures that crash the system with a machine check or check stop, even if intermittent, are reported as a diagnostic callout for service repair. To prevent intermittent problems from recurring and to improve system availability until a scheduled maintenance window, processors and memory books with a failure history are marked "bad" so that they are not configured on subsequent boots.

A processor or memory book is marked "bad" under the following circumstances:

• A processor or memory book fails built-in self-test (BIST) or power-on self-test (POST) during boot (as determined by the service processor).
• A processor or memory book causes a machine check or check stop during runtime, and the failure can be isolated specifically to that processor or memory book (as determined by the processor runtime diagnostics in the service processor).
• A processor or memory book reaches a threshold of recovered failures that results in a predictive callout (as determined by the processor runtime diagnostics in the service processor).
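
The three marking rules above can be modeled as a small decision function. This is a minimal Python sketch, not service processor firmware: the `Event` kinds, the `Unit` record, the `record_event` function, and the default `recovered_threshold` of 3 are all hypothetical, since this section does not state the actual recovered-failure threshold or how the service processor represents failures internally.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    """Hypothetical kinds of failure evidence seen by the service processor."""
    BIST_OR_POST_FAILURE = auto()        # failed BIST or POST during boot
    MACHINE_CHECK_OR_CHECKSTOP = auto()  # runtime machine check or check stop
    RECOVERED_ERROR = auto()             # a recovered failure


@dataclass
class Unit:
    """A processor or memory book in this illustrative model."""
    name: str
    recovered_errors: int = 0
    bad: bool = False


def record_event(unit: Unit, event: Event, isolated: bool = True,
                 recovered_threshold: int = 3) -> bool:
    """Apply the three marking rules; return True if the unit is marked bad.

    recovered_threshold=3 is an assumption; the manual does not give a value.
    """
    if event is Event.BIST_OR_POST_FAILURE:
        unit.bad = True
    elif event is Event.MACHINE_CHECK_OR_CHECKSTOP:
        # Mark only if the failure is isolated to this specific unit.
        if isolated:
            unit.bad = True
    elif event is Event.RECOVERED_ERROR:
        unit.recovered_errors += 1
        # Reaching the threshold produces a predictive callout.
        if unit.recovered_errors >= recovered_threshold:
            unit.bad = True
    return unit.bad
```

In this model, two recovered errors leave a unit configured and the third marks it bad, while a check stop that cannot be isolated to the unit (`isolated=False`) does not mark it at all.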

During boot, the service processor does not configure processors or memory books that are marked "bad."

If a processor or memory book is deconfigured, it remains offline for subsequent reboots until it is replaced or repeat gard is disabled. The repeat gard function also lets the user manually deconfigure a processor or memory book, or re-enable a previously deconfigured one. For information on configuring or deconfiguring a processor, see the Processor Configuration/Deconfiguration Menu on page 33.

For information on configuring or deconfiguring a memory book, see the Memory Configuration/Deconfiguration Menu on page 35. Both of these menus are submenus under the System Information Menu.

You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu.
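
The repeat gard behavior described above (a marked unit stays offline across reboots until it is replaced, manually re-enabled, or repeat gard is disabled) can be sketched as a small state model. All names here (`RepeatGard`, `mark_bad`, `clear`, `boot`) are hypothetical and do not correspond to any service processor interface; one instance stands for CPU Repeat Gard and a separate instance for Memory Repeat Gard. The sketch also ignores the possibility that a unit with a hard fault fails BIST or POST again and is re-marked at the next boot.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatGard:
    """Hypothetical model of one repeat gard setting (CPU or memory).

    `deconfigured` stands in for the persistent "bad" marks that survive reboots.
    """
    units: list[str]
    enabled: bool = True
    deconfigured: set[str] = field(default_factory=set)

    def mark_bad(self, unit: str) -> None:
        """Record a failure history, or a manual deconfiguration by the user."""
        self.deconfigured.add(unit)

    def clear(self, unit: str) -> None:
        """Manual re-enable, or replacement of the failed part."""
        self.deconfigured.discard(unit)

    def boot(self) -> list[str]:
        """Return the units the service processor configures at this boot."""
        if not self.enabled:
            # Repeat gard disabled: marked units are no longer held offline.
            return list(self.units)
        # Marks persist, so a bad unit is skipped on every boot until cleared.
        return [u for u in self.units if u not in self.deconfigured]
```

For example, after `mark_bad("cpu1")` every call to `boot()` on a three-processor instance returns only `cpu0` and `cpu2`, until `clear("cpu1")` is called or `enabled` is set to `False`.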

Run-Time CPU Deconfiguration (CPU Gard)

The processor runtime diagnostics (PRD) code running in the service processor monitors L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors. When a predefined error threshold is met, an error log with warning severity and threshold-exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX then attempts to migrate all resources associated with that processor to another processor and then stop the defective processor.
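
The CPU Gard flow can be sketched as a threshold counter on the service processor side plus a handler on the operating system side. This minimal Python sketch assumes a single per-CPU counter shared by all three monitored error classes and a caller-supplied `threshold`; the manual says only that the threshold is predefined, and the real PRD may count each error class separately. The error-class strings, the `ErrorLog` fields, `CpuGardMonitor`, and the `os_handle` migration step are hypothetical stand-ins for what PRD and AIX actually do.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical names for the three error classes PRD monitors.
MONITORED = {"L1_ICACHE_RECOVERABLE", "L1_DCACHE_CORRECTABLE", "L2_CORRECTABLE"}


@dataclass
class ErrorLog:
    """Stand-in for the error log returned to the operating system."""
    cpu: int
    severity: str
    status: str


@dataclass
class CpuGardMonitor:
    """Counts monitored errors per CPU (assumed: one shared counter per CPU)."""
    threshold: int  # assumed value; the manual says only "predefined"
    counts: Counter = field(default_factory=Counter)
    gard_next_boot: set[int] = field(default_factory=set)

    def report(self, cpu: int, error: str) -> Optional[ErrorLog]:
        """Record one error; return a log only when the threshold is first met."""
        if error not in MONITORED:
            return None  # not one of the monitored cache error classes
        self.counts[cpu] += 1
        if self.counts[cpu] == self.threshold:
            self.gard_next_boot.add(cpu)  # deconfigure at the next boot
            return ErrorLog(cpu, "warning", "threshold exceeded")
        return None


def os_handle(log: ErrorLog, online: set[int],
              run_queues: dict[int, list[str]]) -> bool:
    """Hypothetical OS response: migrate work off the CPU, then stop it.

    Returns False (and leaves the CPU running) if no other CPU is online.
    """
    targets = sorted(online - {log.cpu})
    if not targets:
        return False
    dest = targets[0]
    run_queues.setdefault(dest, []).extend(run_queues.pop(log.cpu, []))
    online.discard(log.cpu)  # "stop" the defective processor
    return True
```

The handler returns `False` when no other processor is online, reflecting that AIX only attempts the migration; a real operating system applies its own policies when choosing a destination processor.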

Chapter 3. Using the Service Processor 55
