IBM 6E4 Server User Manual


 
Operating System Surveillance
Operating system surveillance provides the service processor with a means to detect
hang conditions, as well as hardware or software failures, while the operating system is
running. It also provides the operating system with a means to detect a service
processor failure, indicated by the lack of a return heartbeat.
Operating system surveillance is not enabled by default, allowing you to run operating
systems that do not support this service processor option.
You can use service processor menus and AIX service aids to enable or disable
operating system surveillance.
For operating system surveillance to work correctly, you must set these parameters:
• Surveillance enable/disable
• Surveillance interval
  The maximum time the service processor should wait for a heartbeat from the
  operating system before timeout.
• Surveillance delay
  The length of time to wait from the time the operating system is started to when the
  first heartbeat is expected.
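
The relationship between the three parameters can be illustrated with a short
sketch. This is not service processor firmware or an IBM interface: the names
SurveillanceSettings and next_deadline are hypothetical, the time unit is left
abstract because this section does not state one, and the sketch assumes that
the first heartbeat is due exactly when the surveillance delay expires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveillanceSettings:
    """Hypothetical container for the three parameters listed above."""
    enabled: bool    # surveillance enable/disable
    interval: float  # maximum wait for a heartbeat before timeout
    delay: float     # wait from OS start until the first heartbeat is expected

    def __post_init__(self) -> None:
        # Reject values that could never describe a usable configuration.
        if self.interval <= 0 or self.delay < 0:
            raise ValueError("interval must be positive and delay non-negative")


def next_deadline(settings: SurveillanceSettings,
                  os_start: float,
                  last_heartbeat: Optional[float] = None) -> Optional[float]:
    """Return the time by which the next heartbeat is expected.

    Returns None when surveillance is disabled. Before any heartbeat has
    arrived, the deadline is governed by the delay (assumption); afterwards
    it is governed by the interval.
    """
    if not settings.enabled:
        return None
    if last_heartbeat is None:
        return os_start + settings.delay
    return last_heartbeat + settings.interval
```

With a delay of 10 and an interval of 5, an operating system started at time
100 would be expected to send its first heartbeat by time 110; a heartbeat
received at time 112 would move the deadline to 117.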
Surveillance does not take effect until the next time the operating system is started after
the parameters have been set.
If desired, you can initiate surveillance mode immediately from service aids. In addition
to the three options above, service aids provide a fourth option that lets you start
surveillance immediately, without necessarily rebooting the system.
If operating system surveillance is enabled (and system firmware has passed control to
the operating system), and the service processor does not detect any heartbeats from
the operating system, the service processor assumes the system is hung and takes
action according to the reboot/restart policy settings. See “Service Processor
Reboot/Restart Recovery” on page 59.
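
The hang detection described above behaves like a watchdog timer. The following
is a minimal sketch of that idea only, not the actual service processor logic:
the class HeartbeatWatchdog, its method names, and the on_hang callback (standing
in for whatever the reboot/restart policy specifies) are all hypothetical, and
times are plain numbers in an unspecified unit.

```python
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Toy model of heartbeat-based hang detection (illustrative only)."""

    def __init__(self, interval: float, delay: float,
                 on_hang: Callable[[], None]) -> None:
        self.interval = interval  # maximum wait between heartbeats
        self.delay = delay        # grace period after the OS takes control
        self.on_hang = on_hang    # stand-in for the reboot/restart policy
        self.deadline: Optional[float] = None  # None: OS not in control yet
        self.hung = False

    def os_started(self, now: float) -> None:
        """Firmware has passed control to the operating system."""
        self.deadline = now + self.delay
        self.hung = False

    def heartbeat(self, now: float) -> None:
        """A heartbeat arrived; push the deadline out by one interval."""
        if self.deadline is not None and not self.hung:
            self.deadline = now + self.interval

    def poll(self, now: float) -> bool:
        """Return True if the system is assumed hung; act once on expiry."""
        if self.deadline is None or self.hung:
            return self.hung
        if now > self.deadline:
            self.hung = True
            self.on_hang()
        return self.hung
```

In this model nothing happens until os_started is called, which mirrors the
condition that system firmware must first pass control to the operating system.
Once a deadline passes without a heartbeat, on_hang runs a single time and the
watchdog stays in the hung state until the operating system is started again.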
If surveillance is selected from the service processor menus, which are available only
at bootup, surveillance is enabled by default as soon as the system boots. From
service aids, the selection is optional.
66 pSeries 630 Model 6C4 and Model 6E4 User’s Guide