Intel MPCMM0001 Network Card User Manual


 
MPCMM0001 Chassis Management Module Software Technical Product Specification 49
Process Monitoring and Integrity
6.7.6 Failed Failover/Reboot Recovery, Critical
In this scenario, PMS is running on the active CMM and detects a fault in a monitored process. The
severity of the process is configured as critical. The configured recovery action is to fail over to the
standby CMM and, once the failover succeeds, reboot the now-standby CMM. The failover is
unsuccessful (for example, the standby CMM is not available). Because the monitored process is of
critical severity, the CMM is rebooted regardless.
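
The decision can be sketched in a few lines of Python. This is only an illustrative model, not PMS code: the names `Outcome`, `failover_and_reboot`, and the `try_failover` callback are hypothetical, and the non-critical branch is taken from the behavior listed in Table 10.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    """Hypothetical labels for the possible results of the recovery action."""
    FAILED_OVER_AND_REBOOTED = "failover succeeded; now-standby CMM rebooted"
    REBOOTED_WITHOUT_FAILOVER = "failover failed; critical process, CMM rebooted"
    MONITORING_DISABLED = "failover failed; non-critical process, monitoring stopped"


def failover_and_reboot(critical: bool, try_failover: Callable[[], bool]) -> Outcome:
    """Model the 'failover & reboot' recovery action (assumed structure).

    try_failover is a stand-in that returns True when the failover succeeds.
    """
    if try_failover():
        # Normal path: the reboot follows a successful failover.
        return Outcome.FAILED_OVER_AND_REBOOTED
    if critical:
        # Failover failed, but a critical process still forces the reboot.
        return Outcome.REBOOTED_WITHOUT_FAILOVER
    # Failover failed and the process is not critical: no reboot,
    # and the process is no longer monitored.
    return Outcome.MONITORING_DISABLED
```

For example, `failover_and_reboot(True, lambda: False)` models a critical process whose failover fails because no standby CMM is available.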
Table 10. Failed Failover/Reboot Recovery, Non-Critical
| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | "Process existence fault; attempting recovery" or "Thread watchdog fault; attempting recovery" or "Process integrity fault; attempting recovery" | # | Assert | Configure |
| The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action" | # | N/A | Configure |
| PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| PMS detects that it is still running on the active CMM. The process is not critical and therefore the reboot operation will not be performed. | "Failover & reboot recovery failure" | # | N/A | Configure |
| No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, "Process Administrative Action" on page 53, for information about how to re-enable monitoring and de-assert the event. | "Process existence fault; monitoring disabled" or "Thread watchdog fault; monitoring disabled" or "Process integrity fault; monitoring disabled" | # | Assert | Configure |
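
As a rough illustration of how the event strings in Table 10 depend on the detection mechanism, the sketch below builds the sequence of process monitoring events for this scenario. The function name, the mechanism keys, and the `FAULT_PREFIX` mapping are assumptions made for the example. Only the event strings come from the table, and the separately generated failover events are left out because they are not described here.

```python
from typing import List

# Hypothetical mapping from detection mechanism to the event string prefix.
FAULT_PREFIX = {
    "existence": "Process existence fault",
    "thread_watchdog": "Thread watchdog fault",
    "integrity": "Process integrity fault",
}


def non_critical_failed_failover_events(mechanism: str) -> List[str]:
    """Return the Table 10 event strings, in order, for one mechanism."""
    prefix = FAULT_PREFIX[mechanism]  # raises KeyError for an unknown mechanism
    return [
        f"{prefix}; attempting recovery",                # fault detected (asserted)
        "Attempting failover & reboot recovery action",  # recovery action starts
        "Failover & reboot recovery failure",            # still on the active CMM
        f"{prefix}; monitoring disabled",                # monitoring stops (asserted)
    ]


if __name__ == "__main__":
    for event in non_critical_failed_failover_events("thread_watchdog"):
        print(event)
```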