Intel MPCMM0001 Network Card User Manual


 
MPCMM0001 Chassis Management Module Software Technical Product Specification
Process Monitoring and Integrity
recovery action is unsuccessful (standby is not available, etc.). Because the monitored process is of
critical severity, the reboot of the CMM is still executed even though the CMM remains the active
CMM.
6.7.11 Process Administrative Action
In this scenario, PMS has detected a fault in a process, but has not been able to recover the process
(recovery is configured for no action, etc.). This causes PMS to operationally disable monitoring of
the process. To re-enable monitoring of the process, an operator must administratively lock the
process, take the necessary actions to fix the process, and administratively unlock the process.
Table 15. Excessive Restarts, Failed Escalate Failover/Reboot, Critical
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery" or "Thread watchdog fault; attempting recovery" or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "restart process" | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
The escalated recovery action specified is "failover and reboot" | Attempting failover & reboot escalated recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical and therefore the reboot operation is performed. | | | |
Upon initialization of PMS after the reboot, the monitor de-asserts the event. | Monitoring initialized | # | De-assert | OK
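The event sequence in Table 15 can be easier to follow as an ordered list. The following is a minimal sketch in Python, not PMS code: the event strings, Assert values, and Severity values are copied from the table, while the names `Event`, `FAULT_STRINGS`, and `escalation_events` are hypothetical and exist only for illustration. The "Configure" severity is carried over verbatim from the table without interpretation.

```python
from dataclasses import dataclass

# Event strings copied from Table 15. The dictionary keys are hypothetical labels.
FAULT_STRINGS = {
    "existence": "Process existence fault; attempting recovery",
    "thread_watchdog": "Thread watchdog fault; attempting recovery",
    "integrity": "Process integrity fault; attempting recovery",
}


@dataclass
class Event:
    text: str
    assertion: str  # "Assert", "De-assert", or "N/A", as listed in the table
    severity: str   # "Configure" or "OK", as listed in the table


def escalation_events(mechanism: str) -> list[Event]:
    """Return the process monitoring events of Table 15, in table order."""
    return [
        Event(FAULT_STRINGS[mechanism], "Assert", "Configure"),
        Event("Attempting process restart recovery action", "N/A", "Configure"),
        Event("Recovery failure due to excessive restarts", "N/A", "Configure"),
        Event("Attempting failover & reboot escalated recovery action",
              "N/A", "Configure"),
        # The failover events are generated by existing code and are not
        # process monitoring events, so they are omitted here. The table
        # lists no event for the reboot step itself.
        Event("Monitoring initialized", "De-assert", "OK"),
    ]


for ev in escalation_events("thread_watchdog"):
    print(f"{ev.assertion:<10} {ev.severity:<10} {ev.text}")
```

Only the first string depends on the detection mechanism; the remaining events are the same for all three fault types.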
Table 16. Administrative Action
Description | Event String | UID | Assert | Severity
Operator administratively locks monitoring of the process | None | - | N/A | N/A
Operator takes actions to fix the problem | N/A | - | N/A | N/A
Operator administratively unlocks monitoring of the process, causing monitoring to restart | Monitoring initialized | # | De-assert | OK
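The lock, fix, and unlock cycle of section 6.7.11 can be modeled as two independent states: an operational state controlled by PMS and an administrative state controlled by the operator. The sketch below is a toy model in Python under stated assumptions, not the PMS implementation. The class `MonitoredProcess`, its attributes, and the rule that an unlock without a prior lock raises an error are all hypothetical; only the "Monitoring initialized" event string and the order of operator steps come from the specification.

```python
class MonitoredProcess:
    """Toy model of the administrative lock / fix / unlock cycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.op_enabled = True      # operational state, controlled by PMS
        self.admin_locked = False   # administrative state, set by the operator
        self.events: list[str] = []

    def unrecovered_fault(self) -> None:
        # PMS could not recover the process, so it operationally
        # disables monitoring of it.
        self.op_enabled = False

    def lock(self) -> None:
        # Table 16 lists no event for the administrative lock.
        self.admin_locked = True

    def unlock(self) -> None:
        # Assumption of this sketch: unlocking requires a prior lock.
        if not self.admin_locked:
            raise RuntimeError("process must be locked before it is unlocked")
        self.admin_locked = False
        self.op_enabled = True
        # Table 16: the unlock restarts monitoring and de-asserts the event.
        self.events.append("Monitoring initialized")

    @property
    def monitored(self) -> bool:
        return self.op_enabled and not self.admin_locked


proc = MonitoredProcess("example_process")
proc.unrecovered_fault()   # monitoring is now operationally disabled
proc.lock()                # operator locks the process
# ... operator takes the necessary actions to fix the process ...
proc.unlock()              # monitoring restarts
print(proc.monitored, proc.events)  # prints: True ['Monitoring initialized']
```

Keeping the two states separate mirrors the text: the fault disables monitoring operationally, and only the operator's unlock re-enables it.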