High Availability | 383
Communication between RPMs
E-Series RPMs have three CPUs: Control Processor (CP), Routing Processor 1 (RP1), and Routing
Processor 2 (RP2). The CPUs use Fast Ethernet connections to communicate with each other and with the
line card CPUs (LP) using Inter-Processor Communication (IPC). The CP monitors the health of the
other processors by sending heartbeat messages. If any CPU fails to acknowledge a number of consecutive
heartbeat messages, or the CP itself fails to send heartbeat messages (IPC timeout), the primary RPM
requests a failover to the standby RPM, and FTOS displays a message similar to Message 4.
C-Series RPMs have one CPU: Control Processor (CP). The CP on the RPM communicates with the line card
CPUs (LP) via IPC. As on the E-Series, the CP monitors the health of the other processors by sending
heartbeat messages. If any CPU fails to acknowledge a number of consecutive heartbeat messages, or the CP
itself fails to send heartbeat messages (IPC timeout), the primary RPM requests a failover to the standby
RPM, and FTOS displays a message similar to Message 4.
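The IPC heartbeat check described above can be pictured as a per-CPU counter of consecutive missed acknowledgements. The following is a minimal sketch, not FTOS code: the class name, the target names, and the threshold of 3 are all hypothetical, since this section does not state how many consecutive misses trigger a failover.

```python
class HeartbeatMonitor:
    """Tracks consecutive unacknowledged heartbeats per monitored CPU (illustrative only)."""

    def __init__(self, targets, max_missed=3):
        # max_missed is an assumed threshold; the source does not give a number.
        self.max_missed = max_missed
        self.missed = {name: 0 for name in targets}

    def record(self, target, acknowledged):
        """Record one heartbeat result; return True if a failover should be requested."""
        if acknowledged:
            self.missed[target] = 0  # any acknowledgement resets the streak
        else:
            self.missed[target] += 1
        return self.missed[target] >= self.max_missed


monitor = HeartbeatMonitor(["rp1", "rp2", "lp0"], max_missed=3)
# rp2 misses two heartbeats, answers once, then misses three in a row.
results = [monitor.record("rp2", ack)
           for ack in (False, False, True, False, False, False)]
print(results)
```

Only the final result is True in this run, because the single acknowledgement resets the streak; the counter models consecutive misses rather than total misses.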
In addition to IPC, the CP on each RPM sends heartbeat messages to the CP on its peer RPM via a
process called Inter-RPM Communication (IRC). If the primary RPM fails to acknowledge a number of
consecutive heartbeat messages (IRC timeout), the standby RPM responds by assuming the role of primary
RPM, and FTOS displays a message similar to Message 5.
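The IRC behavior on the standby side can be sketched as a small role state machine: the standby RPM stays in the standby role until the primary misses enough consecutive heartbeats, then promotes itself. This is an illustrative model only; the class, the method name, and the threshold value are assumptions and do not come from FTOS.

```python
import enum


class Role(enum.Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


class StandbyRpm:
    """Hypothetical model of a standby RPM watching its peer over IRC."""

    def __init__(self, name, irc_max_missed=3):
        self.name = name
        self.role = Role.STANDBY
        self.irc_max_missed = irc_max_missed  # assumed threshold, not from the source
        self.missed = 0

    def on_heartbeat_result(self, acknowledged):
        """Update the missed-heartbeat streak and promote on IRC timeout."""
        if self.role is Role.PRIMARY:
            return self.role  # already promoted; nothing more to decide
        self.missed = 0 if acknowledged else self.missed + 1
        if self.missed >= self.irc_max_missed:
            self.role = Role.PRIMARY  # IRC timeout: assume the primary role
        return self.role


rpm = StandbyRpm("RPM1")
roles = [rpm.on_heartbeat_result(ack).value for ack in (True, False, False, False)]
print(roles)
```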
IPC and IRC timeouts and failover behavior
IPC or IRC timeouts can occur when heartbeat messages or acknowledgements are lost or arrive out of
sequence, or when a software or hardware failure affects IPC or IRC. Table 18-2 describes the
failover behavior for the possible failure scenarios.
Message 4 RPM Failover due to IPC Timeout
%RPM1-P:CP %IPC-2-STATUS: target rp2 not responding
%RPM0-S:CP %RAM-6-FAILOVER_REQ: RPM failover request from active peer: Auto failover on failure
%RPM0-S:CP %RAM-6-ELECTION_ROLE: RPM0 is transitioning to Primary RPM.
%RPM0-P:CP %TSM-6-SFM_SWITCHFAB_STATE: Switch Fabric: UP
Message 5 RPM Failover due to IRC Timeout
20:29:07: %RPM1-S:CP %IRC-4-IRC_WARNLINKDN: Keepalive packet 7 to peer RPM is lost
20:29:07: %RPM1-S:CP %IRC-4-IRC_COMMDOWN: Link to peer RPM is down
%RPM1-S:CP %RAM-4-MISSING_HB: Heartbeat lost with peer RPM. Auto failover on heart beat lost.
%RPM1-S:CP %RAM-6-ELECTION_ROLE: RPM1 is transitioning to Primary RPM.
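When scanning logs for failover events, the messages shown above can be split into fields with a regular expression. The sketch below is based only on the sample lines in Message 4 and Message 5. The field names are hypothetical, and reading the `P`/`S` suffix as primary/standby and the middle digit as a severity level is an assumption inferred from those samples, not a documented format.

```python
import re

# Pattern inferred from the sample messages only; field names are illustrative.
LOG_PATTERN = re.compile(
    r"^(?:(?P<time>\d{2}:\d{2}:\d{2}): )?"      # optional HH:MM:SS timestamp
    r"%(?P<rpm>RPM\d+)-(?P<role>[PS]):(?P<cpu>\w+) "
    r"%(?P<facility>[A-Z0-9]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a matching log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


samples = [
    "20:29:07: %RPM1-S:CP %IRC-4-IRC_COMMDOWN: Link to peer RPM is down",
    "%RPM0-S:CP %RAM-6-ELECTION_ROLE: RPM0 is transitioning to Primary RPM.",
]
parsed = [parse_line(s) for s in samples]

# Pick out role-election events, which mark the moment of failover in both samples.
elections = [p for p in parsed if p and p["mnemonic"] == "ELECTION_ROLE"]
print(elections[0]["rpm"], elections[0]["text"])
```

Lines that do not fit the pattern, such as a wrapped continuation line, return None, so callers may want to rejoin wrapped lines before parsing.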
Table 18-2. Failover Behaviors
Platform | Failover Trigger | Failover Behavior
c e | CP task crash on the primary RPM | The standby RPM detects the IRC timeout and initiates failover, and the failed RPM reboots itself after saving a CP application core dump.
c e | CP IRC timeout for a non-task crash reason on the primary RPM | The standby RPM detects the IRC timeout and initiates failover. FTOS saves a CP trace log, the CP IPC-related system status, and a CP application core dump. Then the failed RPM reboots itself.