6-18 User’s Guide
Configuring Failover and Failback Support
When an individual application or user resource (also known as a cluster resource)
fails on a cluster node, Cluster Service detects the failure and tries to restart the
application on that cluster node. If the number of restart attempts reaches a preset
threshold, Cluster Service takes the application offline, moves the application and its
resources to another cluster node, and restarts the application there. This process of
automatically moving resources from a failed cluster node to a healthy cluster node is
called failover.
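The restart-then-failover behavior described above can be sketched as a small Python
model. This is an illustration only, not the actual Cluster Service implementation: the
`ResourceGroup` class, the `handle_failure` function, and the threshold value of 3 are
hypothetical, and the sketch simply moves the group to the first other healthy node.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    """Hypothetical model of a cluster resource group and the node hosting it."""
    name: str
    owner: int                  # cluster node currently hosting the group
    restart_threshold: int = 3  # assumed value; the real threshold is configurable
    restarts: int = 0           # restart attempts made on the current node


def handle_failure(group: ResourceGroup, healthy_nodes: list[int]) -> str:
    """React to one resource failure: restart locally until the threshold
    is reached, then fail the group over to another healthy node."""
    if group.restarts < group.restart_threshold:
        group.restarts += 1
        return f"restart {group.name} on node {group.owner}"
    # Threshold reached: take the group offline and move it elsewhere.
    candidates = [n for n in healthy_nodes if n != group.owner]
    if not candidates:
        return f"{group.name} offline: no healthy node available"
    group.owner = candidates[0]
    group.restarts = 0  # start counting restarts afresh on the new node
    return f"fail over {group.name} to node {group.owner}"
```

For example, calling `handle_failure` four times on a group owned by node 1, with nodes 1
through 4 healthy, yields three local restarts followed by a failover to node 2.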
When the system administrator repairs and restarts the failed cluster node, the opposite
process occurs. After the original cluster node restarts and rejoins the cluster, Cluster
Service takes the application and its resources offline, moves them from the failover
cluster node back to the original cluster node, and then restarts the application. This
process of returning the resources to their original cluster node is called failback.
You can configure failback to occur at a time you choose, or disable it entirely. If you
enable failback, schedule it during off-peak hours to minimize the effect on users, who
may see a delay in service until the resources come back online.
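A failback window like the one recommended above can be checked with a few lines of
Python. This is a hypothetical sketch: the `failback_allowed` function and its hour-based
`window` parameter are assumptions for illustration, not Cluster Service settings. It also
handles windows that wrap past midnight, which is common when off-peak hours span the night.

```python
from datetime import time
from typing import Optional


def failback_allowed(now: time, window: Optional[tuple[int, int]]) -> bool:
    """Return True if failback may run at `now` (hypothetical policy).

    window=None   -> failback disabled
    window=(s, e) -> allowed from hour s up to, but not including, hour e;
                     the window may wrap past midnight, e.g. (22, 4)
    s == e        -> treated as "no time restriction" (an assumption)
    """
    if window is None:
        return False
    start, end = window
    hour = now.hour
    if start == end:
        return True
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight
```

For example, with `window=(22, 4)` failback is permitted at 23:00 and at 03:59 but not at
noon.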
To fail over and fail back running applications, cluster resources are placed together in
a group so that Cluster Service can move them as a single unit. For example, an
application such as Internet Information Server (IIS) requires a virtual disk, an IP
address, and a network name resource. IIS also requires a resource called “IIS Server
Instance.” These IIS resources, including the IIS Server Instance resource, can be placed
in their own group labeled “IIS Group” for identification. Because the IIS Group (the
resource group) contains all of the resources for the application (IIS), Cluster Service
can bring all of the necessary components online in their proper order, ensuring that
failover and failback procedures transfer all of the user resources as transparently as
possible.
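Bringing a group's resources online “in their proper order” amounts to a dependency
(topological) ordering. The following Python sketch uses the standard library's
`graphlib.TopologicalSorter` (Python 3.9 or later). The dependency map for the IIS Group is
an assumed example; in a real cluster, the administrator defines the resource dependencies.

```python
from graphlib import TopologicalSorter

# Assumed dependencies for the IIS Group: each resource maps to the set of
# resources that must already be online before it can start.
IIS_GROUP = {
    "Virtual Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "IIS Server Instance": {"Virtual Disk", "IP Address", "Network Name"},
}


def online_order(group: dict[str, set[str]]) -> list[str]:
    """Order in which to bring resources online (dependencies first)."""
    return list(TopologicalSorter(group).static_order())


def offline_order(group: dict[str, set[str]]) -> list[str]:
    """Order in which to take resources offline (dependents first)."""
    return online_order(group)[::-1]
```

Taking resources offline in the reverse order means a resource such as IIS Server Instance
stops before the disk, address, and name it relies on are removed, which is what allows
the whole group to move between nodes cleanly.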
The following section provides information on failover support for 4-node clustering and
provides a table for each failover option. Each table includes a preferred cluster node
list for cluster group failover or failback to help you implement your failover
configuration.
Failover Support Through Four-Node Clustering
One of the key features of Datacenter Server is its support for 2-node, 3-node, and
4-node failover clustering. The PowerEdge FE100/FL100 Datacenter Server systems provide
this 2-node to 4-node failover cluster solution and are designed to provide higher levels
of availability through improved service offerings and additional cluster functionality.
When a failover situation occurs, Cluster Service takes the resources offline and, by
default, moves them to the next-numbered cluster node. For example, if cluster node 1
fails, Cluster Service moves the resources to cluster node 2. This default type of
failover is called “cascading failover.” After the malfunctioning cluster node is repaired,
and if failback is enabled, Cluster Service fails back the resources using the same
procedures as failover.
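Cascading failover and the subsequent failback can be sketched as follows. The function
names are hypothetical, and two behaviors are assumptions not stated in this guide: the
search wraps from the highest-numbered node back to node 1, and nodes that are not healthy
are skipped.

```python
from typing import Optional


def cascade_target(failed: int, healthy: set[int], node_count: int = 4) -> Optional[int]:
    """Pick the failover node for a group owned by `failed` (hypothetical).

    Tries failed+1, failed+2, ..., wrapping from node `node_count` back to
    node 1 (assumed), and skips nodes that are not healthy.
    Returns None if no node can take the group.
    """
    for step in range(1, node_count):
        candidate = (failed - 1 + step) % node_count + 1
        if candidate in healthy:
            return candidate
    return None


def failback_target(original: int, current: int, healthy: set[int]) -> Optional[int]:
    """Return the original node if the group should move back to it
    (the node has been repaired and rejoined), otherwise None."""
    if original != current and original in healthy:
        return original
    return None
```

For example, `cascade_target(1, {2, 3, 4})` returns 2, matching the cascading failover
example above, and once node 1 rejoins, `failback_target(1, 2, {1, 2, 3, 4})` returns 1.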