Designing Disaster Tolerant High Availability Clusters: > Chapter 4 Designing a Continental Cluster

Restoring Disaster Tolerance


After a failover to the recovery cluster, restoring disaster tolerance poses several challenges, the most significant of which are:

  • Restoring the failed cluster

    Depending on the nature of the disaster you may need to create a new cluster, or you may be able to restore the cluster. Steps for each scenario are discussed in the following sections.

    Before starting up the new or the failed cluster, make sure the AUTO_RUN flag is disabled for all of the Continentalclusters application packages. This prevents the packages from starting unexpectedly when the cluster starts.

  • Resynchronizing the data

    To resynchronize the data, you either restore the data to the cluster and continue with the same data replication procedure, or set up data replication to function in the other direction.
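The AUTO_RUN check described above can be sketched as a small shell helper. This is a minimal illustration, assuming each package configuration file carries a single line of the form `AUTO_RUN YES` or `AUTO_RUN NO`; the function name `autorun_enabled` is hypothetical and not part of Continentalclusters.

```shell
#!/bin/sh
# Hypothetical helper: list package configuration files whose AUTO_RUN
# parameter is not set to NO. Assumes one "AUTO_RUN <value>" line per file.
autorun_enabled() {
    found=1                       # 1 = no file needs attention
    for file in "$@"; do
        # Take the first AUTO_RUN line; comment lines start with "#" and are skipped.
        value=$(awk 'toupper($1) == "AUTO_RUN" { print toupper($2); exit }' "$file")
        if [ "$value" != "NO" ]; then
            echo "$file: AUTO_RUN is ${value:-unset}"
            found=0               # 0 = at least one file needs attention
        fi
    done
    return $found
}
```

Calling the function with the configuration files of all application packages prints one line per file that still has AUTO_RUN enabled (or not set), and returns success only if such a file was found.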

The following sections briefly outline some scenarios for restoring disaster tolerance.

Restore Clusters to their Original Roles

If the disaster did not destroy the cluster, you can return both clusters in a recovery pair to their original roles. To do this:

  1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.

  2. On each cluster, stop the Continentalclusters monitor package if it is still running:

    # cmhaltpkg ccmonpkg

  3. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.

  4. For each recovery group where the new cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving cluster to the disks on the new cluster. This may be time-consuming.

    2. Halt the recovered application on the surviving cluster if necessary, and start it on the new cluster.

    3. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.

  5. Restart the monitor using the following command on each cluster:

    # cmrunpkg ccmonpkg

    Alternatively, if you have modified the monitoring package configuration, use the following sequence on each cluster to apply the new configuration and start the monitor:

    # cmapplyconf -P ccmonpkg.config

    # cmmodpkg -e ccmonpkg

  6. View the status of the Continentalcluster.

    # cmviewconcl
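The two ways of restarting the monitor in step 5 can be sketched as a wrapper. This is an illustration only: the `DRY_RUN` switch, the `run` helper, and the `restart_monitor` function are assumed names, and only the commands already shown in the procedure are used.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run, and restart_monitor are illustrative names.
DRY_RUN=${DRY_RUN:-1}             # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restart_monitor [modified-package-config-file]
# Without an argument the monitor package is simply started again. With an
# argument, the changed configuration is applied first and the package is
# then enabled.
restart_monitor() {
    if [ -n "${1:-}" ]; then
        run cmapplyconf -P "$1"
        run cmmodpkg -e ccmonpkg
    else
        run cmrunpkg ccmonpkg
    fi
}
```

With the default dry-run setting, `restart_monitor ccmonpkg.config` only prints the two commands it would issue, which makes it easy to review the sequence before running it on each cluster.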

Primary Packages Remaining on the Surviving Cluster

Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This minimizes the downtime involved with moving the applications back to the restored cluster. It also assumes that the surviving cluster has sufficient resources to handle running all critical applications indefinitely.

NOTE: In a multiple recovery pairs scenario, where more than one primary cluster is configured to share the same recovery cluster, do not use the following procedure to switch the roles of the failed cluster and the surviving cluster.

Use the following:

  1. Halt the monitor packages. Issue the following command on each cluster:

    # cmhaltpkg ccmonpkg

  2. Edit the Continentalclusters ASCII configuration file. You will need to change the definitions of monitoring clusters, and switch the names of primary and recovery packages in the definitions of recovery groups. You may also need to re-create data sender and data receiver packages.

  3. Check and apply the Continentalclusters configuration:

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. Restart the monitor packages. Issue the following command on each cluster:

    # cmmodpkg -e ccmonpkg

  5. View the status of the Continentalcluster.

    # cmviewconcl

Before applying the edited configuration, the data storage associated with each cluster needs to be prepared to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.
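The package-name switch in step 2 could be sketched as follows. This is a simplified illustration, not how Continentalclusters itself performs the switch: the `swap_roles` function is hypothetical, it assumes that every PRIMARY_PACKAGE line is directly followed by its RECOVERY_PACKAGE line, and it leaves the monitoring definitions and any data sender or data receiver packages untouched, so those still need to be edited by hand.

```shell
#!/bin/sh
# swap_roles OLD_PRIMARY_CLUSTER < config > new-config
# Swaps the primary and recovery package names of every recovery group whose
# primary package currently runs on OLD_PRIMARY_CLUSTER.
swap_roles() {
    awk -v old="$1" '
        $1 == "PRIMARY_PACKAGE" { primary = $2; next }      # hold until the recovery line
        $1 == "RECOVERY_PACKAGE" {
            recovery = $2
            if (index(primary, old "/") == 1) {             # group is primary on the old cluster
                tmp = primary; primary = recovery; recovery = tmp
            }
            print "        PRIMARY_PACKAGE   " primary
            print "        RECOVERY_PACKAGE  " recovery
            next
        }
        { print }                                           # everything else passes through
    '
}
```

Recovery groups whose primary package already runs on the other cluster pass through with their roles unchanged, so the function can be run over the whole recovery group section.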

Primary Packages Remaining on the Surviving Cluster using cmswitchconcl

Continentalclusters provides the cmswitchconcl command to facilitate steps 2 and 3 of the procedure in the section “Primary Packages Remaining on the Surviving Cluster”. The command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which the specified cluster is defined as the primary cluster. Do not use cmswitchconcl in a multiple recovery pair configuration where more than one primary cluster shares the same recovery cluster; in that configuration the command fails.

NOTE: Before running the cmswitchconcl command, the data storage associated with each cluster needs to be prepared properly to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.

The cmswitchconcl command cannot be used for the recovery groups that have both data sender and data receiver packages specified.
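Before running cmswitchconcl, it may help to find the recovery groups that this restriction excludes. The sketch below scans a configuration file read from standard input. The `groups_with_both` function is hypothetical, and the keyword `DATA_SENDER_PACKAGE` is an assumption made by analogy with the `DATA_RECEIVER_PACKAGE` keyword that appears in the sample configuration in this section.

```shell
#!/bin/sh
# groups_with_both < config
# Prints the name of every recovery group that defines both a data sender
# and a data receiver package. DATA_SENDER_PACKAGE is an assumed keyword.
groups_with_both() {
    awk '
        function report() { if (group != "" && sender && receiver) print group }
        $1 == "RECOVERY_GROUP_NAME"   { report(); group = $2; sender = 0; receiver = 0 }
        $1 == "DATA_SENDER_PACKAGE"   { sender = 1 }
        $1 == "DATA_RECEIVER_PACKAGE" { receiver = 1 }
        END { report() }
    '
}
```

An empty result means no recovery group in the file carries both package types.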

To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedure:

  1. Halt the monitor package on each cluster:

    # cmhaltpkg ccmonpkg

  2. Run:

    # cmswitchconcl \

    -C currentContinentalclustersConfigFileName \

    -c oldPrimaryClusterName \

    [-a] [-F NewContinentalclustersConfigFileName]

    The above command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which “oldPrimaryClusterName” is defined as the primary cluster.

    The default values of the monitoring package name (ccmonpkg), monitor interval (60 seconds), notification scheme (SYSLOG), and notification delay (0 seconds) are added for cluster “oldPrimaryClusterName”, which will serve as the recovery-only cluster. To change these defaults, edit the file “NewContinentalclustersConfigFileName” if -F is specified, or the file “currentContinentalclustersConfigFileName” if -F is not specified. If you need to edit the new configuration file, do not use the -a option, because -a applies the new configuration automatically.

  3. If option -a is specified with cmswitchconcl in step 2, skip this step. Otherwise manually apply the new Continentalclusters configuration:

    # cmapplyconcl -v -C NewContinentalclustersConfigFileName (if -F is specified in step 2)

    # cmapplyconcl -v -C currentContinentalclustersConfigFileName (if -F is not specified in step 2)

  4. Restart the monitor packages. Issue the following command on each cluster:

    # cmmodpkg -e ccmonpkg

  5. View the status of the Continentalcluster:

    # cmviewconcl
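The decision in step 3 (whether to apply a configuration by hand, and which file) can be sketched as a small function. The name `config_to_apply` and its argument convention are assumptions made for illustration.

```shell
#!/bin/sh
# config_to_apply AUTO_APPLIED CURRENT_FILE [NEW_FILE]
# Prints the file that step 3 must apply by hand, or nothing when
# cmswitchconcl was run with -a and has already applied the configuration.
# AUTO_APPLIED is "yes" if -a was used; NEW_FILE is the argument given to -F.
config_to_apply() {
    auto_applied=$1
    current=$2
    new=${3:-}
    if [ "$auto_applied" = yes ]; then
        return 0                  # step 3 is skipped
    fi
    echo "${new:-$current}"       # -F given: the new file; otherwise the current file
}
```

For example, `config_to_apply no cmconcl.config cmconcl.new` prints `cmconcl.new`, the file to pass to cmapplyconcl, while the same call with `yes` as the first argument prints nothing.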

The cmswitchconcl command can also be used to switch the package role of a recovery group. If only a subset of the primary packages will remain running on the surviving (recovery) cluster, a new option -g is provided with the cmswitchconcl command. This option reconfigures the roles of the packages of a recovery group and helps retain recovery protection after a failover.

Usage of option -g (recovery group based role switch reconfiguration) is the same as that of option -c (cluster based role switch reconfiguration). However, options -c and -g of the cmswitchconcl command are mutually exclusive.

# cmswitchconcl \

-C currentContinentalclustersConfigFileName \

-g RecoveryGroupName \

[-a] [-F NewContinentalclustersConfigFileName]
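Because -c and -g cannot be combined, a wrapper script can take the switch mode as a single argument, so that both options can never be passed together. The following is a minimal sketch; `switch_command` and its mode names are hypothetical, and the function only prints the command line rather than running it.

```shell
#!/bin/sh
# switch_command CONFIG_FILE MODE NAME
# MODE is "cluster" (maps to -c) or "group" (maps to -g). Taking one mode
# argument makes it impossible to request both options at once.
switch_command() {
    case $2 in
        cluster) opt=-c ;;
        group)   opt=-g ;;
        *) echo "mode must be cluster or group" >&2; return 1 ;;
    esac
    echo "cmswitchconcl -C $1 $opt $3"
}
```

The optional -a and -F arguments are left out of the sketch and would be appended to the printed command as needed.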

The following is a sample of the input and output files for running cmswitchconcl -C sample.input -c ClusterA -F sample.output

    sample.input
    ============
    ### Section 1. Cluster Information
    CONTINENTAL_CLUSTER_NAME        Sample_CC_Cluster
    CLUSTER_NAME                    ClusterA
            CLUSTER_DOMAIN          cup.hp.com
            NODE_NAME               node1
            NODE_NAME               node2
            MONITOR_PACKAGE_NAME    ccmonpkg
    CLUSTER_NAME                    ClusterB
            CLUSTER_DOMAIN          cup.hp.com
            NODE_NAME               node3
            NODE_NAME               node4
            MONITOR_PACKAGE_NAME    ccmonpkg
            MONITOR_INTERVAL        60 SECONDS

    ### Section 2. Recovery Groups
    RECOVERY_GROUP_NAME             RG1
            PRIMARY_PACKAGE         ClusterA/pkgX
            RECOVERY_PACKAGE        ClusterB/pkgX'
    RECOVERY_GROUP_NAME             RG2
            PRIMARY_PACKAGE         ClusterA/pkgY
            RECOVERY_PACKAGE        ClusterB/pkgY'
            DATA_RECEIVER_PACKAGE   ClusterB/pkgR1
    RECOVERY_GROUP_NAME             RG3
            PRIMARY_PACKAGE         ClusterB/pkgZ
            RECOVERY_PACKAGE        ClusterA/pkgZ'
    RECOVERY_GROUP_NAME             RG4
            PRIMARY_PACKAGE         ClusterB/pkgW
            RECOVERY_PACKAGE        ClusterA/pkgW'
            DATA_RECEIVER_PACKAGE   ClusterA/pkgR2

    ### Section 3. Monitoring Definitions
    CLUSTER_EVENT                   ClusterA/DOWN
            MONITORING_CLUSTER      ClusterB
            CLUSTER_ALERT           60 SECONDS
                    NOTIFICATION    TEXTLOG /home/user/logs/events.log
                                    "CC alert: DOWN"
                    NOTIFICATION    SYSLOG
                                    "CC alert: DOWN"
            CLUSTER_ALARM           90 SECONDS
                    NOTIFICATION    TEXTLOG /home/users/logs/events.log
                                    "CC alarm: DOWN"
                    NOTIFICATION    SYSLOG
                                    "CC alarm: DOWN"

    sample.output
    =============
    ### Section 1. Cluster Information
    CONTINENTAL_CLUSTER_NAME        Sample_CC_Cluster
    CLUSTER_NAME                    ClusterA
            CLUSTER_DOMAIN          cup.hp.com
            NODE_NAME               node1
            NODE_NAME               node2
            MONITOR_PACKAGE_NAME    ccmonpkg
            MONITOR_INTERVAL        60 SECONDS
    CLUSTER_NAME                    ClusterB
            CLUSTER_DOMAIN          cup.hp.com
            NODE_NAME               node3
            NODE_NAME               node4

    ### Section 2. Recovery Groups
    RECOVERY_GROUP_NAME             RG1
            PRIMARY_PACKAGE         ClusterB/pkgX'
            RECOVERY_PACKAGE        ClusterA/pkgX
    RECOVERY_GROUP_NAME             RG2
            PRIMARY_PACKAGE         ClusterB/pkgY'
            RECOVERY_PACKAGE        ClusterA/pkgY
            DATA_RECEIVER_PACKAGE   ClusterA/pkgR1
    RECOVERY_GROUP_NAME             RG3
            PRIMARY_PACKAGE         ClusterB/pkgZ
            RECOVERY_PACKAGE        ClusterA/pkgZ'
    RECOVERY_GROUP_NAME             RG4
            PRIMARY_PACKAGE         ClusterB/pkgW
            RECOVERY_PACKAGE        ClusterA/pkgW'
            DATA_RECEIVER_PACKAGE   ClusterA/pkgR2

    ### Section 3. Monitoring Definitions
    CLUSTER_EVENT                   ClusterB/DOWN
            MONITORING_CLUSTER      ClusterA
            CLUSTER_ALERT           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alert: DOWN"
            CLUSTER_ALARM           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alarm: DOWN"
    CLUSTER_EVENT                   ClusterB/UNREACHABLE
            MONITORING_CLUSTER      ClusterA
            CLUSTER_ALERT           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alert: UNREACHABLE"
            CLUSTER_ALARM           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alarm: UNREACHABLE"
    CLUSTER_EVENT                   ClusterB/ERROR
            MONITORING_CLUSTER      ClusterA
            CLUSTER_ALERT           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alert: ERROR"
    CLUSTER_EVENT                   ClusterB/UP
            MONITORING_CLUSTER      ClusterA
            CLUSTER_ALERT           0 MINUTES
                    NOTIFICATION    SYSLOG
                                    "CC alert: UP"

Newly Created Cluster Will Run Primary Packages

After you create a new cluster to replace the damaged cluster, you may choose to restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages.

  1. Configure the new cluster as a Serviceguard cluster. Use the cmviewcl command on the surviving cluster and compare the results to the new cluster configuration. Correct any inconsistencies on the new cluster.

  2. Halt the monitor package on the surviving recovery cluster:

    # cmhaltpkg ccmonpkg

  3. Edit the continental cluster configuration file to replace the data from the old failed cluster with data from the new cluster. Check and apply the Continentalclusters configuration:

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. For each recovery group where the new cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving recovery cluster to the disks on the new cluster. This may be time-consuming.

    2. Halt the application on the surviving recovery cluster if necessary, and start it on the new cluster.

    3. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.

  5. If the new cluster acts as recovery cluster for any recovery group, create a monitor package for the new cluster.

    Use the following command to apply the configuration of the new monitor package:

    # cmapplyconf -P ccmonpkg.config

  6. Restart the monitor package on the surviving cluster:

    # cmrunpkg ccmonpkg

  7. View the status of the Continentalcluster.

    # cmviewconcl
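The ordering in step 4, where each recovery group is brought back completely before the next one is synchronized, can be sketched as a plan generator. The `failback_plan` function and its `GROUP:RECOVERY_PACKAGE:PRIMARY_PACKAGE` argument format are assumptions for illustration. The data synchronization step is printed as a description only, because the actual procedure depends on the replication method in use.

```shell
#!/bin/sh
# failback_plan GROUP:RECOVERY_PACKAGE:PRIMARY_PACKAGE ...
# Prints the operations for step 4, one recovery group at a time, so that a
# group is running on the new cluster before the next group is synchronized.
failback_plan() {
    for entry in "$@"; do
        group=${entry%%:*}        # text before the first colon
        rest=${entry#*:}
        recovery=${rest%%:*}      # package to halt on the surviving cluster
        primary=${rest#*:}        # package to start on the new cluster
        echo "$group: 1. synchronize data from the surviving cluster to the new cluster"
        echo "$group: 2. on the surviving cluster: cmhaltpkg $recovery"
        echo "$group: 3. on the new cluster: cmrunpkg $primary"
    done
}
```

The printed plan is meant as a checklist to review, not as commands to execute blindly, since the halt in line 2 applies only if the application is still running on the surviving cluster.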

Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups

After you replace the failed cluster, if you are concerned with the downtime involved in moving the applications back, you can change the surviving cluster to the role of primary cluster for all recovery groups, and configure the new cluster as a recovery cluster for all those groups.

You would configure the new cluster as a standard Serviceguard cluster, and follow the usual procedure to configure the continental cluster with the new cluster used as a recovery cluster for all recovery groups.

NOTE: In a multiple recovery pairs scenario (where more than one primary cluster is configured to share the same recovery cluster), do not reconfigure the recovery cluster in response to the failure of one of the primary clusters.
© Hewlett-Packard Development Company, L.P.