Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters, Chapter 2: Designing a Continental Cluster

Restoring Disaster Tolerance


After a failover to a cluster occurs, restoring disaster tolerance has many challenges, the most significant of which are:

  • Restoring the failed cluster.

    Depending on the nature of the disaster, it may be necessary either to create a new cluster or to restore the failed cluster.

    Before starting up the new or the failed cluster, make sure the AUTO_RUN flag is disabled for all of the Continentalclusters application packages. This prevents the packages from starting unexpectedly when the cluster starts.

  • Resynchronizing the data.

    To resynchronize the data, you either restore the data to the cluster and continue with the same data replication procedure, or set up data replication to function in the other direction.
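
One way to check the AUTO_RUN precaution before starting the cluster is to scan the package configuration files. The following is a minimal sketch, assuming each file contains a line of the form AUTO_RUN YES or AUTO_RUN NO; the function name find_autorun_enabled and any file paths passed to it are hypothetical, not part of Serviceguard.

```shell
# Sketch: list package configuration files that still have AUTO_RUN enabled.
# Assumption: each file holds a line such as "AUTO_RUN YES" or "AUTO_RUN NO";
# lines that start with "#" are comments and are ignored.
# find_autorun_enabled is a hypothetical helper, not a Serviceguard command.
find_autorun_enabled() {
    for cfg in "$@"; do
        # Match an uncommented AUTO_RUN line whose value is YES (any letter case).
        if grep -Eiq '^[[:space:]]*AUTO_RUN[[:space:]]+YES([[:space:]]|$)' "$cfg"; then
            echo "$cfg"
        fi
    done
}
```

Calling find_autorun_enabled with the configuration files of the application packages prints only the files that still need AUTO_RUN set to NO; no output means the precaution is already in place.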

The following sections briefly outline some scenarios for restoring disaster tolerance.

Restore Clusters to their Original Roles

If the disaster did not destroy the cluster, there is the option to return both clusters in a recovery pair to their original roles. To do this:

  1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.

  2. On each cluster, stop the Continentalclusters monitor package if it is still running.

    # cmhaltpkg ccmonpkg

  3. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.

  4. For each recovery group where the repaired cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving cluster to the disks on the repaired cluster. This may be time-consuming.

    2. Halt the recovered application on the surviving cluster if necessary, and start it on the repaired cluster.

    3. To keep application downtime to a minimum, start the primary package on the repaired cluster before resynchronizing the data of the next recovery group.

  5. Restart the monitor using the following command on each cluster:

    # cmrunpkg ccmonpkg

    Alternatively, if the monitoring package configuration has been modified, use the following sequence on each cluster to apply the new configuration and start the monitor:

    # cmapplyconf -P ccmonpkg.config

    # cmmodpkg -e ccmonpkg

  6. View the status of the Continentalcluster.

    # cmviewconcl
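
The ordering of the steps above can be sketched as a dry run that only prints what would be done and where. This is an illustration, not a tested recovery script: the function name plan_restore_roles and the primary:recovery argument format are hypothetical, and the resynchronization step is left as a description because it depends on the data replication method in use.

```shell
# Sketch: print the sequence for returning both clusters to their original roles.
# Dry run only: every line is echoed, nothing is executed.
# Each argument is a hypothetical "primary:recovery" package pair,
# one pair per recovery group that moves back to the repaired cluster.
plan_restore_roles() {
    echo "both clusters: cmhaltpkg ccmonpkg"
    for pair in "$@"; do
        primary=${pair%%:*}     # package to start on the repaired cluster
        recovery=${pair#*:}     # package currently running on the surviving cluster
        echo "surviving cluster: resynchronize data for $primary (site-specific step)"
        echo "surviving cluster: cmhaltpkg $recovery"
        echo "repaired cluster: cmrunpkg $primary"
    done
    echo "both clusters: cmrunpkg ccmonpkg"
    echo "cmviewconcl"
}
```

Because the loop finishes one recovery group before touching the next, each primary package is started before the data of the following group is resynchronized, which is the ordering step 4 asks for.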

Primary Packages Remaining on the Surviving Cluster

Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This approach minimizes the downtime involved in moving the applications back to the restored cluster. It assumes that the surviving cluster has sufficient resources to run all critical applications indefinitely.

NOTE: In a multiple recovery pair scenario, where more than one primary cluster is configured to share the same recovery cluster, do not use the following procedure to switch the roles of the failed cluster and the surviving cluster.

Use the following procedure:

  1. Halt the monitor packages. Issue the following command on each cluster:

    # cmhaltpkg ccmonpkg

  2. Edit the Continentalclusters ASCII configuration file. It is necessary to change the definitions of monitoring clusters, and switch the names of primary and recovery packages in the definitions of recovery groups. It may also be necessary to re-create data sender and data receiver packages.

  3. Check and apply the Continentalclusters configuration.

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. Restart the monitor packages on each cluster.

    # cmmodpkg -e ccmonpkg

  5. View the status of the Continentalcluster.

    # cmviewconcl

Before applying the edited configuration, the data storage associated with each cluster needs to be prepared to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.
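
To illustrate the package-name switch described in step 2, the following sketch swaps the PRIMARY_PACKAGE and RECOVERY_PACKAGE values of every recovery group whose primary package is on a given cluster. It is an assumption-laden example, not a replacement for editing and checking the file: swap_roles is a hypothetical helper, the file is assumed to hold one keyword per line with PRIMARY_PACKAGE preceding RECOVERY_PACKAGE in each group, and the monitoring definitions and data sender and receiver packages are left untouched and still need manual review.

```shell
# Sketch: swap primary and recovery package names in recovery groups
# whose primary package is on the given cluster.
# Assumptions: one keyword per line; PRIMARY_PACKAGE comes before
# RECOVERY_PACKAGE within each group. Monitoring definitions and
# DATA_SENDER_PACKAGE / DATA_RECEIVER_PACKAGE lines are NOT changed.
# swap_roles is a hypothetical helper, not a Continentalclusters command.
swap_roles() {
    # $1 = old primary cluster name, $2 = configuration file to read
    awk -v old="$1" '
        $1 == "PRIMARY_PACKAGE" && index($2, old "/") == 1 {
            held = $2           # remember the old primary package
            next                # it is written later, as the recovery package
        }
        $1 == "RECOVERY_PACKAGE" && held != "" {
            print "        PRIMARY_PACKAGE         " $2
            print "        RECOVERY_PACKAGE        " held
            held = ""
            next
        }
        { print }               # every other line passes through unchanged
    ' "$2"
}
```

The function writes the modified text to standard output, so the result can be saved to a new file and verified with cmcheckconcl before anything is applied.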

Primary Packages Remaining on the Surviving Cluster using cmswitchconcl

Continentalclusters provides the cmswitchconcl command to perform steps 2 and 3 of the procedure in the section “Primary Packages Remaining on the Surviving Cluster”. The command switches the roles of the primary and recovery packages in every Continentalclusters recovery group for which the specified cluster is defined as the primary cluster. Do not use cmswitchconcl in a multiple recovery pair configuration, where more than one primary cluster shares the same recovery cluster; in that configuration the command fails.

NOTE: Before running the cmswitchconcl command, the data storage associated with each cluster needs to be prepared properly to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.

The cmswitchconcl command cannot be used for the recovery groups that have both data sender and data receiver packages specified.

To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedure:

  1. Halt the monitor package on each cluster.

    # cmhaltpkg ccmonpkg

  2. Run this command.

    # cmswitchconcl \

    -C currentContinentalclustersConfigFileName \

    -c oldPrimaryClusterName \

    [-a] [-F NewContinentalclustersConfigFileName]

    The above command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which oldPrimaryClusterName is defined as the primary cluster.

    The default monitoring package name (ccmonpkg), monitor interval (60 seconds), and notification scheme (SYSLOG) with a notification delay of 0 seconds are added for cluster oldPrimaryClusterName, which will serve as the recovery-only cluster.

    To change these default values, edit NewContinentalclustersConfigFileName if -F is specified, or currentContinentalclustersConfigFileName if -F is not specified. If the new configuration file needs editing, do not use the -a option, because -a applies the new configuration automatically.

  3. If option -a was specified with cmswitchconcl in step 2, skip this step. Otherwise, apply the new Continentalclusters configuration manually.

    If -F was specified in step 2:

    # cmapplyconcl -v -C NewContinentalclustersConfigFileName

    If -F was not specified in step 2:

    # cmapplyconcl -v -C currentContinentalclustersConfigFileName

  4. Restart the monitor packages on each cluster.

    # cmmodpkg -e ccmonpkg

  5. View the status of the Continentalcluster.

    # cmviewconcl
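
The interaction between the -a and -F options and the manual apply step can be sketched as a dry run. The function name plan_switch and its argument order are hypothetical; the sketch only prints commands and assumes that cmapplyconcl takes the file name with -C, as in the cmapplyconcl invocations elsewhere in this chapter.

```shell
# Sketch: print the cmswitchconcl procedure for a cluster-based role switch.
# Dry run only: commands are echoed, not executed.
# plan_switch is a hypothetical helper, not a Continentalclusters command.
plan_switch() {
    # $1 = current configuration file
    # $2 = old primary cluster name
    # $3 = new configuration file for -F (empty string if -F is not used)
    # $4 = "auto" to pass -a, anything else to apply manually
    current=$1
    cluster=$2
    new=$3
    auto=$4
    cmd="cmswitchconcl -C $current -c $cluster"
    if [ "$auto" = "auto" ]; then
        cmd="$cmd -a"
    fi
    if [ -n "$new" ]; then
        cmd="$cmd -F $new"
    fi
    echo "cmhaltpkg ccmonpkg"
    echo "$cmd"
    # Without -a the configuration is applied by hand, from the -F file
    # if one was written, otherwise from the current file.
    if [ "$auto" != "auto" ]; then
        echo "cmapplyconcl -v -C ${new:-$current}"
    fi
    echo "cmmodpkg -e ccmonpkg"
    echo "cmviewconcl"
}
```

For example, plan_switch cmconcl.config ClusterA new.config "" prints a manual apply of new.config, while passing auto as the fourth argument omits the cmapplyconcl line altogether.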

NOTE: The cluster shared storage configuration file /etc/cmconcl/ccrac/ccrac.config is not updated by cmswitchconcl. The CCRAC_CLUSTER and CCRAC_INSTANCE_PKGS variables in the cluster shared storage configuration file must be manually updated on all nodes in the clusters to reflect the new primary cluster and package names.
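
Because this file is updated by hand on every node, it can help to display the current settings before and after editing. The sketch below assumes the two variables appear at the start of a line in the file; show_ccrac_settings is a hypothetical helper and the exact assignment syntax in the file is not taken from this document.

```shell
# Sketch: show the lines that set CCRAC_CLUSTER and CCRAC_INSTANCE_PKGS,
# with line numbers, so they can be reviewed on each node.
# Assumption: the variable names appear at the start of a line.
# show_ccrac_settings is a hypothetical helper.
show_ccrac_settings() {
    # $1 = path to the cluster shared storage configuration file
    grep -nE '^[[:space:]]*(CCRAC_CLUSTER|CCRAC_INSTANCE_PKGS)' "$1"
}
```

Running show_ccrac_settings /etc/cmconcl/ccrac/ccrac.config on each node gives a quick way to confirm that all nodes carry the same, updated values.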

The cmswitchconcl command is also used to switch the package role of a recovery group. If only a subset of the primary packages will remain running on the surviving (recovery) cluster, a new option -g is provided with the cmswitchconcl command. This option reconfigures the roles of the packages of a recovery group and helps retain recovery protection after a failover.

Usage of option -g (recovery group based role switch reconfiguration) is the same as for -c (cluster based role switch reconfiguration). Options -c and -g of the cmswitchconcl command are mutually exclusive.

# cmswitchconcl \

-C currentContinentalclustersConfigFileName \

-g RecoveryGroupName \

[-a] [-F NewContinentalclustersConfigFileName]

The following is a sample of the input and output files for running cmswitchconcl -C sample.input -c ClusterA -F sample.output:

sample.input
============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME        Sample_CC_Cluster
CLUSTER_NAME                    ClusterA
        CLUSTER_DOMAIN          cup.hp.com
        NODE_NAME               node1
        NODE_NAME               node2
        MONITOR_PACKAGE_NAME    ccmonpkg
CLUSTER_NAME                    ClusterB
        CLUSTER_DOMAIN          cup.hp.com
        NODE_NAME               node3
        NODE_NAME               node4
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS
### Section 2. Recovery Groups
RECOVERY_GROUP_NAME             RG1
        PRIMARY_PACKAGE         ClusterA/pkgX
        RECOVERY_PACKAGE        ClusterB/pkgX'
RECOVERY_GROUP_NAME             RG2
        PRIMARY_PACKAGE         ClusterA/pkgY
        RECOVERY_PACKAGE        ClusterB/pkgY'
        DATA_RECEIVER_PACKAGE   ClusterB/pkgR1
RECOVERY_GROUP_NAME             RG3
        PRIMARY_PACKAGE         ClusterB/pkgZ
        RECOVERY_PACKAGE        ClusterA/pkgZ'
RECOVERY_GROUP_NAME             RG4
        PRIMARY_PACKAGE         ClusterB/pkgW
        RECOVERY_PACKAGE        ClusterA/pkgW'
        DATA_RECEIVER_PACKAGE   ClusterA/pkgR2
### Section 3. Monitoring Definitions
CLUSTER_EVENT                   ClusterA/DOWN
        MONITORING_CLUSTER      ClusterB
        CLUSTER_ALERT           60 SECONDS
                NOTIFICATION    TEXTLOG /var/opt/resmon/log/data/events.log
                                "CC alert: DOWN"
                NOTIFICATION    SYSLOG
                                "CC alert: DOWN"
        CLUSTER_ALARM           90 SECONDS
                NOTIFICATION    TEXTLOG /var/opt/resmon/log/data/events.log
                                "CC alarm: DOWN"
                NOTIFICATION    SYSLOG
                                "CC alarm: DOWN"

sample.output
=============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME        Sample_CC_Cluster
CLUSTER_NAME                    ClusterA
        CLUSTER_DOMAIN          cup.hp.com
        NODE_NAME               node1
        NODE_NAME               node2
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS
CLUSTER_NAME                    ClusterB
        CLUSTER_DOMAIN          cup.hp.com
        NODE_NAME               node3
        NODE_NAME               node4
### Section 2. Recovery Groups
RECOVERY_GROUP_NAME             RG1
        PRIMARY_PACKAGE         ClusterB/pkgX'
        RECOVERY_PACKAGE        ClusterA/pkgX
RECOVERY_GROUP_NAME             RG2
        PRIMARY_PACKAGE         ClusterB/pkgY'
        RECOVERY_PACKAGE        ClusterA/pkgY
        DATA_RECEIVER_PACKAGE   ClusterA/pkgR1
RECOVERY_GROUP_NAME             RG3
        PRIMARY_PACKAGE         ClusterB/pkgZ
        RECOVERY_PACKAGE        ClusterA/pkgZ'
RECOVERY_GROUP_NAME             RG4
        PRIMARY_PACKAGE         ClusterB/pkgW
        RECOVERY_PACKAGE        ClusterA/pkgW'
        DATA_RECEIVER_PACKAGE   ClusterA/pkgR2
### Section 3. Monitoring Definitions
CLUSTER_EVENT                   ClusterB/DOWN
        MONITORING_CLUSTER      ClusterA
        CLUSTER_ALERT           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alert: DOWN"
        CLUSTER_ALARM           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alarm: DOWN"
CLUSTER_EVENT                   ClusterB/UNREACHABLE
        MONITORING_CLUSTER      ClusterA
        CLUSTER_ALERT           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alert: UNREACHABLE"
        CLUSTER_ALARM           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alarm: UNREACHABLE"
CLUSTER_EVENT                   ClusterB/ERROR
        MONITORING_CLUSTER      ClusterA
        CLUSTER_ALERT           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alert: ERROR"
CLUSTER_EVENT                   ClusterB/UP
        MONITORING_CLUSTER      ClusterA
        CLUSTER_ALERT           0 MINUTES
                NOTIFICATION    SYSLOG
                                "CC alert: UP"
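
To see in advance which recovery groups a cluster-based switch would touch, the configuration file can be scanned for groups whose primary package is on the named cluster. The sketch below assumes the file has one keyword per line; groups_with_primary_on is a hypothetical helper, not a Continentalclusters command.

```shell
# Sketch: list the recovery groups whose primary package runs on a given cluster.
# Assumption: the configuration file has one keyword per line.
# groups_with_primary_on is a hypothetical helper.
groups_with_primary_on() {
    # $1 = cluster name, $2 = Continentalclusters configuration file
    awk -v cl="$1" '
        $1 == "RECOVERY_GROUP_NAME" { group = $2 }   # remember the current group
        $1 == "PRIMARY_PACKAGE" && index($2, cl "/") == 1 { print group }
    ' "$2"
}
```

For a file with the recovery groups of the sample input, groups_with_primary_on ClusterA sample.input lists RG1 and RG2, the two groups whose roles are switched in the sample output; RG3 and RG4 already have ClusterB as primary and are unchanged.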

Newly Created Cluster Will Run Primary Packages

After creating a new cluster to replace the damaged cluster, restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages.

  1. Configure the new cluster as a Serviceguard cluster. Use the cmviewcl command on the surviving cluster and compare the results to the new cluster configuration. Correct any inconsistencies on the new cluster.

  2. Halt the monitor package on the surviving recovery cluster.

    # cmhaltpkg ccmonpkg

  3. Edit the Continentalclusters configuration file to replace the entries for the failed cluster with those for the new cluster. Check and apply the Continentalclusters configuration.

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. Do the following for each recovery group where the new cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving recovery cluster to the disks on the new cluster. This may be time-consuming.

    2. Halt the application on the surviving recovery cluster if necessary, and start it on the new cluster.

    3. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.

  5. If the new cluster acts as a recovery cluster for any recovery group, create a monitor package for the new cluster.

    Apply the configuration of the new monitor package.

    # cmapplyconf -P ccmonpkg.config

  6. Restart the monitor package on the surviving cluster.

    # cmrunpkg ccmonpkg

  7. View the status of the Continentalcluster.

    # cmviewconcl

Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups

After replacing the failed cluster, if the downtime involved in moving the applications back is a concern, then do the following:

  • Change the surviving cluster to the role of primary cluster for all recovery groups.

  • Configure the new cluster as a recovery cluster for all those groups.

Configure the new cluster as a standard Serviceguard cluster, and follow the usual procedure to configure the continental cluster with the new cluster used as a recovery cluster for all recovery groups.

NOTE: In a multiple recovery pair scenario (where more than one primary cluster is configured to share the same recovery cluster), do not reconfigure the recovery cluster because one of the primary clusters has failed.
© Hewlett-Packard Development Company, L.P.