Designing Disaster Tolerant High Availability Clusters: > Chapter 4 Designing a Continental Cluster

Restoring Disaster Tolerance


After a failover occurs, restoring disaster tolerance presents several challenges, the most significant of which are:

  • Restoring the failed cluster

    Depending on the nature of the disaster, it may be necessary either to create a new cluster or to restore the failed cluster.

    Before starting the new or restored cluster, make sure the AUTO_RUN flag is disabled for all of the Continentalclusters application packages. This prevents the packages from starting unexpectedly when the cluster starts.

  • Resynchronizing the data

    To resynchronize the data, either restore the data to the restored cluster and continue with the same data replication procedure, or set up data replication to run in the opposite direction.
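
As a sketch of the AUTO_RUN check described above, the following shell function scans package ASCII configuration files and reports any in which AUTO_RUN is still set to YES. The function name auto_run_enabled and the file names are illustrative assumptions. The sketch only reads files and does not invoke any Serviceguard command.

```shell
#!/bin/sh
# Hypothetical helper: report package configuration files where AUTO_RUN is YES.
# Assumes one "AUTO_RUN <value>" parameter per line; comment lines are ignored
# because their first field is not the keyword itself.
auto_run_enabled() {
    for cfg in "$@"; do
        # awk exits 0 only if an AUTO_RUN YES line was seen in this file.
        awk 'toupper($1) == "AUTO_RUN" && toupper($2) == "YES" { found = 1 }
             END { exit(found ? 0 : 1) }' "$cfg" && echo "$cfg"
    done
    return 0
}
```

Any file name printed by the function identifies a package whose AUTO_RUN setting should be reviewed before the cluster is started.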

The following sections briefly outline some scenarios for restoring disaster tolerance.

Restore Clusters to their Original Roles

If the disaster did not destroy the cluster, there is the option to return both clusters in a recovery pair to their original roles. To do this:

  1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.

  2. On each cluster, stop the Continentalclusters monitor package if it is still running.

    # cmhaltpkg ccmonpkg

  3. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.

  4. For each recovery group where the new cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving cluster to the disks on the new cluster. This may be time-consuming.

    2. Halt the recovered application on the surviving cluster if necessary, and start it on the new cluster.

    3. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.

  5. Restart the monitor using the following command on each cluster:

    # cmrunpkg ccmonpkg

    Alternatively, if the monitoring package configuration has been modified, use the following sequence on each cluster to apply the new configuration and start the monitor:

    # cmapplyconf -P ccmonpkg.config

    # cmmodpkg -e ccmonpkg

  6. View the status of the continental cluster.

    # cmviewconcl

Primary Packages Remaining on the Surviving Cluster

Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This minimizes the downtime involved in moving the applications back to the restored cluster. This approach assumes that the surviving cluster has sufficient resources to run all critical applications indefinitely.

NOTE: In a multiple recovery pairs scenario, where more than one primary cluster is configured to share the same recovery cluster, do not use the following procedure to switch the roles of the failed cluster and the surviving cluster.

Use the following procedure:

  1. Halt the monitor packages. Issue the following command on each cluster:

    # cmhaltpkg ccmonpkg

  2. Edit the Continentalclusters ASCII configuration file. It is necessary to change the definitions of monitoring clusters, and switch the names of primary and recovery packages in the definitions of recovery groups. It may also be necessary to re-create data sender and data receiver packages.

  3. Check and apply the Continentalclusters configuration.

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. Restart the monitor packages on each cluster.

    # cmmodpkg -e ccmonpkg

  5. View the status of the continental cluster.

    # cmviewconcl

Before applying the edited configuration, the data storage associated with each cluster needs to be prepared to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.
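
To illustrate the recovery group part of step 2, here is a minimal awk sketch that swaps the PRIMARY_PACKAGE and RECOVERY_PACKAGE roles in every recovery group whose primary package belongs to a given cluster. The function name swap_roles is hypothetical. The sketch assumes that PRIMARY_PACKAGE is immediately followed by RECOVERY_PACKAGE within each group, and it does not touch monitoring definitions or data sender and data receiver packages, which still need manual editing.

```shell
#!/bin/sh
# Hypothetical helper: swap primary/recovery roles for groups whose primary
# package runs on cluster $1. Reads the configuration file $2 and writes the
# edited text to standard output; the input file is not modified.
swap_roles() {
    awk -v old="$1" '
        # Hold back a primary package that lives on the old primary cluster.
        $1 == "PRIMARY_PACKAGE" && index($2, old "/") == 1 { held = $2; next }
        # The matching recovery package becomes the new primary package.
        $1 == "RECOVERY_PACKAGE" && held != "" {
            print "PRIMARY_PACKAGE " $2
            print "RECOVERY_PACKAGE " held
            held = ""
            next
        }
        { print }
    ' "$2"
}
```

For example, swap_roles ClusterA cmconcl.config > cmconcl.new writes an edited copy that can be reviewed before it is checked and applied.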

Primary Packages Remaining on the Surviving Cluster using cmswitchconcl

Continentalclusters provides the command cmswitchconcl to facilitate steps 2 and 3 described in the section “Primary Packages Remaining on the Surviving Cluster”. The cmswitchconcl command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which the specified cluster is defined as the primary cluster. Do not use the cmswitchconcl command in a multiple recovery pair configuration where more than one primary cluster shares the same recovery cluster; the command fails in that configuration.

NOTE: Before running the cmswitchconcl command, the data storage associated with each cluster needs to be prepared properly to match the new role. In addition, the data replication direction needs to be changed to mirror data from the new primary cluster to the new recovery cluster.

The cmswitchconcl command cannot be used for the recovery groups that have both data sender and data receiver packages specified.

To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedure:

  1. Halt the monitor package on each cluster.

    # cmhaltpkg ccmonpkg

  2. Run:

    # cmswitchconcl \
      -C currentContinentalclustersConfigFileName \
      -c oldPrimaryClusterName \
      [-a] [-F newContinentalclustersConfigFileName]

    The above command switches the roles of the primary and recovery packages of the Continentalclusters recovery groups for which oldPrimaryClusterName is defined as the primary cluster.

    The default values of the monitoring package name (ccmonpkg), the monitoring interval (60 seconds), and the notification scheme (SYSLOG) with notification delay (0 seconds) are added for cluster oldPrimaryClusterName, which will serve as the recovery-only cluster.

    To change the default values, edit the file newContinentalclustersConfigFileName if -F is specified, or the file currentContinentalclustersConfigFileName if -F is not specified. If the new configuration file needs editing, do not use the -a option, because -a applies the new configuration automatically.

  3. If option -a is specified with cmswitchconcl in step 2, skip this step. Otherwise manually apply the new Continentalclusters configuration:

    # cmapplyconcl -v -C newContinentalclustersConfigFileName (if -F is specified in step 2)

    # cmapplyconcl -v -C currentContinentalclustersConfigFileName (if -F is not specified in step 2)

  4. Restart the monitor packages on each cluster.

    # cmmodpkg -e ccmonpkg

  5. View the status of the continental cluster:

    # cmviewconcl

The cmswitchconcl command can also switch the package roles of a single recovery group. If only a subset of the primary packages will remain running on the surviving (recovery) cluster, use the -g option. This option reconfigures the roles of the packages of a recovery group and helps retain recovery protection after a failover.

The -g option (recovery group based role switch reconfiguration) is used in the same way as -c (cluster based role switch reconfiguration). The -c and -g options of the cmswitchconcl command are mutually exclusive.

# cmswitchconcl \
  -C currentContinentalclustersConfigFileName \
  -g RecoveryGroupName \
  [-a] [-F newContinentalclustersConfigFileName]
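
Because -c and -g are mutually exclusive, a wrapper can reject an invalid combination before anything is run. The sketch below only assembles and prints a cmswitchconcl command line from the options shown above. The function name switch_cmd and its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: assemble (but do not run) a cmswitchconcl command line.
# Usage: switch_cmd <config_file> <-c|-g> <cluster_or_group_name> [new_config_file]
switch_cmd() {
    cfg=$1 mode=$2 name=$3 newcfg=$4
    case "$mode" in
        -c|-g) ;;   # exactly one of -c or -g is accepted per invocation
        *) echo "switch_cmd: specify -c or -g" >&2; return 1 ;;
    esac
    if [ -z "$cfg" ] || [ -z "$name" ]; then
        echo "switch_cmd: missing argument" >&2
        return 1
    fi
    cmd="cmswitchconcl -C $cfg $mode $name"
    # -F writes the result to a separate file so it can be edited before applying.
    if [ -n "$newcfg" ]; then
        cmd="$cmd -F $newcfg"
    fi
    echo "$cmd"
}
```

The -a option is deliberately left out of the sketch so that the generated configuration can be reviewed before it is applied.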

The following is a sample of the input and output files for running cmswitchconcl -C sample.input -c ClusterA -F sample.output:

sample.input
============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster
CLUSTER_NAME ClusterA
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node1
NODE_NAME node2
MONITOR_PACKAGE_NAME ccmonpkg
CLUSTER_NAME ClusterB
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node3
NODE_NAME node4
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS
### Section 2.  Recovery Groups
RECOVERY_GROUP_NAME RG1
PRIMARY_PACKAGE ClusterA/pkgX
RECOVERY_PACKAGE ClusterB/pkgX'
RECOVERY_GROUP_NAME RG2
PRIMARY_PACKAGE ClusterA/pkgY
RECOVERY_PACKAGE ClusterB/pkgY'
DATA_RECEIVER_PACKAGE ClusterB/pkgR1
RECOVERY_GROUP_NAME RG3
PRIMARY_PACKAGE ClusterB/pkgZ
RECOVERY_PACKAGE ClusterA/pkgZ'
RECOVERY_GROUP_NAME RG4
PRIMARY_PACKAGE ClusterB/pkgW
RECOVERY_PACKAGE ClusterA/pkgW'
DATA_RECEIVER_PACKAGE ClusterA/pkgR2
### Section 3.  Monitoring Definitions
CLUSTER_EVENT ClusterA/DOWN
MONITORING_CLUSTER ClusterB
CLUSTER_ALERT 60 SECONDS
NOTIFICATION TEXTLOG /home/user/logs/events.log
"CC alert: DOWN"
NOTIFICATION SYSLOG
"CC alert: DOWN"
CLUSTER_ALARM 90 SECONDS
NOTIFICATION TEXTLOG /home/users/logs/events.log
"CC alarm: DOWN"
NOTIFICATION SYSLOG
"CC alarm: DOWN"
sample.output
=============
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME Sample_CC_Cluster
CLUSTER_NAME ClusterA
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node1
NODE_NAME node2
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS
CLUSTER_NAME ClusterB
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node3
NODE_NAME node4
### Section 2.  Recovery Groups
RECOVERY_GROUP_NAME RG1
PRIMARY_PACKAGE ClusterB/pkgX'
RECOVERY_PACKAGE ClusterA/pkgX
RECOVERY_GROUP_NAME RG2
PRIMARY_PACKAGE ClusterB/pkgY'
RECOVERY_PACKAGE ClusterA/pkgY
DATA_RECEIVER_PACKAGE ClusterA/pkgR1
RECOVERY_GROUP_NAME RG3
PRIMARY_PACKAGE ClusterB/pkgZ
RECOVERY_PACKAGE ClusterA/pkgZ'
RECOVERY_GROUP_NAME RG4
PRIMARY_PACKAGE ClusterB/pkgW
RECOVERY_PACKAGE ClusterA/pkgW'
DATA_RECEIVER_PACKAGE ClusterA/pkgR2
### Section 3.  Monitoring Definitions
CLUSTER_EVENT ClusterB/DOWN
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG
"CC alert: DOWN"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG
"CC alarm: DOWN"
CLUSTER_EVENT ClusterB/UNREACHABLE
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG
"CC alert: UNREACHABLE"
CLUSTER_ALARM 0 MINUTES
NOTIFICATION SYSLOG
"CC alarm: UNREACHABLE"
CLUSTER_EVENT ClusterB/ERROR
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG
"CC alert: ERROR"
CLUSTER_EVENT ClusterB/UP
MONITORING_CLUSTER ClusterA
CLUSTER_ALERT 0 MINUTES
NOTIFICATION SYSLOG
"CC alert: UP"
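
When comparing the input and output files, it can help to reduce each one to a per-group summary. The following awk sketch prints one line per recovery group with its primary and recovery packages. The function name list_groups is an assumption, and the keyword layout is taken from the sample files above, where RECOVERY_PACKAGE follows PRIMARY_PACKAGE in each group.

```shell
#!/bin/sh
# Hypothetical helper: print "<group> <primary> <recovery>" for each recovery
# group found in a Continentalclusters ASCII configuration file ($1).
list_groups() {
    awk '
        $1 == "RECOVERY_GROUP_NAME" { group = $2 }
        $1 == "PRIMARY_PACKAGE"     { primary = $2 }
        # RECOVERY_PACKAGE is the last of the three keywords in each group,
        # so the summary line is printed when it is seen.
        $1 == "RECOVERY_PACKAGE"    { print group, primary, $2 }
    ' "$1"
}
```

Running list_groups on both files and comparing the results with diff shows which recovery groups had their roles switched.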

Newly Created Cluster Will Run Primary Packages

After creating a new cluster to replace the damaged cluster, restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages.

  1. Configure the new cluster as a Serviceguard cluster. Use the cmviewcl command on the surviving cluster and compare the results to the new cluster configuration. Correct any inconsistencies on the new cluster.

  2. Halt the monitor package on the surviving recovery cluster.

    # cmhaltpkg ccmonpkg

  3. Edit the continental cluster configuration file to replace the data from the old failed cluster with data from the new cluster. Check and apply the Continentalclusters configuration.

    # cmcheckconcl -v -C cmconcl.config

    # cmapplyconcl -v -C cmconcl.config

  4. Do the following for each recovery group where the new cluster will run the primary package:

    1. Synchronize the data from the disks on the surviving recovery cluster to the disks on the new cluster. This may be time-consuming.

    2. Halt the application on the surviving recovery cluster if necessary, and start it on the new cluster.

    3. To keep application downtime to a minimum, start the primary package on the new cluster before resynchronizing the data of the next recovery group.

  5. If the new cluster acts as a recovery cluster for any recovery group, create a monitor package for the new cluster.

    Apply the configuration of the new monitor package.

    # cmapplyconf -P ccmonpkg.config

  6. Restart the monitor package on the surviving cluster.

    # cmrunpkg ccmonpkg

  7. View the status of the continental cluster.

    # cmviewconcl
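
Step 3 pairs a check with an apply. A wrapper can make the apply conditional on the check succeeding, as in the following sketch. The function name check_then_apply and the RUN variable are assumptions: RUN defaults to echo, so the commands are only printed. The sketch also assumes that cmcheckconcl returns a non-zero exit status when the configuration is invalid, which should be confirmed against the man page.

```shell
#!/bin/sh
# Hypothetical wrapper: apply a Continentalclusters configuration only if the
# check succeeds. RUN defaults to "echo" (dry run); set RUN= to execute.
check_then_apply() {
    cfg=$1
    run=${RUN-echo}
    # $run is intentionally unquoted so that an empty value disappears
    # and the command itself is executed.
    if $run cmcheckconcl -v -C "$cfg"; then
        $run cmapplyconcl -v -C "$cfg"
    else
        echo "check failed; $cfg was not applied" >&2
        return 1
    fi
}
```

With the default dry run, check_then_apply cmconcl.config prints the two commands in order so the sequence can be reviewed before it is run for real.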

Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups

After replacing the failed cluster, if the downtime involved in moving the applications back is a concern, then do the following:

  • Change the surviving cluster to the role of primary cluster for all recovery groups.

  • Configure the new cluster as a recovery cluster for all those groups.

Configure the new cluster as a standard Serviceguard cluster, and follow the usual procedure to configure the continental cluster with the new cluster used as a recovery cluster for all recovery groups.

NOTE: In a multiple recovery pairs scenario (where more than one primary cluster is configured to share the same recovery cluster), do not reconfigure the recovery cluster in response to the failure of one of the primary clusters.
© Hewlett-Packard Development Company, L.P.