Designing Disaster Tolerant High Availability Clusters: > Chapter 6 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA

Completing and Running a Continental Cluster Solution with Continuous Access EVA


The following section describes how to configure a continental cluster solution using Continuous Access EVA, which requires the HP Metrocluster with Continuous Access EVA product.

NOTE: Make sure you have completed the preparation for Metrocluster CA EVA, as described in the section “Preparing a Serviceguard Cluster for Metrocluster CA EVA”, on both the primary and recovery sites.

Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.

  1. Install Continentalclusters on all the cluster nodes in the primary cluster (skip this step if the software has been preinstalled).
    Run swinstall(1m) to install HP Continentalclusters from an SD depot.

  2. When swinstall(1m) has completed, create a directory for the new package in the primary cluster:
    # mkdir /etc/cmcluster/<package_name>

    Create a Serviceguard package configuration file in the primary cluster.
    # cd /etc/cmcluster/<package_name>
    # cmmakepkg -p <package_name>.ascii

    Customize the Serviceguard package configuration file as appropriate to your application. Be sure to include the pathname of the control script /etc/cmcluster/<package_name>/<package_name>.cntl for the RUN_SCRIPT and HALT_SCRIPT parameters.

    Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts.

    Use cmmodpkg to enable package switching on the primary packages only after they have started. If package switching were enabled in the package configuration itself, the primary package would start automatically whenever the cluster starts. That is unsafe after a primary cluster disaster: if the recovery package has started and is running on the recovery cluster, the primary package must not be started until the recovery package has been stopped.

  3. Create a package control script.

    # cmmakepkg -s pkgname.cntl

    Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.

  4. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions.

  5. Copy the environment file template /opt/cmcluster/toolkit/SGCAEVA/caeva.env to the package directory, naming it pkgname_caeva.env:

    # cp /opt/cmcluster/toolkit/SGCAEVA/caeva.env \
    /etc/cmcluster/pkgname/pkgname_caeva.env

    NOTE: If the package control script is not named after the package, the environment file name must still follow the naming convention: the file name of the package control script without its extension, followed by an underscore and the type of data replication technology used (caeva). The file extension must be env. The following examples show how to choose the environment file name.

    Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env.
    Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env.
  6. Edit the environment file <pkgname>_caeva.env as follows:

    1. Set the CLUSTER_TYPE variable to CONTINENTAL

    2. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for a description of these variables.

    3. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred, or Data_Currency_Preferred.

    4. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.

    5. Set the DR_GROUP_NAME variable to the name of DR Group used by this package. This DR Group name is defined when the DR Group is created.

    6. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.

    7. Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Separate multiple names with commas.

    8. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Separate multiple names with commas.

    9. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.

    10. Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2. Separate multiple names with commas.

    11. Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Separate multiple names with commas.

    12. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.

  7. Distribute Metrocluster CA EVA configuration, environment and control script files to other nodes in the cluster by using ftp or rcp.

    # rcp -p /etc/cmcluster/pkgname/* \
    other_node:/etc/cmcluster/pkgname

  8. Apply the Serviceguard configuration using the cmapplyconf command or SAM.

  9. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

    pkgname.cntl         Serviceguard package control script
    pkgname_caeva.env    Metrocluster CA EVA environment file
    pkgname.ascii        Serviceguard package ASCII configuration file
    pkgname.sh           Package monitor shell script, if applicable
    other files          Any other scripts used to manage Serviceguard packages

    The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster CA EVA.

  10. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover.

  11. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.
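The environment file naming convention from the NOTE in step 5 can be expressed as a small shell helper. This is a minimal sketch and not part of the Metrocluster product: the function name env_file_name is hypothetical, and it assumes that only the last extension of the control script name is stripped.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Metrocluster): derive the
# Metrocluster CA EVA environment file name from the file name of a
# package control script.
# Convention: <control script name without extension>_caeva.env
env_file_name() {
    script=$(basename "$1")      # drop any leading directory
    base=${script%.*}            # strip the last extension, if there is one
    printf '%s_caeva.env\n' "$base"
}

env_file_name pkg.cntl              # prints pkg_caeva.env
env_file_name control_script.sh     # prints control_script_caeva.env
```

The two calls reproduce Example 1 and Example 2 from the NOTE.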

Setting up a Recovery Package on the Recovery Cluster

Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster. Use the following steps for the recovery package set up:

  1. Install Continentalclusters on all the cluster nodes in the recovery cluster (skip this step if the software has been preinstalled).

    NOTE: Serviceguard should already be installed on all the cluster nodes. Run swinstall(1m) to install Continentalclusters from an SD depot.
  2. When swinstall(1m) has completed, create a directory as follows for the new package in the recovery cluster.

    # mkdir /etc/cmcluster/<package_name>

    Create a Serviceguard package configuration file in the recovery cluster.

    # cd /etc/cmcluster/<package_name>

    # cmmakepkg -p <package_name>.ascii

    Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/<package_name>/<package_name>.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.

    Set the AUTO_RUN flag to NO. This is to ensure the package will not start when the cluster starts.

    Do not use cmmodpkg to enable package switching on any recovery package. Enabling package switching will automatically start the recovery package. Package switching on a recovery package is set automatically by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.

  3. Create a package control script.

    # cmmakepkg -s pkgname.cntl

    Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.

    NOTE: Review all the control script variables against the primary cluster. Some variables, such as VG and LV, must be the same on the recovery cluster as on the primary cluster. Others, such as FS, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART, are probably the same as on the primary cluster. Still others, such as IP and SUBNET, are probably different from those on the primary cluster.
  4. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions.

  5. Copy the environment file template /opt/cmcluster/toolkit/SGCAEVA/caeva.env to the package directory, naming it pkgname_caeva.env:

    # cp /opt/cmcluster/toolkit/SGCAEVA/caeva.env \
    /etc/cmcluster/pkgname/pkgname_caeva.env

  6. Edit the environment file <pkgname>_caeva.env as follows:

    1. Set the CLUSTER_TYPE variable to CONTINENTAL

    2. Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.

    3. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred, or Data_Currency_Preferred.

    4. Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.

    5. Set the DR_GROUP_NAME variable to the name of DR Group used by this package. This DR Group name is defined when the DR Group is created.

    6. Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.

    7. Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Separate multiple names with commas.

    8. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Separate multiple names with commas.

    9. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.

    10. Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2. Separate multiple names with commas.

    11. Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Separate multiple names with commas.

    12. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.

  7. Distribute Metrocluster CA EVA configuration, environment and control script files to other nodes in the cluster by using ftp or rcp:

    # rcp -p /etc/cmcluster/pkgname/* \
    other_node:/etc/cmcluster/pkgname

    See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes.

    Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.

  8. Apply the Serviceguard configuration using the cmapplyconf command or SAM.

  9. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

    bkpkgname.cntl         Serviceguard package control script
    bkpkgname_caeva.env    Metrocluster CA EVA environment file
    bkpkgname.ascii        Serviceguard package ASCII configuration file
    bkpkgname.sh           Package monitor shell script, if applicable
    other files            Any other scripts you use to manage Serviceguard packages

  10. Make sure the packages on the primary cluster are not running. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the recovery cluster for cluster and package startup and package failover.

  11. Any running package on the recovery cluster that has a counterpart on the primary cluster should be halted at this time.
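Before distributing an edited environment file, a quick sanity check of the values set in step 6 can catch typos. The following is a minimal sketch under stated assumptions, not a tool supplied with Metrocluster: the function names env_value and check_env are hypothetical, and the script assumes the environment file holds one NAME=value or NAME="value" assignment per line with no leading white space.

```shell
#!/bin/sh
# Assumption: one NAME=value (or NAME="value") assignment per line;
# lines beginning with '#' are comments and are ignored.

# Print the value assigned to variable $2 in file $1, without quotes.
env_value() {
    awk -F= -v name="$2" '
        $1 == name {
            value = substr($0, index($0, "=") + 1)
            gsub(/^[ \t"]+|[ \t"]+$/, "", value)   # trim blanks and quotes
            found = value
        }
        END { print found }
    ' "$1"
}

# Return 0 if the checked settings in file $1 look valid, 1 otherwise.
check_env() {
    file=$1
    rc=0

    [ "$(env_value "$file" CLUSTER_TYPE)" = "CONTINENTAL" ] || {
        echo "CLUSTER_TYPE must be CONTINENTAL" >&2; rc=1; }

    case $(env_value "$file" DT_APPLICATION_STARTUP_POLICY) in
        Availability_Preferred|Data_Currency_Preferred) ;;
        *) echo "DT_APPLICATION_STARTUP_POLICY is not a supported policy" >&2; rc=1 ;;
    esac

    timeout=$(env_value "$file" QUERY_TIME_OUT)
    case $timeout in
        '') ;;   # not set: the documented default of 300 seconds applies
        *[!0-9]*) echo "QUERY_TIME_OUT is not a number" >&2; rc=1 ;;
        *) [ "$timeout" -ge 20 ] || {
               echo "QUERY_TIME_OUT is below the recommended minimum of 20" >&2; rc=1; } ;;
    esac

    return $rc
}

# Example call (path as used in the steps above):
#   check_env /etc/cmcluster/pkgname/pkgname_caeva.env
```

The sketch checks only CLUSTER_TYPE, DT_APPLICATION_STARTUP_POLICY and QUERY_TIME_OUT, because those are the variables whose permitted values are stated in step 6. The remaining variables are site-specific.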

Setting up the Continental Cluster Configuration

The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 4 “Designing a Continental Cluster”.

  1. Generate the Continentalclusters configuration using the following command:

    # cmqueryconcl -C cmconcl.config

  2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. When data replication is done using Continuous Access EVA, there are no data sender and receiver packages.

    Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step.

  3. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.

  4. On all nodes in both clusters copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages. The monitor package should start automatically when the cluster is formed.

  5. Apply the monitor package to both cluster configurations.

    # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config

  6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/cmclconfig nor is there an equivalent file for Continentalclusters. Example:

    # cmapplyconcl -C cmconcl.config

  7. Start the monitor package on both clusters.

    NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status.
  8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

  9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.

  10. View the status of the Continentalcluster primary and recovery clusters, including configured event data.

    # cmviewconcl -v

The continental cluster is now ready for testing. See “Testing the Continental Cluster”.

Switching to the Recovery Cluster in Case of Disaster

It is vital the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:

  • During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.

  • During an alarm, cmrecovercl will start the recovery packages without the -f option.

  • When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This condition applies not only when no alert or alarm was issued, but also applies to the situation where there was an alert or alarm, but the primary cluster recovered and its current status is Up.
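The three rules above amount to a small decision table. The following sketch models them in shell for illustration only. It does not call cmrecovercl, and the function name would_recover and its argument values are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the rules above; it does not run cmrecovercl.
# $1: current condition reported for the primary cluster: alert, alarm, or none
# $2: "force" if the -f option would be given; omit it otherwise
# Returns 0 if cmrecovercl would start the recovery packages.
would_recover() {
    case $1 in
        alarm) return 0 ;;                    # alarm: starts without -f
        alert) [ "${2:-}" = "force" ] ;;      # alert: starts only with -f
        *)     return 1 ;;                    # no alert or alarm: never starts
    esac
}

would_recover alert force && echo "alert with -f: recovery packages start"
would_recover none force  || echo "no alert or alarm: recovery packages do not start"
```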

Failover to Recovery Site

After receiving Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster.

The recovery package control script evaluates the status of the DR group used by the package and fails the DR group over to the EVA in the recovery site. After a successful failover, the DR group in the recovery site's EVA is the source and is accessible in read/write mode.

NOTE: If the CA links between the two EVAs are down, the recovery package will only start up if one of the following conditions is true:
  • The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the package’s environment file is set to “Availability_Preferred”.

  • The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the package’s environment file is set to “Data_Currency_Preferred”, and a FORCEFLAG file exists in the package directory.

After the recovery package is up and running, the EVA in the recovery site will have more current data than the one in the primary site.
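The startup conditions in the NOTE above can be modeled as a single check. This is an illustrative sketch only, because the real decision is made inside the recovery package control script. The function name may_start_with_links_down is hypothetical, and the file name FORCEFLAG is taken from the PKGDIR description in the environment file steps.

```shell
#!/bin/sh
# Hypothetical model of the NOTE above; not the control script's own logic.
# $1: value of DT_APPLICATION_STARTUP_POLICY
# $2: package directory (PKGDIR), where the operator may create FORCEFLAG
# Returns 0 if the recovery package may start while the CA links are down.
may_start_with_links_down() {
    case $1 in
        Availability_Preferred)  return 0 ;;
        Data_Currency_Preferred) [ -f "$2/FORCEFLAG" ] ;;   # needs the flag file
        *)                       return 1 ;;                # unknown policy
    esac
}

# Example call (package directory as used in the configuration steps):
#   may_start_with_links_down Data_Currency_Preferred /etc/cmcluster/pkgname
```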

Failover Scenarios

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking that connects the two sites. The following scenarios address some of those failures and suggest recovery approaches applicable to environments using data replication provided by HP StorageWorks EVA series disk arrays and Continuous Access (CA).

Scenario 1

The primary site has lost power for a prolonged time, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the EVA disk array or the operating systems of the systems at the primary site.

Failback to the Primary Site

In this scenario, the EVA in the primary site is down only because of the loss of power, so the storage configuration information and the application data from before the power failure remain intact in the EVA. Once the primary site’s power is restored, the EVA is up and running, and the CA links are up, the CA EVA software automatically resynchronizes the data from the recovery site’s EVA back to the primary site’s EVA. If the resynchronization is a full copy operation, the data in the primary site’s EVA is not consistent and is not usable until the full copy (resynchronization) completes.

It is recommended to wait until the resynchronization is complete before failing back the packages to the primary site. The state of the DR group in the primary site’s EVA can be checked either with Command View (CV) EVA or with the SSSU command. If the state of each Vdisk in the DR group is shown as “Normal”, then the resynchronization is complete, and the user can move the packages back to the primary site.
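The "every Vdisk is Normal" rule can be captured in a short helper once the states have been read from Command View EVA or SSSU. This is a minimal sketch: the function name resync_complete is hypothetical, and no SSSU output format is assumed, so the states are passed in as arguments by the operator.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every Vdisk state passed as an
# argument is "Normal". Collecting the states is left to the operator.
resync_complete() {
    [ "$#" -gt 0 ] || return 1          # no states given: nothing verified
    for state in "$@"; do
        [ "$state" = "Normal" ] || return 1
    done
    return 0
}

if resync_complete Normal Normal Normal; then
    echo "resynchronization complete: packages can be failed back"
fi
```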

Scenario 2

The primary site HP StorageWorks EVA disk array experienced a catastrophic hardware failure and all data was lost on the array.

Failback to the Primary Site

In this scenario the disk array is repaired or a new EVA array is commissioned at the primary site. Before the application can fail back to the primary site, the EVA in the recovery site (now the source storage) needs to establish the replication relationship with the new EVA in the primary site (now the destination storage). Refer to the procedure named “Return Operations to Replaced New Storage Hardware” in the “Continuous Access EVA Operation Guide” to rebuild the DR groups configured in the EVA. Once the DR groups are rebuilt and the destination storage is synchronized with the source storage, the packages can be failed back to the primary site.

Scenario 3

The primary site has lost power, which impacts only the systems in the primary cluster. The primary cluster is down but the EVA disk array and CA links to the recovery site are up and running.

Failback in Scenario 3

In this scenario the EVA disk arrays in both sites are up and running. The CA links are functional. When the recovery packages are up and running on the recovery site, CA EVA automatically switches the replication direction; the new data written on the recovery site's EVA is replicated to the primary site's EVA.

After the primary cluster is back online, the packages can be failed back to the primary site.

Reconfiguring Recovery Group Site Identities in Continentalclusters after a Recovery

Consider a disaster scenario in which the primary site goes out of operation with no loss of data on the disk array or the servers. After the recovery is completed, the recovered application can continue to run at the recovery site without having to fail back when the primary cluster becomes available again at a later point in time.

This avoids further downtime for the recovered application. However, the applications should also have the same level of recovery capability in their new site as they had in their original primary site.

For the scenario described above, Continentalclusters can be reconfigured to provide monitoring and recovery for the application now running on its recovery cluster. This is done by switching the identities of the sites in the application's context: the old (original) primary site becomes the recovery site, and the old (original) recovery site becomes the primary site. This type of reconfiguration is possible only in a configuration with two clusters and two sites.

Continentalclusters solutions using HP StorageWorks EVA disk arrays need no disk array replication tasks during the reconfiguration. Once the primary site EVA disk array comes back online, HP StorageWorks EVA CA automatically resynchronizes the data, making the recovery site the “source” and the old primary site the “destination”.

Use the cmswitchconcl command (only in a two cluster configuration) to swap the site identities for all or a selected application’s recovery group. This is so that the applications can now be monitored and recovered from their once primary cluster.

© Hewlett-Packard Development Company, L.P.