Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters, Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

Building a Continental Cluster Solution with EMC SRDF

The following section describes how to configure a continental cluster solution using EMC SRDF, which requires the Metrocluster with EMC SRDF product.

Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster. Consult the Managing Serviceguard user’s guide for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.

  1. If this was not done previously, split the EMC SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be edited to refer to the configured SRDF groups and customized with the Symmetrix device group names.

  2. Create and test a standard Serviceguard cluster using the procedures described in the Managing Serviceguard user’s guide.

  3. Install Continentalclusters on all nodes in the primary cluster. (Skip this step if the software has already been installed.)

    NOTE: Serviceguard should already be installed on all the cluster nodes.

    Run swinstall(1M) to install the Continentalclusters and Metrocluster with EMC SRDF products from an SD depot.

  4. When swinstall(1M) has completed, create a directory for the new package on the primary cluster:

    # mkdir /etc/cmcluster/<pkg_name>

  5. Copy the environment file template /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory, naming it <pkg_name>_srdf.env:

    # cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \
    /etc/cmcluster/<pkg_name>/<pkg_name>_srdf.env

  6. Create a Serviceguard application package configuration file:

    # cd /etc/cmcluster/<pkg_name>

    # cmmakepkg -p <pkg_name>.conf

    Customize it as appropriate to your application. Be sure to include the node names, and set the RUN_SCRIPT and HALT_SCRIPT parameters to the pathname of the control script (/etc/cmcluster/<pkg_name>/<pkg_name>.cntl).

    Also set AUTO_RUN (PKG_SWITCHING_ENABLED in Serviceguard A.11.09) to NO so that the application package does not start automatically. (By contrast, AUTO_RUN for the monitor package, ccmonpkg, is set to YES.) Define the service as required.

  7. Create a package control script.

    # cmmakepkg -s <pkg_name>.cntl

    Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.

  8. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions.

  9. Edit the environment file <pkg_name>_srdf.env as follows:

    1. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable. The default location is /usr/symcli/bin.

    2. Uncomment the AUTO* environment variables. It is recommended that you retain their default values unless there is a specific business requirement to change them. See Appendix B for an explanation of these variables.

    3. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/<pkg_name>.

    4. Uncomment the DEVICE_GROUP variable and set it to the Symmetrix device group name as listed by the symdg list command. The DEVICE_GROUP variable may also contain the consistency group name if you are using an M by N configuration.

    5. Uncomment the RETRY and RETRYTIME variables. Use the default values for the first package and slightly different values for each additional package: increase RETRYTIME by two seconds for each package, and keep the product RETRY * RETRYTIME at approximately five minutes. These variables determine how often, and how many times, the Symmetrix status commands are retried.

      For example, if there are three packages with data on a particular Symmetrix pair (connected by SRDF), then the values for RETRY and RETRYTIME might be as follows:

      Table 5-3 RETRY and RETRYTIME Values

      Package   RETRY   RETRYTIME (seconds)
      -------   -----   -------------------
      pkgA      60      5
      pkgB      43      7
      pkgC      33      9

    6. Uncomment the CLUSTER_TYPE variable and set it to “continental”.

    7. Uncomment the RDF_MODE variable and set it to “async” or “sync” as appropriate to your application.

  10. Edit the remaining control script variables (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application as it runs on the primary cluster. See the Managing Serviceguard manual for more information on these variables.

  11. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Serviceguard manual for more information on these functions.

  12. Distribute the package configuration file, environment file, and control script to the other nodes in the primary cluster by using ftp or rcp. For example:

    # rcp -p /etc/cmcluster/<pkg_name>/<pkg_name>.cntl \
    other_node:/etc/cmcluster/<pkg_name>/<pkg_name>.cntl

    When using ftp, be sure to make the file executable on any destination systems.

  13. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/<pkg_name>:

    • <pkg_name>.cntl (EMC SRDF package control script)

    • <pkg_name>.conf (Serviceguard package ASCII config file)

    • <pkg_name>.sh (Package monitor shell script, if applicable)

    • <pkg_name>_srdf.env (Metrocluster EMC SRDF environment file)

  14. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

  15. Apply the Serviceguard configuration using the cmapplyconf command or SAM.

  16. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and failover.

  17. Restore the SRDF logical links for the disks associated with the application package (do this after the recovery cluster has been set up as described in the next section). See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the Symmetrix device group names.
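The RETRY and RETRYTIME rule in step 9 (RETRYTIME grows by two seconds per package, and RETRY * RETRYTIME stays near five minutes) can be expressed as a short calculation. This is a minimal Python sketch, not part of the Metrocluster product: the function name is hypothetical, and it assumes the first package keeps RETRY=60 and RETRYTIME=5 (the pkgA row of Table 5-3), a 300-second target, and rounding to the nearest whole retry count.

```python
# Hypothetical helper: suggest RETRY/RETRYTIME pairs for packages that share
# one Symmetrix pair. Assumptions: the first package uses RETRYTIME=5 (pkgA in
# Table 5-3), RETRYTIME grows by 2 s per package, and the target total is ~300 s.
TARGET_SECONDS = 300   # "approximately five minutes"
BASE_RETRYTIME = 5     # assumed RETRYTIME for the first package
RETRYTIME_STEP = 2     # "RETRYTIME should increase by two seconds for each package"


def suggest_retry_values(packages, target=TARGET_SECONDS,
                         base=BASE_RETRYTIME, step=RETRYTIME_STEP):
    """Return {package: (RETRY, RETRYTIME)} in the order the packages are given."""
    values = {}
    for index, pkg in enumerate(packages):
        retrytime = base + step * index      # 5, 7, 9, ... seconds
        retry = round(target / retrytime)    # nearest whole number of retries
        values[pkg] = (retry, retrytime)
    return values


if __name__ == "__main__":
    for pkg, (retry, retrytime) in suggest_retry_values(["pkgA", "pkgB", "pkgC"]).items():
        print(f"{pkg}: RETRY={retry} RETRYTIME={retrytime} total={retry * retrytime}s")
```

For pkgA, pkgB, and pkgC this reproduces Table 5-3 (totals of 300, 301, and 297 seconds); a fourth package would get RETRYTIME=11 and RETRY=27.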

The primary cluster is now ready for the Continentalclusters operation.
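Step 13 asks you to confirm that every host has the package files, and step 12 warns that files copied with ftp must be made executable. A check along these lines could be run on each node. It is a hypothetical Python sketch that assumes the /etc/cmcluster/<pkg_name> layout described above and treats <pkg_name>.sh as optional; the default package name pkgCCA is only a placeholder.

```python
# Hypothetical per-node check for the files listed in step 13.
# Assumes the /etc/cmcluster/<pkg_name> layout used in this section.
import os
import sys


def required_package_files(pkg_name):
    """Files that must exist; <pkg_name>.sh is optional and not checked."""
    return [f"{pkg_name}.cntl", f"{pkg_name}.conf", f"{pkg_name}_srdf.env"]


def missing_package_files(pkg_dir, pkg_name):
    """Return a list of problems: missing files, or a non-executable control script."""
    problems = []
    for name in required_package_files(pkg_name):
        path = os.path.join(pkg_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
    control_script = os.path.join(pkg_dir, f"{pkg_name}.cntl")
    # Files copied with ftp may lose their execute permission (see step 12).
    if os.path.isfile(control_script) and not os.access(control_script, os.X_OK):
        problems.append(f"not executable: {control_script}")
    return problems


if __name__ == "__main__":
    pkg = sys.argv[1] if len(sys.argv) > 1 else "pkgCCA"  # placeholder name
    for problem in missing_package_files(os.path.join("/etc/cmcluster", pkg), pkg):
        print(problem)
```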

Setting up a Recovery Package on the Recovery Cluster

The installation of EMC SRDF, Serviceguard, and Continentalclusters software is exactly the same as in the previous section.

The procedures below install and configure a recovery package on the recovery cluster. Consult the Managing Serviceguard user’s guide for instructions on setting up a Serviceguard cluster (LAN, VG, LV, and so on).

  1. Split the EMC SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be edited to refer to the configured SRDF groups and customized with the Symmetrix device group names.

  2. Generate a cluster ASCII file.

    # cmquerycl -n node1 -n node2 -C CClusterNY.ascii

    Edit the file CClusterNY.ascii. Be sure to select a primary cluster lock disk that is not a lock disk on the recovery cluster. Typical edits include configuring HEARTBEAT_IP on all user LANs and setting MAX_PACKAGES.

  3. Check the configuration.

    # cmcheckconf -C CClusterNY.ascii

  4. Create the cluster binary.

    # cmapplyconf -C CClusterNY.ascii

  5. Test the cluster.

    # cmruncl -v

    # cmviewcl -v

    Does the cluster come up? If so, then stop the cluster:

    # cmhaltcl -f

  6. Copy the package files from the primary cluster to a bkpkgXXX directory and rename them <backup_pkg_name>.cntl and <backup_pkg_name>_srdf.env. Edit the copied package files for the recovery cluster, changing the subnet, relocatable IP address, and node names.

    Be sure to set AUTO_RUN to NO in the package ASCII file.

  7. Edit the recovery package environment file <backup_pkg_name>_srdf.env as follows:

    1. Add the path where the EMC Solutions Enabler software binaries have been installed to the PATH environment variable.

    2. Make sure that all AUTO* variables are uncommented.

    3. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/<backup_pkg_name>.

    4. Uncomment the DEVICE_GROUP variable and set it to the Symmetrix device group name as listed by the symdg list command. The DEVICE_GROUP variable may also contain the consistency group name if you are using an M by N configuration.

    5. Uncomment the RETRY and RETRYTIME variables.

    6. Make sure the CLUSTER_TYPE variable is set to “continental”.

    7. Uncomment the RDF_MODE variable and set it to “async” or “sync” as appropriate to your application.

  8. Edit the remaining application package variables in the package control script (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these variables. In particular, change the subnet and IP address values carried over from the copy of the primary cluster files.

  9. Verify that each host in both clusters in the continental cluster has the following files in the directory /etc/cmcluster/<backup_pkg_name>:

    • <backup_pkg_name>.cntl (continental cluster package control script)

    • <backup_pkg_name>.conf (Serviceguard package ASCII config file)

    • <backup_pkg_name>.sh (Package monitor shell script, if applicable)

    • <backup_pkg_name>_srdf.env (Metrocluster SRDF environment file)

  10. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

  11. Apply the Serviceguard configuration using the cmapplyconf command or SAM for the recovery cluster.

  12. Test the cluster and packages.

    # cmruncl

    # cmmodpkg -e bkpkgCCA

    # cmviewcl -v

    Note that cmmodpkg is used to manually start the application packages.

    Do all application packages start? If so, then issue the following command.

    # cmhaltcl -f

    NOTE: Application packages cannot run on R1 and R2 at the same time. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted to prevent data corruption.

  13. Restore the SRDF logical links for the disks associated with the application package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

The recovery cluster is now ready for continental cluster operation.
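Because the primary and recovery environment files are edited by hand on several nodes, a quick lint pass can catch values that are still commented out or misspelled. The sketch below is hypothetical: it assumes the srdf.env file uses simple shell assignments such as CLUSTER_TYPE="continental", treats lines beginning with # as still commented out, and uses an arbitrary 240 to 360 second window for the "approximately five minutes" RETRY * RETRYTIME guideline.

```python
# Hypothetical lint pass for a <pkg_name>_srdf.env file (primary or recovery).
# Assumes simple shell assignments (NAME=value); lines starting with '#'
# are treated as still commented out.
import re
import shlex
import sys

ASSIGNMENT = re.compile(r"^\s*([A-Z_][A-Z0-9_]*)=(.*)$")


def parse_env(text):
    """Return {NAME: value} for uncommented NAME=value lines."""
    settings = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            name, raw = match.groups()
            words = shlex.split(raw, comments=True)  # strips quotes and trailing comments
            settings[name] = words[0] if words else ""
    return settings


def check_env(settings):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not settings.get("PKGDIR", "").startswith("/"):
        problems.append("PKGDIR must be the full path of the package directory")
    if not settings.get("DEVICE_GROUP"):
        problems.append("DEVICE_GROUP is not set")
    if settings.get("CLUSTER_TYPE") != "continental":
        problems.append('CLUSTER_TYPE must be "continental"')
    if settings.get("RDF_MODE") not in ("sync", "async"):
        problems.append('RDF_MODE must be "sync" or "async"')
    try:
        total = int(settings["RETRY"]) * int(settings["RETRYTIME"])
    except (KeyError, ValueError):
        problems.append("RETRY and RETRYTIME must be uncommented integers")
    else:
        if not 240 <= total <= 360:  # illustrative window around five minutes
            problems.append(f"RETRY * RETRYTIME is {total}s; expected about 300s")
    return problems


if __name__ == "__main__":
    with open(sys.argv[1]) as env_file:
        for problem in check_env(parse_env(env_file.read())):
            print(problem)
```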

Setting up the Continental Cluster Configuration

The procedures below configure Continentalclusters and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2 “Designing a Continental Cluster”.

  1. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

  2. Generate the Continentalclusters configuration using the following command:

    # cmqueryconcl -C cmconcl.config

  3. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. Note that when data replication is done using EMC SRDF, there are no data sender and receiver packages.

    Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step.

  4. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.

  5. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES; unlike the application packages, the monitor package should start automatically when the cluster is formed.

  6. Apply the monitor package to both cluster configurations.

    # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config

  7. Restore the logical SRDF links for the package. See the script Samples/post.cmapply for an example of how to automate this task. The script must be customized with the appropriate Symmetrix device group names. Example:

    # Samples/post.cmapply

  8. Apply the Continentalclusters configuration using cmapplyconcl. The resulting files are placed in /etc/cmconcl/instances. The command does not change /etc/cmcluster/cmclconfig, and Continentalclusters has no equivalent file. Example:

    # cmapplyconcl -C cmconcl.config

  9. Start the monitor package on both clusters.

    The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status.

  10. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

  11. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.

  12. View the status of the continental cluster primary and recovery clusters, including configured event data.

    # cmviewconcl -v

The continental cluster is now ready for testing. See Chapter 2 “Designing a Continental Cluster” section “Testing the Continental Cluster”.
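The ordering in this procedure matters; for example, the SRDF links are restored (step 7) before cmapplyconcl is run (step 8). One way to keep that sequence reviewable is to record it as data and print it as a checklist. This Python sketch only prints commands taken from this chapter and executes nothing; the function names are hypothetical, and the manual editing steps appear as placeholders.

```python
# Hypothetical dry-run checklist for the Continentalclusters configuration
# procedure above. It prints commands; it never runs them.
import shlex

MONITOR_CONF = "/etc/cmcluster/ccmonpkg/ccmonpkg.config"


def continental_config_plan(config="cmconcl.config"):
    """Return (description, argv-or-None) pairs in procedure order."""
    return [
        ("split SRDF links", ["Samples/pre.cmquery"]),
        ("generate configuration template", ["cmqueryconcl", "-C", config]),
        ("edit the configuration and /etc/opt/cmom/cmomhosts", None),  # manual
        ("copy and edit monitor package files on all nodes", None),     # manual
        ("apply monitor package (each cluster)", ["cmapplyconf", "-P", MONITOR_CONF]),
        ("restore SRDF links", ["Samples/post.cmapply"]),
        ("apply Continentalclusters configuration", ["cmapplyconcl", "-C", config]),
        ("start monitor package (each cluster)", ["cmrunpkg", "ccmonpkg"]),
        ("view continental cluster status", ["cmviewconcl", "-v"]),
    ]


def print_plan(plan):
    """Print a numbered checklist; manual steps have no command."""
    for number, (description, argv) in enumerate(plan, start=1):
        command = "# " + shlex.join(argv) if argv else "(manual step)"
        print(f"{number}. {description}: {command}")


if __name__ == "__main__":
    print_plan(continental_config_plan())
```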

Switching to the Recovery Cluster in Case of Disaster

It is vital that the administrator verify that recovery is actually needed after receiving a cluster alert or alarm, because network failures may produce false alarms.

After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:

  • During an alert, cmrecovercl does not start the recovery packages unless the -f option is used.

  • During an alarm, cmrecovercl starts the recovery packages without the -f option.

  • When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This applies not only when no alert or alarm was issued, but also when an alert or alarm was issued and the primary cluster has since recovered, so that its current status is Up.

Verify that the SRDF links are up:

    # symrdf list
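The three rules above reduce to a small decision function. This is an illustrative Python model only (it does not call cmrecovercl), assuming the primary cluster status uses the values from the monitoring definitions (unknown, down, up, error), compared case-insensitively; the function name is hypothetical.

```python
# Hypothetical model of the cmrecovercl rules listed above; it does not
# invoke cmrecovercl. condition is "alert", "alarm" or None.
def cmrecovercl_starts_packages(condition, primary_status, force=False):
    """Return True if cmrecovercl would start the recovery packages."""
    if primary_status.lower() == "up":
        # The primary cluster is Up (again): no recovery, even after an alert or alarm.
        return False
    if condition == "alarm":
        return True      # alarm: -f is not needed
    if condition == "alert":
        return force     # alert: only with cmrecovercl -f
    return False         # neither an alert nor an alarm


if __name__ == "__main__":
    for condition, status, force in [("alert", "down", False), ("alert", "down", True),
                                     ("alarm", "unknown", False), (None, "up", True)]:
        print(condition, status, force,
              cmrecovercl_starts_packages(condition, status, force))
```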

Failback Scenarios

There is no failback counterpart to the “pushbutton” failover from the primary cluster to the recovery cluster. Failback is dependent on the original nature of the failover, the state of primary and secondary Symmetrix SRDF volumes (R1 and R2) and the condition of the primary cluster. In Chapter 2 “Designing a Continental Cluster” there is a discussion of failback mechanisms and methodologies in the section “Restoring Disaster Tolerance”.

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or network connecting the two sites. The following discussion addresses some of those failures and suggests recovery approaches for environments that use data replication provided by Symmetrix disk arrays and the Symmetrix Remote Data Facility (SRDF).

Scenario 1

The primary site has lost power, including backup power (UPS), to both the systems and the disk arrays that make up the Serviceguard cluster at the primary site. There is no loss of data on either the Symmetrix or the operating systems of the systems at the primary site. After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file invokes Metrocluster with EMC SRDF to evaluate the status of the R1 and R2 paired group volumes. The symrdf list command displays the status of the device group:

              Source (R1) View                     Target (R2) View       MODES
    ------------------------------------  --  --------------------------  -----  ------------
                    ST                    LI        ST
    Standard         A                     N         A
    Logical          T   R1 Inv   R2 Inv   K         T   R1 Inv   R2 Inv         RDF Pair
    Device    Dev    E   Tracks   Tracks   S  Dev    E   Tracks   Tracks  MDA    STATE
    ------------------------------------  --  --------------------------  -----  ------------

    DEV001    009F  WD        0        0  NR  00A5  RW        0        0  S..    Failed Over
    DEV002    00A0  WD        0        0  NR  00A6  RW        0        0  S..    Failed Over

After power is restored to the primary site, the Symmetrix device groups may have a status of Failed Over. The procedure for moving the application packages back to the primary site differs depending on the status of the device groups.

The following procedure applies to the situation where the device groups have a status of “Failed Over”:

  1. Halt the Continentalclusters recovery packages at the recovery site.

    # cmhaltpkg <pkg_name>

    This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the device groups will remain “Synchronized” at the recovery site and “Failed Over” at the primary site.

  2. Halt the cluster, which also halts the monitor package ccmonpkg.

  3. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.

  4. Manually start the Continentalclusters primary packages at the primary site.

    # cmrunpkg <pkg_name>

    or

    # cmmodpkg -e <pkg_name>

    The control script is programmed to handle this case. The control script will issue an SRDF failback command to move the device group back to the R1 side and to resynchronize the R1 from the R2 side. Until the resynchronization is complete, the SRDF “read-through” feature will ensure that any reads on the R1 side will be current, by reading data through the SRDF link from the R2 side.

    NOTE: If the system administrator does not want synchronization performed from the remote (recovery) site, the device groups should be split and recreated manually.

  5. Ensure that the monitor packages at the primary and recovery sites are running.

  6. Verify that the device group is synchronized.

    # symrdf list

  7. If the package does not come up and the device group status is “Failed Over”, bring the package back manually by failing back the device group:

    # symrdf -g pkgCCB_r1 failback

    Execute an RDF 'Failback' operation for device
    group 'pkgCCB_r1' (y/[n]) ? y

    An RDF 'Failback' operation execution is
    in progress for device group 'pkgCCB_r1'. Please wait...

    Write Disable device(s) on RA at target (R2)..............Done.
    Suspend RDF link(s).......................................Done.
    Merge device track tables between source and target.......Started.
    Device: 001 ............................................. Merged.
    Merge device track tables between source and target.......Done.
    Resume RDF link(s)........................................Done.
    Read/Write Enable device(s) on SA at source (R1)..........Done.

    The RDF 'Failback' operation successfully executed for
    device group 'pkgCCB_r1'.

  8. During resynchronization, the pair state changes from Failed Over to Invalid to SyncInProg. For example:

    ftsys1a# symrdf list

    Symmetrix ID: 000183500021

                                Local Device View
    --------------------------------------------------------------------------
                     STATUS    M O D E S                 RDF S T A T E S
    Sym  RDF        --------- ------------ R1 Inv R2 Inv ---------------------
    Dev  RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair
    ---  ---- ----- --------- ------------ ------ ------ --- ---- ------------

    000  000  R2:2  RW WD RW  SYN  DIS OFF      0      0 WD  RW   Synchronized
    001  001  R2:2  RW WD RW  SYN  DIS OFF     12      0 WD  WD   Invalid

    ftsys1a# symrdf list

    Symmetrix ID: 000183500021

                                Local Device View
    --------------------------------------------------------------------------
                     STATUS    M O D E S                 RDF S T A T E S
    Sym  RDF        --------- ------------ R1 Inv R2 Inv ---------------------
    Dev  RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair
    ---  ---- ----- --------- ------------ ------ ------ --- ---- ------------

    000  000  R2:2  RW WD RW  SYN  DIS OFF      0      0 WD  RW   Synchronized
    001  001  R2:2  RW WD RW  SYN  DIS OFF      2      0 WD  RW   SyncInProg

  9. Halt the recovery cluster and restart it.

    # cmhaltcl -f (if the cluster is not already down)

    # cmruncl

  10. Verify that the data is consistent and current.
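Several steps in this scenario depend on reading the pair state from symrdf output (Failed Over, Invalid, SyncInProg, Synchronized). The hypothetical Python helper below extracts that state from device rows like the ones shown in this section. It assumes the pair state is the last field on each row and recognizes only the listed states; real symrdf output may contain other states or layouts.

```python
# Hypothetical parser for the RDF pair-state column of symrdf output.
# Assumes the pair state ends each device row; the state list is not exhaustive.
PAIR_STATES = ("Failed Over", "SyncInProg", "Synchronized", "Invalid",
               "Split", "Suspended", "Partitioned")


def pair_states(output):
    """Return {device: pair_state} for rows that end in a known pair state."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        for state in PAIR_STATES:
            if line.endswith(state) and len(line) > len(state):
                states[line.split()[0]] = state   # first field is the device name
                break
    return states


def any_failed_over(states):
    """True if any device is still Failed Over (a manual failback may be needed)."""
    return any(state == "Failed Over" for state in states.values())


def fully_synchronized(states):
    """True only if at least one device was found and all are Synchronized."""
    return bool(states) and all(state == "Synchronized" for state in states.values())
```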

Scenario 2

The primary site Symmetrix has experienced a catastrophic hardware failure, and all data on the array has been lost. After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. The Continentalclusters package control file invokes Metrocluster with EMC SRDF to evaluate the status of the Symmetrix SRDF paired volumes. Because the systems at the primary site are accessible but the Symmetrix is not, the control file evaluates the paired volumes as having a local status of “failed over”. The control file script is programmed to handle this condition: it activates the volume groups, mounts the logical volumes, assigns floating IP addresses, and starts any processes coded into the script. After the primary site Symmetrix has been repaired and configured, use the following procedure to move the application package back to the primary site.

  1. Manually re-create the Symmetrix device groups and gatekeeper configuration by re-running the scripts mk3symgrps* and mk4gatekpr*, which perform the following:

    # date >ftsys1.group.list

    # symdg create -type RDF1 pkgCCA_r1

    # symld -g pkgCCA_r1 add pd /dev/rdsk/c7t0d0

    # symgate define pd /dev/rdsk/c7t15d0

    # symgate define pd /dev/rdsk/c7t15d1

    # symgate -g pkgCCA_r1 associate pd /dev/rdsk/c7t15d0

  2. Halt the Continentalclusters recovery packages at the recovery site.

    # cmhaltpkg <pkg_name>

    This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will be SPLIT at both the recovery and primary sites.

  3. Halt the cluster, which also halts the monitor package ccmonpkg.

  4. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically. Because the paired volumes have a status of SPLIT at both the primary and recovery sites, the Symmetrix arrays treat the two halves as unmirrored.

  5. Issue the following command:

    # symrdf -g pkgCCB_r1 failback

    Because the most current data is at the remote (recovery) site, this command synchronizes the primary site from the remote site. Wait for the synchronization process to complete before continuing; otherwise, the package will fail to start in the next step.

  6. Manually start the Continentalclusters primary packages at the primary site:

    # cmrunpkg <pkg_name>

    The control script is programmed to handle this case. The control script recognizes the paired volume is synchronized and will proceed with the programmed package startup.

  7. Verify the device group is synchronized.

    # symrdf list

  8. Ensure that the monitor packages at the primary and recovery sites are running.
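Step 5 of this scenario requires waiting for synchronization to finish before starting the package. A polling loop such as the one below is one way to automate that wait. It is a hypothetical Python sketch: get_states stands for any function that returns a mapping of device to pair state (for example, one that parses symrdf output), and the default interval and attempt count simply mirror the pkgA values in Table 5-3 (RETRYTIME=5, RETRY=60).

```python
# Hypothetical wait loop: block until every device reports Synchronized.
# get_states is any callable returning {device: pair_state}; how it obtains
# the states (e.g. by parsing symrdf output) is an assumption left to the caller.
import time


def wait_for_sync(get_states, interval=5, attempts=60, sleep=time.sleep):
    """Poll get_states() until all pairs are Synchronized; return False on timeout."""
    for attempt in range(attempts):
        states = get_states()
        if states and all(state == "Synchronized" for state in states.values()):
            return True
        if attempt < attempts - 1:
            sleep(interval)   # do not sleep after the final attempt
    return False
```

Passing sleep as a parameter keeps the loop testable without real delays.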

Maintaining the EMC SRDF Data Replication Environment

Normal Startup

The following is the normal Continentalclusters startup procedure. On the primary cluster:

  1. Start the primary cluster.

    # cmruncl -v

    The primary cluster comes up with ccmonpkg running; the application packages remain down.

  2. Manually start application packages on the primary cluster.

    # cmmodpkg -e <Application_pkgname>

  3. Confirm primary cluster status.

    # cmviewcl -v

    and

    # cmviewconcl -v

  4. Verify the SRDF links.

    # symrdf list

On the recovery cluster, do the following:

  1. Start the recovery cluster.

    # cmruncl -v

    The recovery cluster comes up with ccmonpkg running; the application packages (bkpkgX) remain down.

  2. Do not manually start application packages on the recovery cluster; this will cause data corruption.

  3. Confirm recovery cluster status.

    # cmviewcl -v

    and

    # cmviewconcl -v
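The two startup procedures differ in one important way: application packages are started only on the primary cluster. The hypothetical Python sketch below builds the command list for either cluster from the commands shown above, so that the recovery-cluster list can never contain cmmodpkg. It only returns strings; nothing is executed.

```python
# Hypothetical builder for the normal startup commands shown above.
# It returns command strings only; nothing is executed.
def startup_commands(role, app_packages):
    """role is "primary" or "recovery"; app_packages are started only on the primary."""
    if role not in ("primary", "recovery"):
        raise ValueError(f"unknown cluster role: {role!r}")
    commands = ["cmruncl -v"]
    if role == "primary":
        # Application packages are started manually, and only on the primary cluster.
        commands += [f"cmmodpkg -e {pkg}" for pkg in app_packages]
    commands += ["cmviewcl -v", "cmviewconcl -v"]
    if role == "primary":
        commands.append("symrdf list")   # verify the SRDF links
    return commands


if __name__ == "__main__":
    for line in startup_commands("primary", ["pkgCCA"]):
        print("#", line)
```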

Normal Maintenance

There might be situations where a package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the Continentalclusters with EMC SRDF data replication:

  1. Shut down the package with the appropriate command. Example:

    # cmhaltpkg <pkgname>

  2. Distribute the package configuration changes. Example:

    # cmapplyconf -P <pkgconfig> (Primary cluster)

    # cmapplyconf -P <bkpkgconfig> (Recovery cluster)

  3. Start up the package with the appropriate Serviceguard command. Example:

    # cmmodpkg -e <pkgname> (Primary cluster)

    CAUTION: Never enable package switching on both the primary package and the recovery package.

  4. Halt the monitor package.

    # cmhaltpkg ccmonpkg

  5. Apply the new continental cluster configuration.

    # cmapplyconcl -C <configfile>

  6. Restart the monitor package.

    # cmrunpkg ccmonpkg
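The CAUTION in step 3 can be turned into a simple consistency check before switching is enabled. This Python sketch is hypothetical: it assumes you have already collected, for each recovery group, whether switching is enabled for the primary package and for the recovery package (for example, from cmviewcl output on each cluster); how that data is gathered is not shown.

```python
# Hypothetical check for the CAUTION above: switching must never be enabled
# for both the primary package and the recovery package of a recovery group.
# groups maps group name -> (primary_switching_enabled, recovery_switching_enabled).
def conflicting_groups(groups):
    """Return the names of recovery groups where both sides have switching enabled."""
    return sorted(name for name, (primary_on, recovery_on) in groups.items()
                  if primary_on and recovery_on)


def safe_to_enable(groups, group, side):
    """True if enabling switching on side ("primary" or "recovery") keeps the rule."""
    if side not in ("primary", "recovery"):
        raise ValueError(f"unknown side: {side!r}")
    primary_on, recovery_on = groups[group]
    other_side_on = recovery_on if side == "primary" else primary_on
    return not other_side_on
```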

© Hewlett-Packard Development Company, L.P.