Designing Disaster Tolerant High Availability Clusters, Chapter 5: Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP

Completing and Running a Metrocluster Solution with Continuous Access XP



No additional steps are required after cluster and package configuration to complete the setup of the metropolitan cluster. In normal operation, the metropolitan cluster with Continuous Access XP starts like any other cluster, and runs and halts packages in the same way as a standard cluster. However, startup time for packages may be considerably slower because of the need to check disk status on both disk arrays.

Maintaining a Cluster that uses Metrocluster/CA

While the cluster is running, manual changes of state for devices on the XP Series disk array can cause the package to halt unexpectedly or prevent it from starting after a failover. In general, do not make manual changes of state while the package and the cluster are running.

NOTE: Manual changes can be made when they are required to bring the device group into a “protected” state. For example, if a package starts up with data replication suspended, you can run the pairresync command to re-establish data replication while the package is still running.

Viewing the Progress of Copy Operations

While a copy is in progress between XP systems (that is, the volumes are in a COPY state), you can see the progress of the copy by viewing the % column in the output of the pairdisplay command:

# pairdisplay -g pkgB -fc -CLI

Group  PairVol     L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  %   P-LDEV#  M
pkgB   pkgD-disk0  L    CL1-C  0    3   35422  463    P-VOL  COPY    NEVER  79  460      -
pkgB   pkgD-disk0  R    CL1-F  0    3   35663  3      S-VOL  COPY    NEVER  -   0        -

This display shows that 79% of the current copy operation has completed. With the synchronous fence levels (NEVER and DATA), this column shows 100% when the volumes are in the PAIR state.
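The % column can also be read from a script, for example to log progress during a long initial copy. The following is a minimal sketch, assuming the whitespace-separated -CLI layout shown above with a header line that begins with Group; the function name copy_progress is hypothetical and is not part of Raid Manager.

```shell
#!/bin/sh
# copy_progress (hypothetical helper): reads "pairdisplay -g <group> -fc -CLI"
# output on stdin and prints volume name, status, and % for each local (L) row.
# Columns are located by their header names rather than by fixed position.
copy_progress() {
    awk '
        $1 == "Group" {
            for (i = 1; i <= NF; i++) {
                if ($i == "L/R")    lr  = i
                if ($i == "Status") st  = i
                if ($i == "%")      pct = i
            }
            next
        }
        pct && $lr == "L" { print $2, $st, $pct }
    '
}

# Example use on a node with a running Raid Manager instance:
#   pairdisplay -g pkgB -fc -CLI | copy_progress
```

For the display above, this prints pkgD-disk0 COPY 79.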

Viewing Side File Size

If you are using asynchronous data replication, you can see the current size of the side file when the volumes are in a PAIR state by using the pairdisplay command. The following output, obtained during normal cluster operation, shows the percentage of the side file that is full:

# pairdisplay -g pkgB -fc -CLI

Group  PairVol     L/R  Port#  TID  LU  Seq#   LDEV#  P/S    Status  Fence  %   P-LDEV#  M
pkgB   pkgD-disk0  L    CL1-C  0    3   35422  463    P-VOL  PAIR    ASYNC  35  3        -
pkgB   pkgD-disk0  R    CL1-F  0    3   35663  3      S-VOL  PAIR    ASYNC  0   463      -

This output shows that 35% of the side file is full.

When volumes are in a COPY state, the % column shows the progress of the copying between the XP frames, until it reaches 100%, at which point the display reverts to showing the side file usage in the PAIR state.
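With asynchronous replication, a simple threshold check on the side file percentage can flag a side file that is filling up. This sketch assumes the same -CLI layout; the function name sidefile_over and the threshold you pass to it are hypothetical and are not an XP or Metrocluster setting.

```shell
#!/bin/sh
# sidefile_over LIMIT (hypothetical): reads "pairdisplay -g <group> -fc -CLI"
# output on stdin and reports local (L) rows in PAIR state with ASYNC fence
# whose side file usage (% column) is at or above LIMIT percent.
# Returns 1 if any row is at or above the limit, otherwise 0.
sidefile_over() {
    awk -v limit="$1" '
        $1 == "Group" {
            for (i = 1; i <= NF; i++) {
                if ($i == "L/R")    lr  = i
                if ($i == "Status") st  = i
                if ($i == "Fence")  fc  = i
                if ($i == "%")      pct = i
            }
            next
        }
        pct && $lr == "L" && $st == "PAIR" && $fc == "ASYNC" && $pct + 0 >= limit + 0 {
            printf "%s: side file %s%% full\n", $2, $pct
            over = 1
        }
        END { exit (over ? 1 : 0) }
    '
}
```

For the ASYNC display above, pairdisplay -g pkgB -fc -CLI | sidefile_over 30 prints pkgD-disk0: side file 35% full and returns 1.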

Normal Maintenance

There may be situations in which a package must be taken down for maintenance without moving it to another node. The following procedure is recommended for normal maintenance of Metrocluster/CA:

  1. Stop the package with the appropriate Serviceguard command.

    # cmhaltpkg pkgname

  2. Split links for the package.

    # pairsplit -g <package device group name> -rw

  3. Distribute the Metrocluster with Continuous Access XP configuration changes.

    # cmapplyconf -P pkgname.config

  4. Start the package with the appropriate Serviceguard command:

    # cmmodpkg -e pkgname
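The four steps can be wrapped in a small script so that they always run in the same order and stop at the first failure. This is a minimal sketch, assuming the package configuration file is named <pkgname>.config in the current directory and that the configuration changes have already been made in it before step 3; the names run, metro_maint, and DRY_RUN are hypothetical. With DRY_RUN=1 the script only prints the commands.

```shell
#!/bin/sh
# run (hypothetical): print a command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# metro_maint PKG DEVGRP (hypothetical): normal maintenance of a
# Metrocluster/CA package. Stops at the first failing step.
metro_maint() {
    pkg=$1 devgrp=$2
    run cmhaltpkg "$pkg" || return 1               # 1. halt the package
    run pairsplit -g "$devgrp" -rw || return 1     # 2. split the device group pairs
    run cmapplyconf -P "$pkg.config" || return 1   # 3. distribute the config changes
    run cmmodpkg -e "$pkg"                         # 4. start the package
}
```

For example, DRY_RUN=1 metro_maint pkgB pkgB lists the four commands without running them.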

Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators”.
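The following sketch illustrates that check for two data centers plus arbitrator nodes. It assumes the simplified rule that a re-forming cluster needs a strict majority of the nodes that were running before the failure, and that the arbitrators sit at a third location; cluster lock and other tie-breaking details are not modeled, so treat it as a planning aid, not as Serviceguard's own calculation. The function names are hypothetical.

```shell
#!/bin/sh
# has_quorum RUNNING SURVIVORS (hypothetical): succeed if SURVIVORS is a
# strict majority of RUNNING. Simplified: no cluster lock or tie-breaking.
has_quorum() {
    [ $(($2 * 2)) -gt "$1" ]
}

# check_site_loss A B ARB (hypothetical): nodes still up at data center A,
# data center B, and arbitrator nodes at a third location. Reports whether
# the survivors would keep quorum if either data center then failed.
check_site_loss() {
    a=$1 b=$2 arb=$3
    running=$((a + b + arb))
    if has_quorum "$running" $((b + arb)); then
        echo "site A loss: quorum"
    else
        echo "site A loss: NO quorum"
    fi
    if has_quorum "$running" $((a + arb)); then
        echo "site B loss: quorum"
    else
        echo "site B loss: NO quorum"
    fi
}
```

For example, with two nodes at each data center and two arbitrators, taking one node down at site B (check_site_loss 2 1 2) still reports quorum for the loss of either data center under this simplified rule.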

Resynchronizing

After certain failures, data is no longer remotely protected. To restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the pairresync command. The command must complete successfully for disaster-tolerant data protection to be restored.

Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:

  • Failure of all CA links without restart of the application

  • Failure of all CA links with Fence Level “DATA” with restart of the application on a primary host

  • Failure of the entire secondary Data Center for a given application package

  • Failure of the secondary XP Series disk array for a given application package while the application is running on a primary host

Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure:

  • Failure of the entire primary data center for a given application package

  • Failure of all of the primary hosts for a given application package

  • Failure of the primary XP Series disk array for a given application package

  • Failure of all CA links with restart of the application on a secondary host

Pairs must be manually recreated if the volumes on both the primary and secondary XP Series disk arrays are in the SMPL (simplex) state. Periodically review the files syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings, and recommended actions, particularly after system, data center, or application failures.

Full resynchronization must be manually initiated after repairing the following failures:

  • Failure of the secondary XP Series disk array for a given application package followed by application startup on a primary host

  • Failure of all CA links with Fence Level NEVER or ASYNC with restart of the application on a primary host
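A script can make a first, rough classification of a device group's state from pairdisplay output before you decide which of the actions above applies. This is a simplified sketch: it assumes the -CLI layout shown earlier and that simplex volumes report SMPL in the P/S column, and it does not distinguish the failures above that need full resynchronization or the swap options, so always confirm against syslog.log and the package log. The function name pair_action is hypothetical.

```shell
#!/bin/sh
# pair_action (hypothetical): reads "pairdisplay -g <group> -fc -CLI" output
# on stdin and prints a rough next step from the local (L) and remote (R)
# rows. Only the last row seen for each side is kept, so use it per volume
# or on groups whose volumes share one state.
pair_action() {
    awk '
        $1 == "Group" {
            for (i = 1; i <= NF; i++) {
                if ($i == "L/R")    lr = i
                if ($i == "P/S")    ps = i
                if ($i == "Status") st = i
            }
            next
        }
        st { role[$lr] = $ps; state[$lr] = $st }
        END {
            if (role["L"] == "SMPL" && role["R"] == "SMPL")
                print "both sides SMPL: recreate the pairs manually"
            else if (role["L"] == "P-VOL" && (state["L"] == "PSUS" || state["L"] == "PSUE"))
                print "suspended: candidate for pairresync"
            else if (state["L"] == "COPY")
                print "copy in progress"
            else if (state["L"] == "PAIR")
                print "protected"
            else
                print "review syslog.log and the package log"
        }
    '
}
```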

Using the pairresync Command

The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the CA link is fixed, you use the pairresync command in one of the following two ways depending on which site you are on:

  • pairresync -swapp—from the primary site.

  • pairresync -swaps—from the failover site.

These options take advantage of the fact that the recovery site maintains a bitmap of the modified data sectors on the recovery array. Either form of the command swaps the personalities of the volumes: the PVOL becomes the SVOL and the SVOL becomes the PVOL. With the personalities swapped, any data written to the volume on the failover site (now the PVOL) is copied back to the SVOL, which is now on the primary site. During this time the package continues running on the failover site.

NOTE: The preceding steps are automated if the AUTO_PSUEPSUS variable is left at its default value of 1. Once the CA link failure has been fixed, you only need to halt the package on the recovery cluster and restart it on the primary cluster. However, to reduce application downtime, manually invoke pairresync before failback.
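Because the option depends only on which site you run the command from, the choice can be captured in a small wrapper. A minimal sketch; the function names run and swap_resync and the site keywords primary and failover are hypothetical, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# run (hypothetical): print a command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# swap_resync DEVGRP SITE (hypothetical): resynchronize with swapped
# personalities after the CA link is repaired. SITE is the site this
# command runs on.
swap_resync() {
    case $2 in
        primary)  run pairresync -g "$1" -swapp ;;   # local PVOL becomes the SVOL
        failover) run pairresync -g "$1" -swaps ;;   # local SVOL becomes the PVOL
        *) echo "usage: swap_resync DEVGRP primary|failover" >&2; return 2 ;;
    esac
}
```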

Failback

After resynchronization is complete, you can halt the package on the failover site, and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.
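The failback can be scripted so that the package is halted only after the copy back to the primary site has finished. This sketch assumes the -CLI layout shown earlier and that every local volume reports PAIR once resynchronization is complete; the names run, local_status, failback, and POLL_SECS are hypothetical, and DRY_RUN=1 prints the Serviceguard commands instead of running them.

```shell
#!/bin/sh
# run (hypothetical): print a command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# local_status DEVGRP: print PAIR if every local (L) volume is in PAIR,
# otherwise the first other status seen, or UNKNOWN if no rows were read.
local_status() {
    pairdisplay -g "$1" -fc -CLI | awk '
        $1 == "Group" {
            for (i = 1; i <= NF; i++) {
                if ($i == "L/R")    lr = i
                if ($i == "Status") st = i
            }
            next
        }
        st && $lr == "L" { n++; if ($st != "PAIR" && bad == "") bad = $st }
        END { print (n == 0 ? "UNKNOWN" : (bad == "" ? "PAIR" : bad)) }
    '
}

# failback PKG DEVGRP NODE: wait for resynchronization, then move PKG from
# the failover site to NODE on the primary site.
failback() {
    pkg=$1 devgrp=$2 node=$3
    until [ "$(local_status "$devgrp")" = "PAIR" ]; do
        sleep "${POLL_SECS:-60}"
    done
    run cmhaltpkg "$pkg" || return 1             # halt on the failover site
    run cmrunpkg -n "$node" "$pkg" || return 1   # start on the primary-site node
    run cmmodpkg -e "$pkg"                       # re-enable package switching
}
```

The loop waits indefinitely; a production script would also need a timeout and a check for the UNKNOWN case.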

CA XP Device Group Monitor

In a Metrocluster/CA environment where the device group state is not actively monitored, users may not be aware that application data has not been remotely protected for an extended period of time. In these circumstances, the CA XP device group monitor can monitor the status of the CA XP device group used by a package. Based on a preconfigured environment variable, the monitor can also automatically resynchronize the CA XP device group upon link recovery.

NOTE: If the monitor is configured to automatically resynchronize the data from PVOL to SVOL upon link recovery, a Business Copy (BC) volume of the SVOL should be configured as another mirror.

In the case of a rolling disaster, if the data in the SVOL becomes corrupt because of an incomplete resynchronization, the data in the BC volume can be restored to the SVOL. The BC volumes hold data that is not current, but is usable.

The monitor, as a package service, periodically checks the status of the CA XP device group that is configured for the package, and sends notification to the user via email, syslog, and console if there is a change in the status of the package’s device group.

CA XP Device Group Monitor Operation Overview

The CA XP device group monitor runs as a package service. You can configure the monitor's settings through the package's environment file. Once the package has started the CA XP device group monitor, the monitor periodically checks the status of the CA XP device group. If the status changes, or if the monitor is configured to notify after an interval with no status change, the monitor sends a notification that states the reason for the notification, a timestamp, and the status of the CA XP device group.
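The notification rule described here, together with MON_NOTIFICATION_FREQUENCY (described under Configuring the Monitor), can be illustrated with a short simulation. This is an interpretation for illustration only, not the DRMonitorXPCADevGrp implementation: it assumes that the first poll counts as a state change and that an unchanged state is reported on every FREQ-th consecutive unchanged poll. The function name simulate_polls is hypothetical.

```shell
#!/bin/sh
# simulate_polls FREQ STATE... (hypothetical): each STATE is the device group
# state seen at one poll. Notify when the state changes; if FREQ > 0, also
# notify on every FREQ-th consecutive poll with an unchanged state.
simulate_polls() {
    freq=$1; shift
    prev= same=0 n=0
    for cur in "$@"; do
        n=$((n + 1))
        if [ "$cur" != "$prev" ]; then
            same=0
            echo "poll $n: notify (state changed to $cur)"
        else
            same=$((same + 1))
            if [ "$freq" -gt 0 ] && [ $((same % freq)) -eq 0 ]; then
                echo "poll $n: notify (still $cur)"
            fi
        fi
        prev=$cur
    done
}
```

For example, simulate_polls 3 PAIR PAIR PAIR PAIR PSUE PSUE notifies at polls 1, 4, and 5; with a frequency of 0, only polls 1 and 5 produce a notification.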

Configuring the Monitor

Use the following steps to configure a monitor for a package’s device group:

  • Configure the monitor’s variables in the package environment file.

  • Configure the monitor as a service of the package.

Configure the Monitor’s Variables in the Package Environment File

Edit the following variables in the monitor’s section of the environment file <pkgname>_xpca.env:

NOTE: See Appendix A for an explanation of these variables.
  • Uncomment the MON_POLL_INTERVAL variable and set it to the desired value in minutes. If this variable is not set, it will default to a value of 10 minutes.

  • Uncomment the MON_NOTIFICATION_FREQUENCY variable and set it to the desired value. This value controls how often a notification message is sent when the state of the device group remains the same after the first check of the device group's state. If the value is zero, the monitor sends a notification only when the state of the device group changes. If the variable is not set, the default is 0.

  • If you want to receive notification messages over email, uncomment the MON_NOTIFICATION_EMAIL variable and set it to a fully qualified email address. Multiple email addresses can be configured, separated by commas.

  • If you want notification messages to be logged in the syslog file, uncomment the MON_NOTIFICATION_SYSLOG variable and set it to 1.

  • If you want notification messages to be logged on the system's console, uncomment the MON_NOTIFICATION_CONSOLE variable and set it to 1.

  • If you want an automatic resynchronization upon link recovery, uncomment the AUTO_RESYNC variable and set it to either 0, 1 or 2.

    If AUTO_RESYNC is set to 0 (DEFAULT), the monitor will not try to do the resynchronization from PVOL to SVOL. This setting will only send notifications.

    If AUTO_RESYNC is set to 1, the monitor splits the remote BC volume, if one is configured, from the mirror group before trying to resynchronize from PVOL to SVOL.

    If AUTO_RESYNC is set to 2, the monitor resynchronizes from PVOL to SVOL only when it finds the MON_RESYNC file in the package directory on the node where the package is running. The monitor does not manage the remote BC before or after the resynchronization. Use this setting if you want to manage the BC yourself.

    To enable the CA resynchronization for AUTO_RESYNC=2, it is necessary to create a file using the HP-UX command touch. For example:

    # touch /etc/cmcluster/packageA/MON_RESYNC

    (where /etc/cmcluster/packageA is the package directory)

    After the monitor detects the MON_RESYNC file, it is automatically removed.

The following is an example of the CA XP device group monitor definition section in the environment file (<packagename>_xpca.env), where the monitor will perform the following:

  • poll every 15 minutes.

  • send a notification on every third polling, if the state of the device group remains the same.

  • send the notifications to sysadmin1@hp.com and sysadmin2@hp.com.

  • log notifications to system log file, syslog.

  • display notifications to system console.

  • perform automatic resynchronization with BC management when detecting the device group local state change to PVOL-PSUE or PVOL-PDUB.

MON_POLL_INTERVAL=15
MON_NOTIFICATION_FREQUENCY=3
MON_NOTIFICATION_EMAIL=sysadmin1@hp.com,sysadmin2@hp.com
MON_NOTIFICATION_SYSLOG=1
MON_NOTIFICATION_CONSOLE=1
AUTO_RESYNC=1
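Before starting the package, it can help to sanity-check these values. The following sketch sources the environment file in a subshell, assuming it contains plain shell assignments like the example above, and checks only the monitor variables. The accepted ranges follow the descriptions above (AUTO_RESYNC 0, 1, or 2); treating the syslog and console flags as 0 or 1 is an assumption. The function name check_mon_env is hypothetical.

```shell
#!/bin/sh
# check_mon_env FILE (hypothetical): validate the CA XP device group monitor
# variables in a package environment file. Prints one line per problem and
# returns non-zero if any were found. Runs in a subshell, so variables read
# from FILE do not leak into the caller.
check_mon_env() (
    case $1 in */*) f=$1 ;; *) f=./$1 ;; esac    # "." would search PATH otherwise
    . "$f" || exit 1
    rc=0
    case ${MON_POLL_INTERVAL:-10} in              # minutes, default 10
        *[!0-9]*|0) echo "MON_POLL_INTERVAL must be a positive integer"; rc=1 ;;
    esac
    case ${MON_NOTIFICATION_FREQUENCY:-0} in
        *[!0-9]*) echo "MON_NOTIFICATION_FREQUENCY must be a non-negative integer"; rc=1 ;;
    esac
    for flag in "${MON_NOTIFICATION_SYSLOG:-0}" "${MON_NOTIFICATION_CONSOLE:-0}"; do
        case $flag in 0|1) ;; *) echo "syslog/console flags must be 0 or 1"; rc=1 ;; esac
    done
    case ${AUTO_RESYNC:-0} in
        0|1|2) ;;
        *) echo "AUTO_RESYNC must be 0, 1 or 2"; rc=1 ;;
    esac
    old_ifs=$IFS; IFS=,
    for addr in ${MON_NOTIFICATION_EMAIL:-}; do   # comma-separated list
        case $addr in *?@?*) ;; *) echo "bad email address: $addr"; rc=1 ;; esac
    done
    IFS=$old_ifs
    exit $rc
)
```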

Configure CA XP Device Group Monitor as a Service of the Package

Add the monitor as a service in the package's configuration file and control script file as follows:

  • In the package's configuration file, add the following lines:

SERVICE_NAME pkgXdevgrpmon.srv
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 5
NOTE: The recommended SERVICE_HALT_TIMEOUT value is 5 seconds. A lower value may not allow enough time for the monitor to clean itself up properly.
  • In the package's control script file, add the following lines in the SERVICE NAMES AND COMMANDS section:

SERVICE_NAME[0]="pkgXdevgrpmon.srv"
SERVICE_CMD[0]="/usr/sbin/DRMonitorXPCADevGrp <full path name of the package environment file>"
SERVICE_RESTART[0]="-r 10"
WARNING! If the CA links are still down while the monitor is trying to do the resynchronization and another failure occurs that causes a remote failover to the secondary site, the SVOL’s BC volumes will remain split from their mirror group.

This will only occur if the monitor is configured to perform automatic resynchronization using AUTO_RESYNC=1.
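The SERVICE_HALT_TIMEOUT recommendation can be checked mechanically. This sketch scans a package configuration file in the SERVICE_NAME / SERVICE_HALT_TIMEOUT format shown above and reports any service whose halt timeout is below 5 seconds. It assumes each SERVICE_HALT_TIMEOUT line follows the SERVICE_NAME line it belongs to; the function name check_halt_timeout is hypothetical.

```shell
#!/bin/sh
# check_halt_timeout FILE (hypothetical): report services in a package
# configuration file whose SERVICE_HALT_TIMEOUT is below 5 seconds.
# Returns 1 if any were found, otherwise 0.
check_halt_timeout() {
    awk '
        $1 == "SERVICE_NAME" { svc = $2 }
        $1 == "SERVICE_HALT_TIMEOUT" && svc != "" && $2 + 0 < 5 {
            printf "%s: SERVICE_HALT_TIMEOUT %s is below 5 seconds\n", svc, $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```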

Troubleshooting the CA XP Device Group Monitor

The following guidelines help you identify the cause of possible problems with the CA XP device group monitor.

Problems with email notifications

The CA XP device group monitor uses SMTP to send out email notifications. All email notification problems are logged in the package log file.

If a warning message in the package log file indicates that the monitor is unable to determine the SMTP port, the SMTP port is not defined in the /etc/services file. In that case, the monitor assumes that the SMTP port is 25. If a different port number is defined, the monitor must be restarted so that it connects to the correct port.
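To see which port will apply, the lookup can be reproduced by hand. This sketch reads the smtp/tcp entry from a services file (default /etc/services) and falls back to 25 when none is found, mirroring the behavior described above; it is not the monitor's own code, and the function name smtp_port is hypothetical.

```shell
#!/bin/sh
# smtp_port [FILE] (hypothetical): print the TCP port listed for "smtp" in
# FILE (default /etc/services), or 25 if there is no such entry or the file
# cannot be read.
smtp_port() {
    port=$(awk '$1 == "smtp" { split($2, p, "/"); if (p[2] == "tcp") { print p[1]; exit } }' \
        "${1:-/etc/services}" 2>/dev/null)
    echo "${port:-25}"
}
```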

If an error message in the package control log file states that the SMTP server cannot be found, no mail server, such as sendmail, is configured on the local node. A mail server must be configured and running on the local node for email notification. Once the mail server is running on the local node, the monitor starts sending email notifications.

Problems with Unknown CA Device Status

The CA XP device group monitor relies on the Raid Manager instance to get the CA device group state. If the local Raid Manager instance fails, the monitor cannot determine the state of the CA device group, and it sends a notification to all configured destinations (for example, email) stating that the state has changed to UNKNOWN. Because the monitor does not try to restart the Raid Manager instance, you must restart it before the monitor can determine the status of the CA device group. Make sure to start the Raid Manager instance with the same instance number that is defined in the package’s environment file.
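A minimal sketch of that restart, assuming the package environment file sets the Raid Manager instance number in a HORCMINST variable and that the Raid Manager startup script horcmstart.sh is in the PATH; the names run and restart_horcm are hypothetical, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# run (hypothetical): print a command, then execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# restart_horcm ENVFILE (hypothetical): start the Raid Manager instance whose
# number is set by HORCMINST in the package environment file. Runs in a
# subshell so variables read from ENVFILE do not leak into the caller.
restart_horcm() (
    case $1 in */*) f=$1 ;; *) f=./$1 ;; esac    # "." would search PATH otherwise
    . "$f" || exit 1
    if [ -z "${HORCMINST:-}" ]; then
        echo "HORCMINST is not set in $1" >&2
        exit 1
    fi
    run horcmstart.sh "$HORCMINST"
)
```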

© Hewlett-Packard Development Company, L.P.