Completing and Running a Metrocluster Solution with Continuous Access XP
No additional steps are required after cluster and package configuration to complete the setup of the metropolitan cluster. In normal operation, the metropolitan cluster with Continuous Access XP starts like any other cluster, and runs and halts packages in the same way as a standard cluster. However, package startup may be considerably slower because disk status must be checked on both disk arrays.

While the cluster is running, manually changing the state of devices on the XP Series disk array can cause the package to halt because of unexpected conditions, or can prevent the package from starting after a failover. In general, do not make manual state changes while the package and the cluster are running.
While a copy is in progress between XP systems (that is, the volumes are in a COPY state), you can see the progress of the copy by viewing the % column in the output of the pairdisplay command:

    # pairdisplay -g pkgB -fc -CLI
This display shows that 79% of a current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state.

If you are using asynchronous data replication, you can use the pairdisplay command to see the current size of the side file when the volumes are in a PAIR state. The following output, obtained during normal cluster operation, shows the percentage of the side file that is full:
This output shows that 35% of the side file is full. When volumes are in a COPY state, the % column shows the progress of the copy between the XP frames until it reaches 100%, at which point the display reverts to showing side file usage in the PAIR state.

At times the package must be taken down for maintenance without moving it to another node. The following procedure is recommended for normal maintenance of the Metrocluster/CA:
The cluster treats planned maintenance the same as a failure. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators”.

After certain failures, data is no longer remotely protected. To restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the pairresync command, and the command must complete successfully. Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:
Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure:
Pairs must be manually recreated if both the primary and secondary XP Series disk arrays are in the SMPL (simplex) state.

Periodically review syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings, and recommended actions, particularly after system, data center, or application failures.

Full resynchronization must be manually initiated after repairing the following failures:
The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the CA link is fixed, you use the pairresync command in one of the following two ways depending on which site you are on:
These options take advantage of the fact that the recovery site maintains a bitmap of the modified data sectors on the recovery array. Either version of the command swaps the personalities of the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now the PVOL) is copied back to the SVOL, which is now on the primary site. During this time the package continues running on the failover site.

After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. Metrocluster then swaps the personalities of the PVOL and the SVOL again, returning PVOL status to the primary site.
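The role swap described above can be pictured with a small Python model. It only tracks which site holds the PVOL and SVOL roles and which way data is copied; it does not call Raid Manager, and the function names and site labels are hypothetical.

```python
def swap_roles(roles: dict) -> dict:
    """Return a copy of the mapping with PVOL and SVOL exchanged."""
    flip = {"PVOL": "SVOL", "SVOL": "PVOL"}
    return {site: flip[role] for site, role in roles.items()}


def copy_direction(roles: dict) -> tuple:
    """Data is copied from the site holding the PVOL to the site holding the SVOL."""
    source = next(site for site, role in roles.items() if role == "PVOL")
    target = next(site for site, role in roles.items() if role == "SVOL")
    return source, target


# Normal operation: the primary site holds the PVOL.
normal = {"primary": "PVOL", "recovery": "SVOL"}

# After a failover and a pairresync with a swap option, the recovery
# site holds the PVOL, so its changes are copied back to the primary site.
after_resync = swap_roles(normal)

# After the package is halted on the failover site and restarted on the
# primary site, the roles are swapped back.
after_failback = swap_roles(after_resync)
```

In this model, `copy_direction(after_resync)` gives the recovery site as the source and the primary site as the target, which matches the copy-back step in the text.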
In a Metrocluster/CA environment, the device group state is not actively monitored, so the end user may not be aware when application data has not been remotely protected for an extended period of time. To address this, the XP/CA device group monitor monitors the status of the XP/CA device group that is used in a package. Based on a pre-configured environment variable, the monitor can also automatically resynchronize the XP/CA device group upon link recovery.
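As a rough illustration of the kind of periodic check such a monitor performs, the following Python sketch decides when a notification is due: on a status change or, if so configured, after an interval with no change. It is not the monitor's actual implementation. The names `MonitorState`, `poll`, and `renotify_after` are hypothetical, PSUE appears in the usage note only as an example of a status other than PAIR, and treating the first poll as a silent baseline is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    """What the sketch remembers between polls (hypothetical structure)."""
    last_status: Optional[str] = None
    last_notified_at: float = 0.0


def poll(state: MonitorState, status: str, now: float,
         renotify_after: Optional[float] = None) -> Optional[dict]:
    """Return a notification dict if one is due, otherwise None."""
    if state.last_status is None:
        # Assumption: the first poll only records a baseline.
        state.last_status = status
        state.last_notified_at = now
        return None

    if status != state.last_status:
        reason = f"status changed from {state.last_status} to {status}"
    elif (renotify_after is not None
          and now - state.last_notified_at >= renotify_after):
        reason = f"status unchanged for at least {renotify_after} seconds"
    else:
        return None

    state.last_status = status
    state.last_notified_at = now
    # A notification carries the reason, a timestamp, and the status.
    return {"reason": reason, "timestamp": now, "status": status}
```

For example, polling with PAIR, PAIR, and then PSUE returns nothing for the first two calls and a "status changed" notification for the third. A later poll with the same status returns a notification only once `renotify_after` seconds have passed since the last one.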
The XP/CA device group monitor runs as a package service and is configured through the package's environment file. Once the package has started the monitor, it periodically checks the status of the XP/CA device group that is configured for the package. If the status changes, or if the monitor is configured to notify after an interval with no status change, the monitor sends a notification via email, syslog, and console that states the reason for the notification, a timestamp, and the status of the XP/CA device group.

Use the following steps to configure a monitor for a package’s device group:
In the environment file <pkgname>_xpca.env, edit the variables in the monitor’s section as follows:
Add the monitor as a service in the package's configuration file and control script file as follows:
The following guidelines help identify the cause of possible problems with the XP/CA device group monitor.

Problems with email notifications

The XP/CA device group monitor uses SMTP to send email notifications. All email notification problems are logged in the package log file.

If a warning message in the package log file indicates that the monitor is unable to determine the SMTP port, the SMTP port is not defined in the /etc/services file. In that case the monitor assumes that the SMTP port is 25. If a different port number is defined, the monitor must be restarted in order to connect to the correct port.

If an error message in the package control log file states that the SMTP server cannot be found, no mail server, such as sendmail, is configured on the local node. A mail server must be configured and running on the local node for email notification. Once the mail server is running on the local node, the monitor starts sending email notifications.

Problems with Unknown CA Device Status

The XP/CA device group monitor relies on the Raid Manager instance to get the CA device group state. If the local Raid Manager instance fails, the monitor cannot determine the state of the CA device group, and it sends a notification to all configured destinations (for example, email) stating that the state has changed to UNKNOWN. Because the monitor does not try to restart the Raid Manager instance, you must restart it before the monitor can determine the status of the CA device group. Make sure to start the Raid Manager instance with the same instance number that is defined in the package’s environment file.
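The SMTP port fallback described under the email notification problems can be sketched in Python as follows. This illustrates the lookup-with-default pattern only and is not the monitor's code; `resolve_smtp_port` is a hypothetical name. `socket.getservbyname` consults the system services database, which on UNIX systems is typically /etc/services.

```python
import socket


def resolve_smtp_port(service: str = "smtp", default: int = 25) -> int:
    """Look up the TCP port for a service, falling back to a default.

    If the service is not defined in the services database, the
    lookup raises OSError and the default port is returned.
    """
    try:
        return socket.getservbyname(service, "tcp")
    except OSError:
        return default
```

A process that resolves the port once at startup would not see a later change to the services database, which is consistent with the need to restart the monitor after a different port is defined.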