Configuring Packages for Automatic Disaster Recovery
After completing the following steps, packages will be able
to automatically fail over to an alternate node in another data
center and still have access to the data that they need in order
to operate. This procedure must be repeated on all the cluster nodes for
each Serviceguard package so the application can fail over to any
of the nodes in the cluster. Customizations include editing an environment
file to set environment variables, and customizing the package control
script to include customer-defined run and halt commands, as appropriate.
The package control script must also be customized for the particular application
software that it will control. Consult the Managing Serviceguard user’s
guide for more detailed instructions on how to start, halt, and
move packages and their services between nodes in a cluster. For
ease of troubleshooting, configure and test one package at a time.

Create a directory /etc/cmcluster/pkgname for each package:

    # mkdir /etc/cmcluster/pkgname

Create a package configuration file:

    # cd /etc/cmcluster/pkgname
    # cmmakepkg -p pkgname.config

Customize the package configuration file as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/pkgname/pkgname.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.

In the pkgname.config file, list the node names in the order in which you want the package to fail over. For performance reasons, it is recommended that the package fail over locally first, then to the remote data center.

Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a value large enough to allow for the extra startup time required to obtain status from the EVA.

NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is no access from the host to the package disks.

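The settings described above can be sanity-checked before the package is applied. The following is a minimal sketch, assuming the package configuration file lists each parameter and its value on one line separated by whitespace. The function name check_pkg_config and its messages are hypothetical and are not part of Serviceguard.

```shell
#!/bin/sh
# Sketch only: check_pkg_config is a hypothetical helper, not a Serviceguard tool.
# Assumes each parameter appears on its own line as "NAME value".

check_pkg_config() {
    conf=$1    # package configuration file, for example pkgname.config
    cntl=$2    # full path of the package control script

    for param in RUN_SCRIPT HALT_SCRIPT; do
        # Succeed only if some line sets this parameter to the control script path.
        if ! awk -v p="$param" -v c="$cntl" \
            '$1 == p && $2 == c { found = 1 } END { exit !found }' "$conf"; then
            echo "ERROR: $param is not set to $cntl"
            return 1
        fi
    done

    # NO_TIMEOUT must not be combined with the EMS disk monitor. This sketch
    # cannot detect the monitor, so it only prints a reminder.
    if awk '$1 == "RUN_SCRIPT_TIMEOUT" && $2 == "NO_TIMEOUT" { found = 1 }
            END { exit !found }' "$conf"; then
        echo "REMINDER: NO_TIMEOUT is set; do not use it with the EMS disk monitor"
    fi
    echo "OK"
}
```

A call such as `check_pkg_config pkgname.config /etc/cmcluster/pkgname/pkgname.cntl` prints OK when both script parameters point at the control script.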
This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices take longer to start than those with fewer devices because of the time needed to get device status from the EVA. In clusters with multiple packages that use devices on the EVA, startup time increases further when more than one package starts at the same time.

Create a package control script:

    # cmmakepkg -s pkgname.cntl

Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user’s guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set FS_UMOUNT_COUNT to 1.

Add customer-defined run and halt commands in the appropriate places according to the needs of the application. Refer to the Managing Serviceguard user’s guide for more detailed information on these functions.

Copy the environment file template /opt/cmcluster/toolkit/SGCAEVA/caeva.env to the package directory, naming it pkgname_caeva.env:

    # cp /opt/cmcluster/toolkit/SGCAEVA/caeva.env \
        /etc/cmcluster/pkgname/pkgname_caeva.env

NOTE: If the package control script is not named after the package, the environment file name must still follow the naming convention: the file name of the package control script without its extension, an underscore, and the type of data replication technology used (caeva). The file extension must be env. The following examples show how to choose the environment file name.

Example 1: If the file name of the control script is pkg.cntl, the environment file name is pkg_caeva.env.

Example 2: If the file name of the control script is control_script.sh, the environment file name is control_script_caeva.env.

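The naming convention in the note above can be expressed as a short shell function. This is a minimal sketch; the function name caeva_env_name is hypothetical and is not part of the toolkit.

```shell
#!/bin/sh
# Sketch only: caeva_env_name is a hypothetical helper name.
# Derives the environment file name from a control script file name.
caeva_env_name() {
    script=$(basename "$1")                  # drop any directory part
    printf '%s_caeva.env\n' "${script%.*}"   # strip the extension, add the suffix
}

caeva_env_name pkg.cntl              # prints pkg_caeva.env
caeva_env_name control_script.sh     # prints control_script_caeva.env
```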
Edit the environment file pkgname_caeva.env as follows:

- Set the CLUSTER_TYPE variable to METRO if this is a Metrocluster.
- Set the PKGDIR variable to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.
- Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred or Data_Currency_Preferred.
- Set the WAIT_TIME variable to the timeout, in minutes, to wait for completion of the data merge from source to destination volume before starting up the package on the destination volume. If the wait time expires and merging is still in progress, the package will fail to start with an error that prevents restarting on any node in the cluster.
- Set the DR_GROUP_NAME variable to the name of the DR Group used by this package. This DR Group name is defined when the DR Group is created.
- Set the DC1_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 1. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.
- Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Separate multiple names with commas.
- Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Separate multiple names with commas.
- Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name (WWN) of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller or in the Command View EVA UI.
- Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2. Separate multiple names with commas.
- Set the DC2_HOST_LIST variable to the list of clustered nodes that reside in Data Center 2. Separate multiple names with commas.
- Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in the Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.

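For illustration, the variables above might be filled in as shown in the following sketch. It assumes the environment file uses shell-style assignments. Every value shown (package name, DR Group name, WWN placeholders, server and node names) is hypothetical, and the exact value formats should be taken from the template comments and Appendix B. The check_caeva_env function is also hypothetical and only tests the two constraints stated in the list.

```shell
# Hypothetical values for illustration only; replace each with your site's settings.
CLUSTER_TYPE=METRO
PKGDIR=/etc/cmcluster/pkgA                       # must be unique for each package
DT_APPLICATION_STARTUP_POLICY=Availability_Preferred
WAIT_TIME=10                                     # minutes to wait for the merge
DR_GROUP_NAME=pkgA_drgroup                       # placeholder DR Group name
DC1_STORAGE_WORLD_WIDE_NAME=DC1_EVA_WWN_PLACEHOLDER
DC1_SMIS_LIST=smis1a,smis1b                      # comma-separated Management Servers
DC1_HOST_LIST=node1a,node1b                      # comma-separated cluster nodes
DC2_STORAGE_WORLD_WIDE_NAME=DC2_EVA_WWN_PLACEHOLDER
DC2_SMIS_LIST=smis2a,smis2b
DC2_HOST_LIST=node2a,node2b
QUERY_TIME_OUT=300                               # seconds (the documented default)

# Hypothetical check of the two constraints stated in the text.
check_caeva_env() {
    case $DT_APPLICATION_STARTUP_POLICY in
        Availability_Preferred|Data_Currency_Preferred) ;;
        *) echo "ERROR: invalid DT_APPLICATION_STARTUP_POLICY"; return 1 ;;
    esac
    if [ "$QUERY_TIME_OUT" -lt 20 ]; then
        echo "WARNING: QUERY_TIME_OUT is below the recommended minimum of 20 seconds"
        return 1
    fi
    echo "OK"
}
```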
After customizing the control script file and creating the environment file, and before starting up the package, do a syntax check on the control script using the following command (be sure to include the -n option to perform syntax checking only):

    # sh -n pkgname.cntl

If any messages are returned, correct the syntax errors.

Distribute the Metrocluster Continuous Access EVA configuration, environment and control script files to the other nodes in the cluster by using ftp or rcp:

    # rcp -p /etc/cmcluster/pkgname/* \
        other_node:/etc/cmcluster/pkgname

See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.

Verify that each node in the Serviceguard cluster
has the following files in the directory /etc/cmcluster/pkgname:

- pkgname.cntl: Serviceguard package control script
- pkgname_caeva.env: Metrocluster Continuous Access EVA environment file
- pkgname.config: Serviceguard package ASCII configuration file
- pkgname.sh: Package monitor shell script, if applicable
- other files: Any other scripts used to manage Serviceguard packages

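The presence of the three required files in the list above can be checked on each node with a short script. This is a minimal sketch; the function name check_pkg_files is hypothetical, and the optional monitor script and other site-specific files are not checked.

```shell
#!/bin/sh
# Sketch only: check_pkg_files is a hypothetical helper name.
# Reports any required package file that is missing from the package directory.
check_pkg_files() {
    dir=$1     # package directory, for example /etc/cmcluster/pkgname
    pkg=$2     # package name
    missing=0
    for f in "$pkg.cntl" "${pkg}_caeva.env" "$pkg.config"; do
        if [ ! -f "$dir/$f" ]; then
            echo "MISSING: $dir/$f"
            missing=1
        fi
    done
    return $missing    # 0 when all three files exist, 1 otherwise
}
```

Running `check_pkg_files /etc/cmcluster/pkgname pkgname` on a node prints one line per missing file and returns a non-zero status if any file is absent.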
Check the configuration using the cmcheckconf -P pkgname.config command, then apply the Serviceguard configuration using the cmapplyconf -P pkgname.config command or SAM.

The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster Continuous Access EVA.

Maintaining a Cluster that Uses Metrocluster Continuous Access EVA

While the package is running, a manual storage failover on Continuous Access EVA performed outside of the Metrocluster Continuous Access EVA software can cause the package to halt because of the unexpected condition of the Continuous Access EVA volumes. It is recommended that no manual storage failover be performed while the package is running.

A manual change of the Continuous Access EVA link state from suspend to resume is allowed to re-establish data replication while the package is running.

Continuous Access EVA Link Suspend and Resume Modes

Upon Continuous Access link recovery, Continuous Access EVA automatically normalizes (the Continuous Access EVA term for “synchronizes”) the source Vdisk and destination Vdisk data.

If the log disk is not full, when a Continuous Access connection
is re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging. Since write ordering is maintained, the data on the destination Vdisk is consistent while merging is in progress.

If the log disk is full, when a Continuous Access connection is re-established, a full copy from the source Vdisk to the destination Vdisk is done. Since a full copy is done at the block level, the data on the destination Vdisk is not consistent until the copy completes.

If all Continuous Access links fail and if failsafe mode is
disabled, the application package continues to run and writes new I/O to the source Vdisk. The virtual log in the EVA controller collects host write commands and data, and the DR group's log state changes from normal to logging. When a DR group is in a logging state, the log grows in proportion to the amount of write I/O being sent to the source Vdisks. If the links are down for a long time, the log disk may become full, and a full copy will happen automatically upon link recovery. If the primary site fails while the copy is in progress, the data on the destination Vdisk is not consistent and is not usable. To prevent this, after all Continuous Access links fail, it is recommended to manually put the Continuous Access link state into suspend mode by using the Command View EVA UI. When the Continuous Access link is in suspend state, Continuous Access EVA will not try to normalize the source and destination Vdisks upon link recovery until you manually change the link state
to resume mode.

There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the Metrocluster Continuous Access EVA:

Stop the package with the appropriate Serviceguard command:

    # cmhaltpkg pkgname

Distribute the Metrocluster Continuous Access EVA configuration changes:

    # cmapplyconf -P pkgname.config

Start the package with the appropriate Serviceguard command:

    # cmmodpkg -e pkgname

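The three maintenance steps above can be wrapped in a small script so that they always run in the same order. This is a minimal sketch, not part of the toolkit: the helper names run and maintain_pkg and the DRY_RUN variable are hypothetical, and the script assumes it is run from the package directory so that pkgname.config resolves.

```shell
#!/bin/sh
# Sketch only: run, maintain_pkg and DRY_RUN are hypothetical names.
# With DRY_RUN set to a non-empty value, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

maintain_pkg() {
    pkg=$1
    run cmhaltpkg "$pkg" || return 1                # stop the package
    run cmapplyconf -P "$pkg.config" || return 1    # distribute configuration changes
    run cmmodpkg -e "$pkg"                          # start the package
}
```

Setting DRY_RUN=1 before calling `maintain_pkg pkgname` prints the three commands without touching the cluster, which is a convenient way to review the sequence first.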
Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that the nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators”.

After resynchronization is complete, halt the package on the failover site, and restart it on the primary site. Metrocluster will then do a failover of the storage, which will trigger Continuous Access EVA to swap the personalities between the source and the destination Vdisks, returning source status to the primary site.

There might be situations when the cluster has to be re-configured
for maintenance purposes. The following procedure is recommended for re-configuration of the Metrocluster Continuous Access EVA:

Before running the cmapplyconf -C command, it is necessary to remove the cluster awareness from the Metrocluster volume groups. To do this, halt all Metrocluster packages, then run the following command on the source side:

    # vgchange -c n <vg>

Halt the entire cluster and apply your changes with the Serviceguard command:

    # cmapplyconf -C

Re-start the cluster and mark the cluster ID on all Metrocluster volume groups. Run the following command on the source side:

    # vgchange -c y <vg>