The following section describes how to configure a continental cluster solution using Continuous Access XP, which requires the Metrocluster CA product.

Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.

NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not a supported configuration.

1. Create and test a standard Serviceguard cluster using the procedures described in the user’s manual, Managing Serviceguard.

2. Install Continentalclusters on all the cluster nodes in the primary cluster. (Skip this step if the software has been preinstalled.)

   NOTE: Serviceguard should already be installed on all the cluster nodes.

   Run swinstall(1m) to install Continentalclusters and Metrocluster Continuous Access (CA) products from an SD depot.

3. When swinstall(1m) has completed, create a directory as follows for the new package in the primary cluster:

   # mkdir /etc/cmcluster/<package_name>

4. Create a Serviceguard package configuration file in the primary cluster with the commands:

   # cd /etc/cmcluster/<package_name>
   # cmmakepkg -p <package_name>.ascii

   Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/<package_name>/<package_name>.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.

   Set the AUTO_RUN flag to NO. This ensures that the package will not start when the cluster starts. Only after the primary packages start, use cmmodpkg to enable package switching on all primary packages. Enabling package switching in the package configuration would automatically start the primary package when the cluster starts. However, had there been a primary cluster disaster, resulting in the recovery package starting and running on the recovery cluster, the primary package should not be started until after first stopping the recovery package.

5. Create a package control script with the command:

   # cmmakepkg -s pkgname.cntl

   Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.

6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions.

7. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it pkgname_xpca.env:

   # cp /opt/cmcluster/toolkit/SGCA/xpca.env \
        /etc/cmcluster/pkgname/pkgname_xpca.env

8. Edit the environment file <pkgname>_xpca.env as follows:

   - If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script.
   - Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables.
   - Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names.
   - Uncomment the DEVICE_GROUP variable and set it to this package’s Raid Manager device group name, as specified in the Raid Manager configuration file.
   - Uncomment the HORCMPERM variable and use the default value MGRNOINST if the Raid Manager protection facility is not used or is disabled. If the Raid Manager protection facility is enabled, set it to the name of the HORCM permission file.
   - Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/CA.
   - Uncomment the FENCE variable and set it to ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is compared with the actual fence level returned by the array.
   - If you are using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds.
   - Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL.

9. Distribute the Metrocluster/CA configuration, environment and control script files to the other nodes in the cluster by using ftp or rcp:

   # rcp -p /etc/cmcluster/pkgname/* \
        other_node:/etc/cmcluster/pkgname

   See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.

10. Apply the Serviceguard configuration using the cmapplyconf command or SAM.

11. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

   - pkgname.cntl: Metrocluster/CA package control script
   - pkgname_xpca.env: Metrocluster/CA environment file
   - pkgname.ascii: Serviceguard package ASCII configuration file
   - pkgname.sh: Package monitor shell script, if applicable
   - other files: Any other scripts you use to manage Serviceguard packages

   The Serviceguard cluster is ready to automatically switch packages to nodes in remote data centers using Metrocluster/CA.

12. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance be started at boot time. The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means that the Raid Manager instance must be started before Serviceguard is started.

13. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover.

14. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.
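
As an illustration of the environment file edits described in this procedure, the fragment below sketches what an edited <pkgname>_xpca.env might contain for an asynchronous configuration. It assumes the template uses shell-style NAME=value assignments. The package name app1_pkg, the device group dg_app1, the instance name 0 and the side file timeout of 360 seconds are made-up values for illustration, not defaults from the product. The AUTO_ variables and the PATH line are not shown; uncomment them in the template as described.

```shell
# Illustrative values only; replace each one with your own.
# AUTO_* variables: uncomment them in the template and keep the defaults.

PKGDIR=/etc/cmcluster/app1_pkg   # hypothetical package directory, no quotes
DEVICE_GROUP=dg_app1             # hypothetical Raid Manager device group name
HORCMPERM=MGRNOINST              # default when the protection facility is unused
HORCMINST=0                      # hypothetical Raid Manager instance name
FENCE=ASYNC                      # one of ASYNC, NEVER, DATA
HORCTIMEOUT=420                  # assumed side file timeout (360) + 60 seconds
CLUSTER_TYPE=CONTINENTAL
```

The same set of variables applies to the recovery package, with PKGDIR pointing at the recovery package directory.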

Setting up a Recovery Package on the Recovery Cluster

Use the procedures in this section to configure a recovery package on the recovery cluster. Consult the Serviceguard documentation for more detailed instructions on setting up Serviceguard with packages, and for instructions on how to start, halt, and move packages and their services between nodes in a cluster.

NOTE: Neither the primary cluster nor the recovery cluster may configure an XP series paired volume, PVOL or SVOL, as a cluster lock disk. A cluster lock disk must always be writable. Since it cannot be guaranteed that either half of a paired volume is always writable, neither half may be used as a cluster lock disk. A configuration with a cluster lock disk that is part of a paired volume is not a supported configuration.

1. Create and test a standard Serviceguard cluster using the procedures described in the user’s manual, Managing Serviceguard.

2. Install Continentalclusters on all the cluster nodes in the recovery cluster. (Skip this step if the software has been preinstalled.)

   NOTE: Serviceguard should already be installed on all the cluster nodes.

   Run swinstall(1m) to install Continentalclusters and Metrocluster Continuous Access (CA) products from an SD depot. The toolkit integration scripts, environment file and contributed scripts will reside in the /opt/cmcluster/toolkit/SGCA and /usr/sbin directories.

3. When swinstall(1m) has completed, create a directory as follows for the new package in the recovery cluster:

   # mkdir /etc/cmcluster/<package_name>

4. Create a Serviceguard package configuration file in the recovery cluster with the commands:

   # cd /etc/cmcluster/<package_name>
   # cmmakepkg -p <package_name>.ascii

   Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster/<package_name>/<package_name>.cntl) for the RUN_SCRIPT and HALT_SCRIPT parameters.

   Set the AUTO_RUN flag to NO. This ensures that the package will not start when the cluster starts. Do not use cmmodpkg to enable package switching on any recovery package. Enabling package switching will automatically start the recovery package. Package switching on a recovery package is set automatically by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.

5. Create a package control script with the command:

   # cmmakepkg -s pkgname.cntl

   Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters. Be sure to set LV_UMOUNT_COUNT to 1 or greater.

   NOTE: Some of the control script variables, such as VG and LV, must be the same on the recovery cluster as on the primary cluster. Others, such as FS, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART, are probably the same as on the primary cluster. Still others, such as IP and SUBNET, are probably different on the recovery cluster from those on the primary cluster. Make sure that you review all the variables accordingly.

6. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See Managing Serviceguard for more information on these functions.

7. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it pkgname_xpca.env:

   # cp /opt/cmcluster/toolkit/SGCA/xpca.env \
        /etc/cmcluster/pkgname/pkgname_xpca.env

8. Edit the environment file <pkgname>_xpca.env as follows:

   - If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script.
   - Uncomment the behavioral configuration environment variables starting with AUTO_. It is recommended that you retain the default values of these variables unless you have a specific business requirement to change them. See Appendix A for an explanation of these variables.
   - Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/package_name, removing any quotes around the file names.
   - Uncomment the DEVICE_GROUP variable and set it to this package’s Raid Manager device group name, as specified in the Raid Manager configuration file.
   - Uncomment the HORCMPERM variable and use the default value MGRNOINST if the Raid Manager protection facility is not used or is disabled. If the Raid Manager protection facility is enabled, set it to the name of the HORCM permission file.
   - Uncomment the HORCMINST variable and set it to the Raid Manager instance name used by Metrocluster/CA.
   - Uncomment the FENCE variable and set it to ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is compared with the actual fence level returned by the array.
   - If you are using asynchronous data replication, set the HORCTIMEOUT variable to a value greater than the side file timeout value configured with the Service Processor (SVP), but less than the RUN_SCRIPT_TIMEOUT set in the package configuration file. The default setting is the side file timeout value + 60 seconds.
   - Uncomment the CLUSTER_TYPE variable and set it to CONTINENTAL.

9. Distribute the Metrocluster/CA configuration, environment and control script files to the other nodes in the cluster by using ftp or rcp:

   # rcp -p /etc/cmcluster/pkgname/* \
        other_node:/etc/cmcluster/pkgname

   See the example script Samples/ftpit to see how to semi-automate the copy using ftp. This script assumes the package directories already exist on all nodes. Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.

10. Apply the Serviceguard configuration using the cmapplyconf command or SAM.

11. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

   - bkpkgname.cntl: Metrocluster/CA package control script
   - bkpkgname_xpca.env: Metrocluster/CA environment file
   - bkpkgname.ascii: Serviceguard package ASCII configuration file
   - bkpkgname.sh: Package monitor shell script, if applicable
   - other files: Any other scripts you use to manage Serviceguard packages

12. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance be started at boot time.

   NOTE: The appropriate Raid Manager instance used by Continentalclusters must be running before the package is started. This normally means that the Raid Manager instance must be started before Serviceguard is started.

13. Make sure the packages on the primary cluster are not running. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the recovery cluster for cluster and package startup and package failover.

14. Any running package on the recovery cluster that has a counterpart on the primary cluster should be halted at this time.
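
The HORCTIMEOUT rule for asynchronous replication (greater than the side file timeout, less than RUN_SCRIPT_TIMEOUT, default of side file timeout + 60 seconds) is easy to get wrong when the values are edited in different files. The following is a minimal sketch of a sanity check, not part of the product. The function names and the sample numbers are hypothetical, all values are in seconds, and the sketch assumes that a RUN_SCRIPT_TIMEOUT of NO_TIMEOUT places no upper bound on HORCTIMEOUT.

```shell
#!/bin/sh
# Sketch only: these helper functions are not part of Metrocluster or Serviceguard.

# Default described in the text: side file timeout + 60 seconds.
default_horctimeout() {
    echo $(( $1 + 60 ))
}

# Usage: check_horctimeout HORCTIMEOUT SIDEFILE_TIMEOUT RUN_SCRIPT_TIMEOUT
check_horctimeout() {
    horctimeout=$1
    sidefile_timeout=$2
    run_script_timeout=$3
    if [ "$horctimeout" -le "$sidefile_timeout" ]; then
        echo "too small"
        return 1
    fi
    # Assumption: NO_TIMEOUT means there is no upper bound to compare against.
    if [ "$run_script_timeout" != "NO_TIMEOUT" ] &&
       [ "$horctimeout" -ge "$run_script_timeout" ]; then
        echo "too large"
        return 1
    fi
    echo "ok"
}

# Made-up numbers: side file timeout 360, RUN_SCRIPT_TIMEOUT 600.
check_horctimeout "$(default_horctimeout 360)" 360 600
```

Running the check on each cluster after editing either the environment file or the package configuration file catches a mismatch before the package is started.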

Setting up the Continental Cluster Configuration

The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 4 “Designing a Continental Cluster”.

1. Generate the Continentalclusters configuration using the following command:

   # cmqueryconcl -C cmconcl.config

2. Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each cluster, the recovery groups and the monitoring definitions. The recovery groups define the primary and recovery packages. When data replication is done using Continuous Access XP, there are no data sender and receiver packages. Define the monitoring parameters, the notification mechanism (ITO, email, console, SNMP, syslog or tcp) and notification type (alert or alarm) based on the cluster status (unknown, down, up or error). Descriptions for these can be found in the configuration file generated in the previous step.

3. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.

4. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor package configuration as needed in the file /etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to YES. This is in contrast to the flag setting for the application packages: the monitor package should start automatically when the cluster is formed.

5. Apply the monitor package to both cluster configurations using the following command:

   # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config

6. Apply the continental cluster configuration file using cmapplyconcl. Files are placed in /etc/cmconcl/instances. There is no change to /etc/cmcluster/cmclconfig, nor is there an equivalent file for Continentalclusters. Example:

   # cmapplyconcl -C cmconcl.config

7. Start the monitor package on both clusters.

   NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status.

8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

9. Start the primary packages on the primary cluster using cmrunpkg. Test local failover within the primary cluster.

10. View the status of the Continentalclusters primary and recovery clusters, including configured event data.

The continental cluster is now ready for testing. See “Testing the Continental Cluster”.

Switching to the Recovery Cluster in Case of Disaster

It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command. Note the following:

- During an alert, cmrecovercl will not start the recovery packages unless the -f option is used.
- During an alarm, cmrecovercl will start the recovery packages without the -f option.
- When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the recovery cluster. This applies not only when no alert or alarm was issued, but also when there was an alert or alarm and the primary cluster has since recovered, so that its current status is Up.
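
The three cases above can be summarized as a small lookup. The sketch below only prints the form of the command that applies to a given condition; it does not run cmrecovercl, and the function name recovery_command is hypothetical. The administrator must still confirm that the failure is real before acting on the result.

```shell
#!/bin/sh
# Sketch: map the reported condition to the cmrecovercl form described in the
# text. Nothing is executed; the function only prints a string.
recovery_command() {
    case "$1" in
        alarm) echo "cmrecovercl" ;;
        alert) echo "cmrecovercl -f" ;;
        *)     echo "no recovery possible: no alert or alarm is in effect" ;;
    esac
}

recovery_command alert
```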

Failback Scenarios

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking connecting the two sites. The following discussion addresses some of those failures and suggests recovery approaches applicable to environments using data replication provided by HP StorageWorks XP series disk arrays and Continuous Access (CA). In Chapter 4 “Designing a Continental Cluster” there is a discussion of failback mechanisms and methodologies in “Restoring Disaster Tolerance”.

1. The primary site has lost power, including backup power (UPS), to both the systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the XP disk array or the operating systems of the systems at the primary site.

2. The primary site XP disk array experienced a catastrophic hardware failure and all data was lost on the array.

Failback in Scenarios 1 and 2

After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster. Each Continentalclusters package control script that invokes Metrocluster CA XP will evaluate the status of the XP paired volumes. Since neither the systems nor the XP disk array at the primary site are accessible, the control file will initially report the paired volumes with a local status of SVOL_PAIR or SVOL_PSUE (in ASYNC mode) and a remote status of EX_ENORMT, PSUE or PSUS, indicating that there is an error accessing the primary site. The control file script is programmed to handle this condition and will enable the volume groups, mount the logical volumes, assign floating IP addresses and start any processes as coded into the script.

NOTE: In ASYNC mode, the package will halt unless a force flag is present or unless the auto variable AUTO_SVOLPSUE is set to 1.

The fence level of the paired volume (NEVER, ASYNC, or DATA) will not impact the starting of the packages at the recovery site. The Metrocluster CA XP pre-integrated solution will perform the following command with regard to the paired volume:

   # horctakeover -g <dev-grp-name> -S

Subsequently, the paired volume will have a status of SVOL_SWSS. To view the local status of the paired volumes, run:

   # pairvolchk -g <dev-grp-name> -s

To view the remote status of the paired volumes, run:

   # pairvolchk -g <dev-grp-name> -c

(While the remote XP disk array and primary cluster systems are down, the command will time out with an error code of 242.)

After power is restored to the primary site, or when a newly configured array is brought online, the XP paired volumes may have either a status of PVOL_PSUE on the primary site or SVOL_SWSS on the secondary site. The following procedure applies to this situation:

1. While the package is still running, issue the following command from the recovery host:

   # pairresync -g <dgname> -c 15 -swaps

   This starts the resynchronization, which can take a long time if the entire primary disk array was lost or a short time if the primary array was intact at the time of failover.

2. When resynchronization is complete, halt the Continentalclusters recovery packages at the recovery site using the command:

   # cmhaltpkg <pkg_name>

   This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SVOL_PAIR at the recovery site and PVOL_PAIR at the primary site.

3. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.

4. Manually start the Continentalclusters primary packages at the primary site using the command:

   # cmrunpkg <pkg_name>

5. Ensure that the monitor packages at the primary and recovery sites are running.
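
Because the order of the failback steps for Scenarios 1 and 2 matters, it can help to print the sequence for a specific package pair before starting. The following dry-run sketch only echoes the commands named in the procedure and runs none of them. The function name failback_plan and the sample names dg_app1, bk_app1_pkg and app1_pkg are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the failback steps in order and executes nothing.
# Usage: failback_plan DEVICE_GROUP RECOVERY_PKG PRIMARY_PKG
failback_plan() {
    dev_group=$1
    recovery_pkg=$2
    primary_pkg=$3
    echo "recovery host: pairresync -g $dev_group -c 15 -swaps"
    echo "recovery site: cmhaltpkg $recovery_pkg"
    echo "primary site: start the cluster and confirm the primary package stays down"
    echo "primary site: cmrunpkg $primary_pkg"
}

# Made-up names for illustration.
failback_plan dg_app1 bk_app1_pkg app1_pkg
```

Wait for the resynchronization started by the first command to complete before carrying out the second step.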

Failback When the Primary Has SMPL Status

The following procedure applies to the situation where the primary site paired volumes have a status that has been set to SMPL, possibly through manual intervention:

1. Halt the Continentalclusters recovery packages at the recovery site using the command:

   # cmhaltpkg <pkg_name>

   This will halt any applications, remove any floating IP addresses, unmount file systems and deactivate volume groups as programmed into the package control files. The status of the paired volumes will remain SMPL at the recovery site and PSUE at the primary site.

2. Start the cluster at the primary site. Assuming they have been properly configured, the Continentalclusters primary packages should not start. The monitor package should start automatically.

3. Since the paired volumes have a status of SMPL at both the primary and recovery sites, the XP views the two halves as unmirrored. From a system at the primary site, manually create the paired volume:

   # paircreate -g <dev-grp-name> -f <fence-level> -vr -c 15

   See the RM User Guide for more options for the paircreate command. Since the most current data will be at the remote or recovery site, this will synchronize the data from the remote or recovery site (the -vr option directs the command to synchronize from the remote site). Wait for the synchronization process to complete before proceeding to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step.

4. Manually start the Continentalclusters primary packages at the primary site using the command:

   # cmrunpkg <pkg_name>

   The control script is programmed to handle this case. The control script recognizes that the paired volume is synchronized and will proceed with the programmed package startup.

5. Ensure that monitor packages are running at both sites.

Maintaining the Continuous Access XP Data Replication Environment

After certain failures, data are no longer remotely protected. In order to restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the command pairresync. This command must successfully complete for disaster-tolerant data protection to be restored. Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:

- failure of ALL CA links without restart of the application
- failure of ALL CA links with Fence Level DATA with restart of the application on a primary host
- failure of the entire recovery Data Center for a given application package
- failure of the recovery XP disk array for a given application package while the application is running on a primary host

Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated by moving the application package back to its primary host after repairing the failure.

- failure of the entire primary Data Center for a given application package
- failure of all of the primary hosts for a given application package
- failure of the primary XP disk array for a given application package
- failure of all CA links with application restart on a secondary host

NOTE: The preceding steps are automated provided the default value of 1 is being used for the auto variable AUTO_PSUEPSUS. Once the CA link failure has been fixed, the user only needs to halt the package at the failover site and restart on the primary site. However, if you want to reduce the amount of application downtime, you should manually invoke pairresync before failback.

Full resynchronization must be manually initiated (as described in the next section) after repairing the following failures:

- failure of the recovery XP disk array for a given application package followed by application startup on a primary host
- failure of all CA links with Fence Level NEVER or ASYNC with restart of the application on a primary host

Pairs must be manually recreated if both the primary and recovery XP disk arrays are in the SMPL (simplex) state.

Make sure you periodically review the following files for messages, warnings and recommended actions. You should particularly review these files after system, data center and/or application failures:

- /var/adm/syslog/syslog.log
- /etc/cmcluster/<package-name>/<package-name>.log
- /etc/cmcluster/<bkpackage-name>/<bkpackage-name>.log
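
The periodic review of these log files can be partly scripted. The sketch below prints lines that look like warnings or errors from each file it is given, prefixed with the file name. The function name scan_logs and the search pattern are assumptions chosen for illustration; they are not a list of messages documented for Serviceguard or Metrocluster, so a manual read of the logs is still needed after a failure.

```shell
#!/bin/sh
# Sketch only: print lines containing "warn", "error" or "fail" (any case)
# from each readable log file, prefixed with the file name.
# The pattern is a guess, not a documented message list.
scan_logs() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            awk -v f="$log" 'tolower($0) ~ /warn|error|fail/ { print f ": " $0 }' "$log"
        else
            echo "skipping unreadable file: $log" >&2
        fi
    done
}

# Example call, with the package log path filled in for your package:
#   scan_logs /var/adm/syslog/syslog.log /etc/cmcluster/<package-name>/<package-name>.log
```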

Using the pairresync Command

The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the CA link is fixed, use the pairresync command in one of the following two ways, depending on which site you are on:

- pairresync -swapp, from the primary site
- pairresync -swaps, from the failover site

These options take advantage of the fact that the recovery site maintains a bitmap of the modified data sectors on the recovery array. Either version of the command will swap the personalities of the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now PVOL) is then copied back to the SVOL now running on the primary site. During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.

- The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a value large enough to take into consideration the extra startup time due to getting status from the XP disk array. See the previous paragraph for more information on the extra startup time.
- Online cluster configuration changes may require a Raid Manager configuration file to be changed. Whenever the configuration file is changed, the Raid Manager instance must be stopped and restarted. The Raid Manager instance must be running before any Continentalclusters package movement occurs.
- A given file system must not reside on more than one XP frame for either the PVOL or the SVOL. A given LVM Logical Volume (LV) must not reside on more than one XP frame for either the PVOL or the SVOL.
- The application is responsible for data integrity, and must use the O_SYNC flag when ordering of I/Os is important. Most relational database products are examples of applications that ensure data integrity by using the O_SYNC flag.
- Each host must be connected to only the XP disk array that contains either the PVOL or the SVOL. A given host must not be connected to both the PVOL and the SVOL of a continuous access pair.