The following section describes how to configure a continental
cluster solution using EMC SRDF, which requires the Metrocluster
with EMC SRDF product.

Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary
package on the primary cluster. Consult the Managing
Serviceguard user’s guide for more detailed
instructions on setting up Serviceguard with packages, and for instructions
on how to start, halt, and move packages and their services between
nodes in a cluster.

If this was not done previously, split the EMC SRDF logical links
for the disks associated with the application package. See the
script Samples/pre.cmquery for an example of how to automate this
task. The script must be edited to refer to the configured SRDF
groups and customized with the Symmetrix device group names.

Create and test a standard Serviceguard cluster using the procedures
described in the Managing Serviceguard user’s guide.

Install Continentalclusters on all the cluster nodes in the primary
cluster. (Skip this step if the software has been preinstalled.)

NOTE: Serviceguard should already be installed on all the cluster
nodes.

Run swinstall(1m) to install the Continentalclusters and Metrocluster
with EMC SRDF products from an SD depot.

When swinstall(1m) has completed, create a directory for the new
package in the primary cluster:

    # mkdir /etc/cmcluster/<pkg_name>

Copy the environment file template
/opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory,
naming it <pkg_name>_srdf.env:

    # cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \
         /etc/cmcluster/<pkg_name>/<pkg_name>_srdf.env

Create a Serviceguard application package configuration file:

    # cd /etc/cmcluster/<pkg_name>
    # cmmakepkg -p <pkg_name>.conf

Customize it as appropriate to your application. Be sure to include
the node names and the pathname of the control script
(/etc/cmcluster/<pkg_name>/<pkg_name>.cntl) for the RUN_SCRIPT and
HALT_SCRIPT parameters. Also change AUTO_RUN (PKG_SWITCHING_ENABLED
in Serviceguard A.11.09) to NO. This ensures that the application
packages will not start automatically. (For the monitor package
ccmonpkg, AUTO_RUN is set to YES.) Define the service as required.

Create a package control script:

    # cmmakepkg -s <pkg_name>.cntl

Customize the control script as appropriate to your application
using the guidelines in the Managing Serviceguard user’s
guide. Standard Serviceguard package customizations
include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART parameters.
Be sure to set LV_UMOUNT_COUNT to 1 or greater. Add customer-defined run and halt commands in the
appropriate places according to the needs of the application. See
the Managing Serviceguard user’s
guide for more information on these functions.

Edit the environment file <pkg_name>_srdf.env as follows:

Add the path where the EMC Solutions Enabler software binaries have
been installed to the PATH environment variable. The default
location is /usr/symcli/bin.

Uncomment the AUTO* environment variables. It is recommended to
retain the default values of these variables unless there is a
specific business requirement to change them. See Appendix B for an
explanation of these variables.

Uncomment the PKGDIR variable and set it to the full path name of
the directory where the control script has been placed. This
directory must be unique for each package and is used for status
data files. For example, set PKGDIR to /etc/cmcluster/<pkg_name>.

Uncomment the DEVICE_GROUP variable and set it to the Symmetrix
device group names given by the symdg list command. The DEVICE_GROUP
variable may also contain the consistency group name if using an
M by N configuration.

Uncomment the RETRY and RETRYTIME variables. Use the defaults for
the first package. The values should be slightly different for
other packages: RETRYTIME should increase by two seconds for each
package, and the product of RETRY * RETRYTIME should be
approximately five minutes. These variables
are used to decide how often and how many times to retry the Symmetrix
status commands.

For example, if there are three packages with data on a particular
Symmetrix pair (connected by SRDF), then the values for RETRY and
RETRYTIME might be as follows:

Table 5-3 RETRY and RETRYTIME Values

| Package | RETRY | RETRYTIME |
|---|---|---|
| pkgA | 60 | 5 |
| pkgB | 43 | 7 |
| pkgC | 33 | 9 |

Uncomment the CLUSTER_TYPE variable and set it to “continental”.

Uncomment the RDF_MODE variable and set it to “async” or “sync” as
appropriate to your application.

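The RETRY and RETRYTIME guidance can be sketched as a small calculation. This is an illustration only: the function name retry_for, the 300-second target, and the starting RETRYTIME of 5 seconds are assumptions chosen to match the example values (RETRYTIME growing by two seconds per package, product of about five minutes). Nothing here is shipped with Metrocluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Metrocluster): given a RETRYTIME in
# seconds, print a RETRY count so that RETRY * RETRYTIME is close to
# five minutes (300 seconds).
retry_for() {
    t=$1
    # Integer division, rounded to the nearest whole number.
    echo $(( (300 + t / 2) / t ))
}

# RETRYTIME grows by two seconds per package (assumed start: 5 seconds).
retrytime=5
for pkg in pkgA pkgB pkgC; do
    echo "$pkg RETRY=$(retry_for "$retrytime") RETRYTIME=$retrytime"
    retrytime=$(( retrytime + 2 ))
done
```

For RETRYTIME values of 5, 7, and 9 seconds this yields RETRY counts of 60, 43, and 33, so each package retries for roughly five minutes but on a slightly different schedule.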
Edit the remaining control script variables (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application as
it runs on the primary cluster. See the Managing Serviceguard manual
for more information on these variables. Add customer-defined run and halt commands in the
appropriate places according to the needs of the application. See
the Serviceguard manual for more information on these functions.

Distribute the EMC SRDF package configuration, environment, and
control script files to the other nodes in the primary cluster by
using ftp or rcp:

    # rcp -p /etc/cmcluster/<pkg_name>/<pkg_name>.cntl \
          other_node:/etc/cmcluster/<pkg_name>/<pkg_name>.cntl

When using ftp, be sure to make the file executable on any
destination systems.

Verify that each host in both clusters in the continental cluster
has the following files in the directory /etc/cmcluster/<pkg_name>:

    <pkg_name>.cntl       (EMC SRDF package control script)
    <pkg_name>.conf       (Serviceguard package ASCII config file)
    <pkg_name>.sh         (Package monitor shell script, if applicable)
    <pkg_name>_srdf.env   (Metrocluster EMC SRDF environment file)

Split the SRDF logical links for the disks associated with the
application package. See the script Samples/pre.cmquery for an
example of how to automate this task. The script must be customized
with the Symmetrix device group names.

Apply the Serviceguard configuration using the cmapplyconf command
or SAM.

Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg,
cmhaltpkg), test the primary cluster for cluster and package startup
and failover.

Restore the SRDF logical links for the disks associated with the
application package. Do this after the recovery cluster has been
configured as described in the next section. See the script
Samples/post.cmapply for an example of how to automate this task.
The script must be customized with the Symmetrix device group names.

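The file verification step in this procedure can be scripted on each node. The following is a minimal sketch, assuming the file names listed in the procedure. The function name missing_pkg_files is hypothetical, and the optional monitor script <pkg_name>.sh is deliberately not checked because it exists only where applicable.

```shell
#!/bin/sh
# Hypothetical helper: print the expected package files that are missing
# from a package directory on the local node.
#   $1 = package directory, $2 = package name
missing_pkg_files() {
    dir=$1
    pkg=$2
    for f in "$pkg.cntl" "$pkg.conf" "${pkg}_srdf.env"; do
        # Print the file name only when it is absent.
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example use on one node:
#   missing_pkg_files /etc/cmcluster/pkgA pkgA
```

An empty result means all three files are present. The check has to be repeated on every node, since the files are distributed by hand with ftp or rcp.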
The primary cluster is now ready for Continentalclusters operation.

Setting up a Recovery Package on the Recovery Cluster

The installation of EMC SRDF, Serviceguard, and Continentalclusters software
is exactly the same as in the previous section. The procedures below will install and configure a recovery
package on the recovery cluster. Consult the Managing
Serviceguard user’s guide for instructions
on setting up a Serviceguard cluster (that is, LAN, VG, LV, and so on).

Split the EMC SRDF logical links for the disks associated with the
application package. See the script Samples/pre.cmquery for an
example of how to automate this task. The script must be edited to
refer to the SRDF groups configured and customized with the
Symmetrix device group names.

Generate a cluster ASCII file:

    # cmquerycl -n node1 -n node2 -C CClusterNY.ascii

Edit the file CClusterNY.ascii. Be sure to select a primary cluster
lock disk that is not a lock disk on the recovery cluster. Edits
include spreading HEARTBEAT_IP on all user LANs, and setting
MAX_PACKAGES.

Check the configuration:

    # cmcheckconf -C CClusterNY.ascii

Create the cluster binary:

    # cmapplyconf -C CClusterNY.ascii

Test the cluster:

    # cmruncl -v
    # cmviewcl -v

If the cluster comes up, stop it:

    # cmhaltcl -f

Copy the package files from the primary cluster
to a bkpkgXXX directory, and rename them to <backup_pkg_name>.cntl
and <backup_pkg_name>_srdf.env.

Edit the recovery package control file copied from the primary
cluster so that it suits the recovery cluster. Change the subnet,
relocatable IP address, and nodes. Be sure to set AUTO_RUN to NO in
the package ASCII file.

Edit the recovery package environment file
<backup_pkg_name>_srdf.env as follows:

Add the path for the EMC Solutions Enabler software binaries.

Make sure that all AUTO* variables are uncommented.

Uncomment the PKGDIR variable and set it to the full path name of
the directory where the control script has been placed. This
directory must be unique for each package and is used for status
data files. For example, set PKGDIR to
/etc/cmcluster/<backup_pkg_name>.

Uncomment the DEVICE_GROUP variable and set it to the Symmetrix
device group names given by the symdg list command. The DEVICE_GROUP
variable may also contain the consistency group name if using an
M by N configuration.

Uncomment the RETRY and RETRYTIME variables.

Make sure the CLUSTER_TYPE variable is set to “continental”.

Uncomment the RDF_MODE variable and set it to “async” or “sync” as
appropriate to your application.

Edit the remaining application package control script
variables in the package control script (VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD and SERVICE_RESTART) according to the needs of the application. See
the Managing Serviceguard user’s
guide for more information on these variables. Change the subnet
and IP address values carried over in the files copied by ftp from
the primary cluster.

Verify that each host in both clusters in the continental cluster
has the following files in the directory /etc/cmcluster/<pkg_name>:

    <backup_pkg_name>.cntl       (continental cluster package control script)
    <backup_pkg_name>.conf       (Serviceguard package ASCII config file)
    <backup_pkg_name>.sh         (Package monitor shell script, if applicable)
    <backup_pkg_name>_srdf.env   (Metrocluster SRDF environment file)

Split the SRDF logical links for the disks associated with the
application package. See the script Samples/pre.cmquery for an
example of how to automate this task. The script must be customized
with the Symmetrix device group names.

Apply the Serviceguard configuration for the recovery cluster using
the cmapplyconf command or SAM.

Test the cluster and packages:

    # cmruncl
    # cmmodpkg -e bkpkgCCA
    # cmviewcl -v

Note that cmmodpkg is used to manually start the application
packages. If all application packages start, issue the following
command:

    # cmhaltcl -f

NOTE: Application packages cannot run on R1 and R2 at the same time.
Any running package on the primary cluster that will have a
counterpart on the recovery cluster must be halted to prevent data
corruption.

Restore the SRDF logical links for the disks associated
with the application package. See the script Samples/post.cmapply for an example of how to automate this task. The script
must be customized with the Symmetrix device group names.
The recovery cluster is now ready for continental cluster operation.

Setting up the Continental Cluster Configuration

The procedures below will configure Continentalclusters and
the monitoring packages on the two clusters. For complete details
on creating and editing the configuration file, refer to Chapter 2
“Designing a Continental Cluster”.

Split the SRDF logical links for the disks associated with the
application package. See the script Samples/pre.cmquery for an
example of how to automate this task. The script must be customized
with the Symmetrix device group names.

Generate the Continentalclusters configuration using the following
command:

    # cmqueryconcl -C cmconcl.config

Edit the configuration file cmconcl.config with the names of the two clusters, the nodes in each
cluster, the recovery groups and the monitoring definitions. The
recovery groups define the primary and recovery packages. Note that
when data replication is done using EMC SRDF, there are no data
sender and receiver packages. Define the monitoring parameters, the notification mechanism
(ITO, email, console, SNMP, syslog or tcp) and notification type
(alert or alarm) based on the cluster status (unknown, down, up
or error). Descriptions for these can be found in the configuration
file generated in the previous step.

Edit the continental cluster security file /etc/opt/cmom/cmomhosts
to allow or deny hosts read access by the monitor software.

On all nodes in both clusters, copy the monitor package files from
/opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg. Edit the monitor
package configuration as needed in the file
/etc/cmcluster/ccmonpkg/ccmonpkg.config. Set the AUTO_RUN flag to
YES. This is in contrast to the flag setting for the application
packages. The monitor package should start automatically when the
cluster is formed.

Apply the monitor package to both cluster configurations:

    # cmapplyconf -P /etc/cmcluster/ccmonpkg/ccmonpkg.config

Restore the logical SRDF links for the package. See the script
Samples/post.cmapply for an example of how to automate this task.
The script must be customized with the appropriate Symmetrix device
group names. Example:

    # Samples/post.cmapply

Generate the cluster configuration file using cmapplyconcl. Files
are placed in /etc/cmconcl/instances. There is no change to
/etc/cmcluster/cmclconfig, nor is there an equivalent file for
Continentalclusters. Example:

    # cmapplyconcl -C cmconcl.config

Start the monitor package on both clusters. The monitor package for
a cluster checks the status of the other cluster and issues alerts
and alarms, as defined in the Continentalclusters configuration
file, based on the other cluster’s status.

Check /var/adm/syslog/syslog.log for messages. Also check the
ccmonpkg package log file.

Start the primary packages on the primary cluster using cmrunpkg.
Test local failover within the primary cluster.

View the status of the continental cluster primary and recovery
clusters, including configured event data:

    # cmviewconcl -v

The continental cluster is now ready for testing. See the section
“Testing the Continental Cluster” in Chapter 2 “Designing a
Continental Cluster”.

Switching to the Recovery Cluster in Case of Disaster

It is vital the administrator verify that recovery is needed
after receiving a cluster alert or alarm. Network failures may produce
false alarms.

After validating a failure, start the recovery process using the
cmrecovercl [-f] command. Note the following:

During an alert, cmrecovercl will not start the recovery packages
unless the -f option is used.

During an alarm, cmrecovercl will start the recovery packages
without the -f option.

When there is neither an alert nor an alarm condition, cmrecovercl
cannot start the recovery packages on the recovery cluster. This
applies not only when no alert or alarm was issued, but also when
there was an alert or alarm, but the primary cluster recovered and
its current status is Up.

Verify that the SRDF links are up:

    # symrdf list
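The alert and alarm rules for cmrecovercl can be summarized as a small decision function. This is a sketch for illustration: the function name recovery_command and the state keywords passed to it are assumptions, and the function only prints the applicable command. It never runs cmrecovercl, because the administrator must first confirm that recovery is really needed.

```shell
#!/bin/sh
# Hypothetical helper: map the current notification state to the
# cmrecovercl invocation that applies in that state.
#   $1 = "alert", "alarm", or anything else (no alert or alarm pending)
recovery_command() {
    case $1 in
        alarm) echo "cmrecovercl" ;;       # -f is not required
        alert) echo "cmrecovercl -f" ;;    # recovery must be forced
        *)     echo "none: cmrecovercl cannot start the recovery packages" ;;
    esac
}

recovery_command alert
```

Run as shown, the sketch prints `cmrecovercl -f`, reflecting that recovery during an alert has to be forced.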

Failback Scenarios

There is no failback counterpart to the “pushbutton” failover
from the primary cluster to the recovery cluster. Failback is dependent
on the original nature of the failover, the state of primary and
secondary Symmetrix SRDF volumes (R1 and R2) and the condition of
the primary cluster. Chapter 2 “Designing a Continental Cluster”
discusses failback mechanisms and methodologies in the section
“Restoring Disaster Tolerance”.

The goal of HP Continentalclusters is to maximize system and application
availability. However, even systems configured with Continentalclusters
can experience hardware failures at the primary site or the recovery
site, as well as failures of the hardware or networking connecting
the two sites. The following discussion addresses some of those
failures and suggests recovery approaches applicable to environments
using data replication provided by Symmetrix Disk Arrays and the
Symmetrix Remote Data Facility (SRDF).

The primary site has lost power, including backup power (UPS),
to both the systems and disk arrays that make up the Serviceguard
Cluster at the primary site. There is no loss of data on either
the Symmetrix or the operating systems of the systems at the primary
site. After reception of the Continentalclusters alerts and alarm,
the administrators at the recovery site follow the prescribed processes
and recovery procedures to start the protected applications on the
recovery cluster. The Continentalclusters package control file will
invoke Metrocluster with EMC SRDF to evaluate the status of the
R1 and R2 paired group volumes. The command symrdf list will display
the status of the device group:

            Source (R1) View                Target (R2) View     MODES
    --------------------------------    ------------------------ ----- ------------
                 ST                  LI      ST
    Standard      A                   N       A
    Logical       T  R1 Inv   R2 Inv  K       T  R1 Inv   R2 Inv       RDF Pair
    Device  Dev   E  Tracks   Tracks  S Dev   E  Tracks   Tracks MDA   STATE
    -------------------------------- -- ------------------------ ----- ------------
    DEV001  009F WD       0        0 NR 00A5 RW       0        0 S..   Failed Over
    DEV002  00A0 WD       0        0 NR 00A6 RW       0        0 S..   Failed Over

After power is restored to the primary site, the Symmetrix device
groups may be in the status of Failed Over. The procedure to move
the application packages back to the primary site differs depending
on the status of the device groups.

The following procedure applies to the situation where the device
groups have a status of “Failed Over”:

Halt the Continentalclusters recovery packages at the recovery site:

    # cmhaltpkg <pkg_name>

This will halt any applications, remove any floating IP addresses,
unmount file systems and deactivate volume groups as programmed into
the package control files. The status of the device groups will
remain “Synchronized” at the recovery site and “Failed Over” at the
primary site.

Halt the cluster, which also halts the monitor package ccmonpkg.

Start the cluster at the primary site. Assuming they have been
properly configured, the Continentalclusters primary packages should
not start. The monitor package should start automatically.

Manually start the Continentalclusters primary packages at the
primary site:

    # cmrunpkg <pkg_name>

or

    # cmmodpkg -e <pkg_name>

The control script is programmed to handle this case. The control
script will issue an SRDF failback command to move the device group
back to the R1 side and to resynchronize the R1 from the R2 side.
Until the resynchronization is complete, the SRDF “read-through”
feature will ensure that any reads on the R1 side will be current,
by reading data through the SRDF link from the R2 side.

NOTE: If the system administrator does not want synchronization
performed from the remote (recovery) site, the device groups should
be split and recreated manually.

Ensure that the monitor packages at the primary and recovery sites
are running.

Verify that the device group is synchronized:

    # symrdf list

Manually bring the package back if the package does not come up and
the device group status is “Failed Over”:

    # symrdf -g pkgCCB_r1 failback

    Execute an RDF 'Failback' operation for device
    group 'pkgCCB_r1' (y/[n]) ? y

    An RDF 'Failback' operation execution is
    in progress for device group 'pkgCCB_r1'. Please wait...

        Write Disable device(s) on RA at target (R2)..............Done.
        Suspend RDF link(s).......................................Done.
        Merge device track tables between source and target.......Started.
        Device: 001 ............................................. Merged.
        Merge device track tables between source and target.......Done.
        Resume RDF link(s)........................................Done.
        Read/Write Enable device(s) on SA at source (R1)..........Done.

    The RDF 'Failback' operation successfully executed for
    device group 'pkgCCB_r1'.

During the resynchronization, the pair state changes from Failed
Over to Invalid to SyncInProg. Example:

    ftsys1a# symrdf list

    Symmetrix ID: 000183500021

                                Local Device View
    -------------------------------------------------------------------------
                    STATUS    M O D E S                    RDF S T A T E S
    Sym RDF        --------- ------------ R1 Inv R2 Inv ---------------------
    Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair
    --- ---- ----- --------- ------------ ------ ------ --- -----------------
    000 000  R2:2  RW WD RW  SYN  DIS OFF      0      0 WD  RW   Synchronized
    001 001  R2:2  RW WD RW  SYN  DIS OFF     12      0 WD  WD   Invalid

    ftsys1a# symrdf list

    Symmetrix ID: 000183500021

                                Local Device View
    -------------------------------------------------------------------------
                    STATUS    M O D E S                    RDF S T A T E S
    Sym RDF        --------- ------------ R1 Inv R2 Inv ---------------------
    Dev RDev Typ:G SA RA LNK Mode Dom ACp Tracks Tracks Dev RDev Pair
    --- ---- ----- --------- ------------ ------ ------ --- -----------------
    000 000  R2:2  RW WD RW  SYN  DIS OFF      0      0 WD  RW   Synchronized
    001 001  R2:2  RW WD RW  SYN  DIS OFF      2      0 WD  RW   SyncInProg

Halt the recovery cluster and restart it:

    # cmhaltcl -f     (if the cluster is not already down)
    # cmruncl

Verify the data for consistency and currency.

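Checking the pair state by eye becomes tedious with many devices. The sketch below filters symrdf list output for devices that are not yet synchronized. It assumes the column layout of the sample output in this section: device rows carry an R1:n or R2:n type in the third field and the pair state in the last field. The function name unsynced_devices is hypothetical, and other SYMCLI versions may format the listing differently.

```shell
#!/bin/sh
# Hypothetical helper: read "symrdf list"-style output on standard input
# and print the device number and pair state of every device whose state
# is not Synchronized. Only the last word of the state is examined, so a
# two-word state such as "Failed Over" is reported as "Over".
unsynced_devices() {
    awk '$3 ~ /^R[12]:/ && $NF != "Synchronized" { print $1, $NF }'
}

# Demonstration with two device rows taken from the sample output.
printf '%s\n' \
    "000 000 R2:2 RW WD RW SYN DIS OFF 0 0 WD RW Synchronized" \
    "001 001 R2:2 RW WD RW SYN DIS OFF 2 0 WD RW SyncInProg" |
    unsynced_devices
```

The demonstration prints `001 SyncInProg`. On a live system the input would come from `symrdf list | unsynced_devices`, and an empty result would indicate that every listed pair is synchronized.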
The primary site Symmetrix experienced a catastrophic hardware
failure and all data was lost on the array. After the reception
of the Continentalclusters alerts and alarm, the administrators
at the recovery site follow prescribed processes and recovery procedures
to start the protected applications on the recovery cluster. The
Continentalclusters package control file will invoke Metrocluster
with EMC SRDF to evaluate the status of the Symmetrix SRDF paired
volumes. Since the systems at the primary site are accessible, but
the Symmetrix is not, the control file will evaluate the paired
volumes with a local status of “failed over”.
The control file script is programmed to handle this condition and
will enable the volume groups, mount the logical volumes, assign
floating IP addresses and start any processes as coded into the
script.

After the primary site Symmetrix is repaired and configured, use the
following procedure to move the application package back to the
primary site.

Manually create the Symmetrix device groups and gatekeeper
configurations. Re-run the scripts mk3symgrps* and mk4gatekpr*,
which do the following:

    # date >ftsys1.group.list
    # symdg create -type RDF1 pkgCCA_r1
    # symld -g pkgCCA_r1 add pd /dev/rdsk/c7t0d0
    # symgate define pd /dev/rdsk/c7t15d0
    # symgate define pd /dev/rdsk/c7t15d1
    # symgate -g pkgCCA_r1 associate pd /dev/rdsk/c7t15d0

Halt the Continentalclusters recovery packages at the recovery site:

    # cmhaltpkg <pkg_name>

This will halt any applications, remove any floating IP addresses,
unmount file systems and deactivate volume groups as programmed into
the package control files. The status of the paired volumes will be
SPLIT at both the recovery and primary sites.

Halt the cluster, which also halts the monitor package ccmonpkg.

Start the cluster at the primary site. Assuming they have been
properly configured, the Continentalclusters primary packages should
not start. The monitor package should start automatically.

Since the paired volumes have a status of SPLIT at both the primary
and recovery sites, the Symmetrix views the two halves as
unmirrored. Issue the following command:

    # symrdf -g pkgCCB_r1 failback

Since the most current data will be at the remote (recovery) site,
this command synchronizes from the remote site. Wait for the
synchronization process to complete before progressing to the next
step. Failure to wait for the synchronization to complete will
result in the package failing to start in the next step.

Manually start the Continentalclusters primary packages at the
primary site:

    # cmrunpkg <pkg_name>

The control script is programmed to handle this case. The control
script recognizes that the paired volume is synchronized and
proceeds with the programmed package startup.

Verify that the device group is synchronized:

    # symrdf list

Ensure that the monitor packages at the primary and recovery sites
are running.

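The procedure above requires waiting for synchronization to finish before starting the primary packages. One way to make that wait explicit is a bounded polling loop, sketched below. The function name wait_for_sync is hypothetical, and the command that reports the pair state is supplied by the caller. How that state is obtained (for example by filtering symrdf list output) is left as an assumption here.

```shell
#!/bin/sh
# Hypothetical helper: poll until a caller-supplied command prints
# "Synchronized", or give up after a fixed number of attempts.
#   $1 = maximum attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = command that prints the current pair state
wait_for_sync() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Success as soon as the reported state is Synchronized.
        [ "$("$@")" = "Synchronized" ] && return 0
        tries=$(( tries - 1 ))
        # Sleep only if another attempt will follow.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example with a stand-in state command (replace with a real check):
#   pair_state() { echo SyncInProg; }
#   wait_for_sync 60 5 pair_state || echo "still not synchronized"
```

Bounding the number of attempts keeps the wait from hanging indefinitely if the SRDF link never recovers. A non-zero return status signals that the package should not be started yet.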

Maintaining the EMC SRDF Data Replication Environment

The following is the normal Continentalclusters startup procedure.

On the primary cluster:

Start the primary cluster:

    # cmruncl -v

The primary cluster comes up with ccmonpkg up and the application
packages down.

Manually start the application packages on the primary cluster:

    # cmmodpkg -e <Application_pkgname>

Confirm the primary cluster status:

    # cmviewcl -v
    # cmviewconcl -v

Verify the SRDF links:

    # symrdf list

On the recovery cluster, do the following:

Start the recovery cluster:

    # cmruncl -v

The recovery cluster comes up with ccmonpkg up and the application
packages (bkpkgX) down.

Do not manually start application packages on the recovery cluster;
this will cause data corruption.

Confirm the recovery cluster status:

    # cmviewcl -v
    # cmviewconcl -v

There might be situations where a package has to be taken
down for maintenance purposes without having the package move to
another node. The following procedure is recommended for normal
maintenance of Continentalclusters with EMC SRDF data replication:

Shut down the package with the appropriate command. Example:

    # cmhaltpkg <pkgname>

Distribute the package configuration changes. Example:

    # cmapplyconf -P <pkgconfig>      (Primary cluster)
    # cmapplyconf -P <bkpkgconfig>    (Recovery cluster)

Start up the package with the appropriate Serviceguard command.
Example:

    # cmmodpkg -e <pkgname>           (Primary cluster)

CAUTION: Never enable package switching on both the primary package
and the recovery package.

Halt the monitor package:

    # cmhaltpkg ccmonpkg

Apply the new continental cluster configuration:

    # cmapplyconcl -C <configfile>

Restart the monitor package:

    # cmrunpkg ccmonpkg