Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters, Chapter 2: Designing a Continental Cluster

Building the Continentalclusters Configuration

If necessary, use the swinstall command to install the Continentalclusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps:

  • Prepare the security files.

  • Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster in the recovery pair by creating a monitor package on that cluster.

  • Edit the Continentalclusters configuration file on a node of your choice in any cluster.

  • Check and apply the Continentalclusters configuration.

  • Start each Continentalclusters monitor package on its cluster.

  • Validate the configuration.

  • Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.

  • Test recovery procedures.

Preparing Security Files

Running a Continentalclusters command requires root access to cluster information on all the nodes of the participating Serviceguard clusters in the configuration. Before creating the Continentalclusters configuration, edit the /etc/cmcluster/cmclnodelist file on each node of all the participating clusters to include entries that allow access by all nodes in the continental cluster. Here are sample entries in the /etc/cmcluster/cmclnodelist file for a continental cluster configured with two two-node Serviceguard clusters:

lanode1.myco.com    root
lanode2.myco.com    root
nynode1.myco.com    root
nynode2.myco.com    root
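
Because every node in every participating cluster needs the same entries, the file can be generated instead of typed by hand. The following Python sketch is a hypothetical helper (not part of Continentalclusters); it assumes the simple "hostname user" layout shown above and uses the illustrative cluster names LAcluster and NYcluster.

```python
"""Hypothetical helper: build /etc/cmcluster/cmclnodelist entries."""


def cmclnodelist(clusters, user="root"):
    """Return one 'hostname user' line per node of every cluster.

    clusters maps a cluster name to the list of its node host names.
    The same text is meant to be copied to every node in every cluster.
    """
    lines = []
    for nodes in clusters.values():
        for node in nodes:
            # Pad the host name so the user column lines up, keeping at
            # least one space even for long host names.
            lines.append(f"{node:<19} {user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    clusters = {
        "LAcluster": ["lanode1.myco.com", "lanode2.myco.com"],
        "NYcluster": ["nynode1.myco.com", "nynode2.myco.com"],
    }
    print(cmclnodelist(clusters), end="")
```

Writing the same generated text to every node keeps the files identical, which is what the requirement above amounts to.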

Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands are running.

Define the order of security checking by creating entries of the following types:

order deny,allow

If deny is listed first, the deny list is checked first to see whether the node is there, and then the allow list is checked.

deny from

This lists all the nodes that are denied access. Permissible entries are:

all

All hosts are denied access.

domain

Hosts whose names match, or end in, this string are denied access, for example, hp.com.

hostname

The named host (for example, kitcat.myco.com) is denied access.

IP address

Either a full IP address or a partial IP address of 1 to 3 bytes (for subnet restriction). Matching hosts are denied access.

network/netmask

This pair of addresses allows more precise restriction of hosts (for example, 10.163.121.23/255.255.0.0).

network/nnnCIDR

This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).

allow from

This lists all the nodes that are allowed access. Permissible entries are:

all

All hosts are allowed access.

domain

Hosts whose names match, or end in, this string are allowed access, for example, hp.com.

hostname

The named host (for example, kitcat.myco.com) is allowed access.

IP address

Either a full IP address or a partial IP address of 1 to 3 bytes (for subnet inclusion). Matching hosts are allowed access.

network/netmask

This pair of addresses allows more precise inclusion of hosts (for example, 10.163.121.23/255.255.0.0).

network/nnnCIDR

This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).
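
To make the entry types above concrete, here is a minimal Python sketch of how one entry might be matched against a connecting host. It is an illustration under assumptions, not the Continentalclusters implementation: the function name entry_matches is hypothetical, and the domain case is treated as a plain "ends in" string match, exactly as the text describes it.

```python
"""Sketch of how a single cmomhosts 'allow from' / 'deny from' entry
might be matched against a connecting host (assumed semantics)."""
import ipaddress


def entry_matches(entry, hostname, ip):
    """Return True if a cmomhosts entry matches the given host.

    Entry forms follow the list above: all, domain, hostname,
    full or partial IP address, network/netmask, network/nnnCIDR.
    """
    entry = entry.strip()
    if not entry:
        return False                      # an empty entry matches nothing
    if entry.lower() == "all":
        return True
    if "/" in entry:
        # network/netmask and network/nnn (CIDR) both parse here;
        # strict=False tolerates host bits set in the network part.
        network = ipaddress.ip_network(entry, strict=False)
        return ipaddress.ip_address(ip) in network
    parts = entry.split(".")
    if all(p.isdigit() for p in parts) and 1 <= len(parts) <= 4:
        # Full address, or 1 to 3 leading bytes used as a subnet prefix.
        return ip.split(".")[:len(parts)] == parts
    # Host or domain name: exact match, or a name that ends in the entry.
    name = hostname.lower()
    return name == entry.lower() or name.endswith(entry.lower())


if __name__ == "__main__":
    print(entry_matches("10.163.121.23/255.255.0.0", "h", "10.163.5.9"))
```

Using Python's ipaddress module lets the network/netmask and network/nnn forms share one code path; a real matcher may also resolve names through DNS, which this sketch does not attempt.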

The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

order allow,deny
allow from lanode1.myco.com
allow from lanode2.myco.com
allow from nynode1.myco.com
allow from nynode2.myco.com
allow from 10.177.242.12

If the file is installed on all nodes in the continental cluster, these entries allow Continentalclusters commands and monitors running on lanode1, lanode2, nynode1, nynode2, and the host at 10.177.242.12 to obtain information about the clusters in the configuration.
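
The order line decides what happens when a host appears in both lists, or in neither. The sketch below evaluates the typical file above under an assumption borrowed from the Apache httpd Order directive, which this file format resembles: the list named second wins on conflict, order deny,allow admits unlisted hosts, and order allow,deny rejects them. Check your Continentalclusters documentation before relying on that behavior. Matching is simplified to all, exact host names, and full IP addresses, and the function names are hypothetical.

```python
"""Sketch: evaluate a cmomhosts file under assumed Apache-style
'order' semantics (the list named second wins on conflict)."""

SAMPLE = """\
order allow,deny
allow from lanode1.myco.com
allow from lanode2.myco.com
allow from nynode1.myco.com
allow from nynode2.myco.com
allow from 10.177.242.12
"""


def parse_cmomhosts(text):
    """Return (order, allow_entries, deny_entries) parsed from the file text."""
    order, allow, deny = None, [], []
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        key = words[0].lower()
        if key == "order":
            order = tuple(w.strip().lower() for w in "".join(words[1:]).split(","))
        elif key in ("allow", "deny") and len(words) >= 3 and words[1].lower() == "from":
            (allow if key == "allow" else deny).extend(words[2:])
    if order not in (("deny", "allow"), ("allow", "deny")):
        raise ValueError(f"missing or unrecognized order line: {order}")
    return order, allow, deny


def matches(entry, hostname, ip):
    # Simplified: only 'all', exact host names, and full IP addresses.
    return entry.lower() in ("all", hostname.lower()) or entry == ip


def is_allowed(order, allow, deny, hostname, ip):
    allowed = any(matches(e, hostname, ip) for e in allow)
    denied = any(matches(e, hostname, ip) for e in deny)
    if order == ("deny", "allow"):
        # Deny checked first, allow second; unlisted hosts are admitted.
        return allowed or not denied
    # order allow,deny: allow checked first, deny second; unlisted hosts are rejected.
    return allowed and not denied


if __name__ == "__main__":
    rules = parse_cmomhosts(SAMPLE)
    print(is_allowed(*rules, "nynode2.myco.com", "10.0.0.5"))
```

Under this assumption, the typical file's order allow,deny line means any host not listed is refused, which is the safer default for a file that only contains allow entries.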

Network Security Configuration Requirements

In a Continentalclusters configuration, if the clusters are behind firewalls at their respective sites, you must set appropriate firewall rules to enable inter-cluster communication. The Continentalclusters monitoring daemon communicates with the Serviceguard Cluster Object Manager on remote clusters. You can determine the ports used by the Cluster Object Manager from the hacl-probe entry in the /etc/services file. In the firewall of every participating cluster, set rules that allow TCP and UDP traffic on the hacl-probe ports to and from the IP addresses of all nodes in the Continentalclusters configuration. For more information on firewalls and ports, see HP Serviceguard A.11.18 Release Notes available at http://www.docs.hp.com -> High Availability.
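
Before writing firewall rules, you need the actual hacl-probe port numbers. This Python sketch extracts them from /etc/services text. The sample lines and port numbers are illustrative only, so pass the function the contents of /etc/services from your own nodes; the function name hacl_probe_ports is hypothetical.

```python
"""Sketch: list the hacl-probe port/protocol pairs from /etc/services text,
so matching firewall rules can be written for every Continentalclusters node."""

# Illustrative excerpt only; read the real /etc/services on your nodes.
SAMPLE_SERVICES = """\
hacl-hb      5300/tcp
hacl-probe   5303/tcp
hacl-probe   5303/udp   # comment text is ignored
"""


def hacl_probe_ports(services_text, service="hacl-probe"):
    """Return sorted (port, protocol) pairs whose name or alias is `service`."""
    found = set()
    for raw in services_text.splitlines():
        fields = raw.split("#", 1)[0].split()   # drop comments, split columns
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        names = [fields[0]] + fields[2:]        # service name plus aliases
        if service in names:
            port, _, proto = fields[1].partition("/")
            found.add((int(port), proto.lower()))
    return sorted(found)


if __name__ == "__main__":
    for port, proto in hacl_probe_ports(SAMPLE_SERVICES):
        print(f"allow {proto}/{port} to and from every Continentalclusters node")
```

Each resulting pair corresponds to one rule per direction in each site's firewall, applied to the addresses of all nodes in the configuration.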

Creating the Monitor Package

The Continentalclusters monitoring software is configured as a Serviceguard package so that it remains highly available. If more than one primary cluster is configured to share the same common recovery cluster, such as a multiple recovery pair scenario, the monitor package running on the common recovery cluster performs the following:

  • monitors all of the primary clusters

  • sends notifications for events on all of the monitored clusters

The following steps should be carried out on the recovery cluster and can be repeated on the primary cluster if you want the primary cluster to monitor the recovery cluster:

  1. On the node where the configuration is located, create a directory for the monitor package.

    # mkdir /etc/cmcluster/ccmonpkg

  2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory.

    # cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg

    • ccmonpkg.config is the ASCII package configuration file template for the Continentalclusters monitoring application.

    • ccmonpkg.cntl is the control script file for the Continentalclusters monitoring application.

      NOTE: Editing the ccmonpkg.cntl file is not recommended. However, if you prefer, you can change the default SERVICE_RESTART value (“-r 3”) to a value that suits your environment.
  3. Edit the package configuration file (suggested name of /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:

    1. Add the names of all nodes in the cluster on which the monitor may run.

    2. Set AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) to YES so that the monitor package fails over between local nodes. (For all primary and recovery packages, AUTO_RUN is always set to NO.)

  4. Continentalclusters provides an optional maintenance mode feature for recovery groups. To enable this feature, configure the monitor package with a file system on a shared disk. For more information on configuring this maintenance mode feature, see “Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters”.

  5. Use the cmcheckconf command to validate the package.

    # cmcheckconf -P ccmonpkg.config

  6. Copy the package configuration file ccmonpkg.config and the control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.

  7. Use the cmapplyconf command to add the package to the Serviceguard configuration.

    # cmapplyconf -P ccmonpkg.config

The following sample package configuration file (comments have been left out) shows a typical package configuration for a Continentalclusters monitor package:

PACKAGE_NAME                    ccmonpkg
PACKAGE_TYPE                    FAILOVER
FAILOVER_POLICY                 CONFIGURED_NODE
FAILBACK_POLICY                 MANUAL
NODE_NAME                       LAnode1
NODE_NAME                       LAnode2
AUTO_RUN                        YES
LOCAL_LAN_FAILOVER_ALLOWED      YES
NODE_FAIL_FAST_ENABLED          NO
RUN_SCRIPT                      /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
RUN_SCRIPT_TIMEOUT              NO_TIMEOUT
HALT_SCRIPT                     /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
HALT_SCRIPT_TIMEOUT             NO_TIMEOUT
SERVICE_NAME                    ccmonpkg.srv
SERVICE_FAIL_FAST_ENABLED       NO
SERVICE_HALT_TIMEOUT            300

CAUTION: Do not run a monitor package until the steps for “Checking and Applying the Continentalclusters Configuration” are completed.
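
The configuration-file requirements in the steps above can be pre-checked with a short script before cmcheckconf runs. The following Python sketch is hypothetical and deliberately narrow: it parses keyword/value lines like those in the sample (ignoring # comments, as used in Serviceguard ASCII files) and checks only AUTO_RUN, NODE_NAME, RUN_SCRIPT, and HALT_SCRIPT. cmcheckconf remains the authoritative check.

```python
"""Sketch: sanity-check a monitor package ASCII configuration before
running cmcheckconf. A hypothetical pre-check, not a replacement for it."""

SAMPLE = """\
PACKAGE_NAME                    ccmonpkg
PACKAGE_TYPE                    FAILOVER
NODE_NAME                       LAnode1
NODE_NAME                       LAnode2
AUTO_RUN                        YES
RUN_SCRIPT                      /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
HALT_SCRIPT                     /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
SERVICE_NAME                    ccmonpkg.srv
"""


def parse_pkg_config(text):
    """Map each keyword to the list of values it was given (NODE_NAME repeats)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, *rest = line.split(None, 1)
        params.setdefault(key.upper(), []).append(rest[0].strip() if rest else "")
    return params


def check_monitor_pkg(params):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if not params.get("NODE_NAME"):
        problems.append("no NODE_NAME entries: list every node that may run the monitor")
    # PKG_SWITCHING_ENABLED is the pre-A.11.12 name for AUTO_RUN.
    auto = (params.get("AUTO_RUN") or params.get("PKG_SWITCHING_ENABLED") or [""])[-1]
    if auto.upper() != "YES":
        problems.append("AUTO_RUN must be YES so the monitor fails over between local nodes")
    for key in ("RUN_SCRIPT", "HALT_SCRIPT"):
        if not params.get(key):
            problems.append(f"{key} is not set")
    return problems


if __name__ == "__main__":
    print(check_monitor_pkg(parse_pkg_config(SAMPLE)) or "basic checks passed")
```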

Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters

To configure the recovery group maintenance feature, you need to configure a file system on a shared disk in all the clusters in the continental cluster. The shared disk must have at least 250 MB of disk space.

Specify the file system path using the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. Create this directory, and reserve it for Continentalclusters, on all nodes in the continental cluster. Configure the monitor package in the recovery clusters to mount the file system from the shared disk.
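
The state directory requirements above (an absolute path that exists on every node, backed by at least 250 MB of shared storage) can be checked per node. This Python sketch is a hypothetical helper: it reads "250MB" as 250 MiB, and it only checks the size on the node where the shared file system is expected to be mounted, because on other nodes the directory sits on a local file system.

```python
"""Sketch: check a candidate CONTINENTAL_CLUSTER_STATE_DIR on one node."""
import os
import shutil

MIN_BYTES = 250 * 1024 * 1024   # "250MB", read here as 250 MiB (assumption)


def check_state_dir(path, require_mount=False):
    """Return a list of problems with `path` as the state directory.

    require_mount=True is meant for the node currently running the monitor
    package, where the shared file system should be mounted on `path`.
    """
    if not os.path.isabs(path):
        return [f"{path!r} is not an absolute path"]
    if not os.path.isdir(path):
        return [f"{path!r} does not exist on this node; create it on every node"]
    problems = []
    if require_mount and not os.path.ismount(path):
        problems.append(f"{path!r} is not a mount point; is the monitor package running here?")
    elif require_mount and shutil.disk_usage(path).total < MIN_BYTES:
        problems.append(f"file system on {path!r} is smaller than 250 MB")
    return problems


if __name__ == "__main__":
    print(check_state_dir("/opt/cmconcl/statedir"))
```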

Configuring Shared Disk for the Maintenance Feature

Identify a shared disk connected to all nodes at the recovery cluster where the monitor package (ccmonpkg) will run.

Create a volume group with one logical volume on the shared disk by completing the following procedure:

  1. Create the physical volume:

    pvcreate -f /dev/c0t10d0

  2. Create the volume group directory under /dev:

    mkdir /dev/ccvg

  3. Create the group special file with major number 64 and an unused minor number (0x060000 in this example; each volume group requires a unique minor number):

    mknod /dev/ccvg/group c 64 0x060000

  4. Create the volume group:

    vgcreate /dev/ccvg /dev/c0t10d0

  5. Activate the volume group:

    vgchange -a y ccvg

  6. Create the logical volume:

    lvcreate -L 250M ccvg
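
On HP-UX, each volume group's group file uses major number 64 and a minor number of the form 0xNN0000, where NN must be unique among the volume groups on the node; for a shared volume group the steps here reuse the same value (0x060000) on every node. The Python sketch below picks the lowest unused NN from ls -l /dev/*/group output. The ls column layout in the regular expression is an assumption, and the real upper limit is bounded by the maxvgs kernel parameter, which the sketch does not read.

```python
"""Sketch: choose an unused minor number for a new LVM group file."""
import re

# Assumed `ls -l /dev/*/group` line layout on HP-UX:
# crw-r-----   1 root  sys  64 0x000000 Jan  1  2008 /dev/vg00/group
GROUP_LINE = re.compile(r"\s64\s+(0x[0-9a-fA-F]{6})\s")


def minors_from_ls(ls_output):
    """Extract the group-file minor numbers (major 64) from `ls -l` output."""
    return GROUP_LINE.findall(ls_output)


def next_group_minor(existing, limit=256):
    """Return the lowest free minor number of the form 0xNN0000."""
    used = {int(m, 16) >> 16 for m in existing}   # NN is the VG number
    for n in range(1, limit):                      # 0x00 is normally vg00
        if n not in used:
            return f"0x{n:02x}0000"
    raise RuntimeError("no free volume group number below the limit")


if __name__ == "__main__":
    sample = "crw-r-----   1 root  sys  64 0x000000 Jan  1  2008 /dev/vg00/group\n"
    print(next_group_minor(minors_from_ls(sample)))
```

Run the check on every node that will import the volume group, and choose a value that is free on all of them.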

Run the following command to create a file system on the volume:

mkfs -F vxfs /dev/ccvg/lvol1

Complete the following procedure to export the volume group configuration and import the volume group on all the nodes at the recovery cluster:

  1. On the node where you created the volume, deactivate the volume group and export the VG configuration in preview mode to a file:

    vgchange -a n ccvg

    vgexport -m /tmp/ccvg.map -p ccvg

  2. Copy the map file to each of the other nodes in the recovery cluster (node1 is an example node name):

    rcp /tmp/ccvg.map node1:/tmp

  3. On each node, create the volume group directory and the group special file:

    mkdir /dev/ccvg

    mknod /dev/ccvg/group c 64 0x060000

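  4. Import the volume group from the map file, specifying the volume group name and the physical volume (the device special file for the shared disk may be different on each node):

    vgimport -v -m /tmp/ccvg.map /dev/ccvg /dev/c0t10d0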

Configuring a Monitor Package for the Maintenance Feature

Configure the Continentalclusters monitor package using the template scripts available in the /opt/cmconcl/scripts/ directory:

  1. Create the /etc/cmcluster/ccmonpkg directory on all nodes in the recovery cluster.

  2. On any node in the recovery cluster, copy the package configuration and control script templates from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:

    cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg

  3. In the ccmonpkg.cntl monitor package control file, specify the volume group for the VG parameter in the VOLUME GROUPS section:

    VG[0]="ccvg"

  4. In the ccmonpkg.cntl monitor package control file, specify a file system path and the logical volume name under the FILE SYSTEM section. The file system path should be the value configured for the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. This path should be created and reserved on all nodes in the continental cluster.

    LV[0]=/dev/ccvg/lvol1
    FS[0]=/opt/cmconcl/statedir
    FS_MOUNT_OPT[0]="-o rw"
    FS_UMOUNT_OPT[0]=""
    FS_FSCK_OPT[0]=""
    FS_TYPE[0]="vxfs"

  5. Distribute the monitor package control file to all nodes in the recovery cluster.

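  6. Check and apply the monitor package configuration, for example with cmcheckconf -P ccmonpkg.config followed by cmapplyconf -P ccmonpkg.config.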

Editing the Continentalclusters Configuration File

First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (If preferred, choose a different name.) Example:

# cd /etc/cmcluster

# cmqueryconcl -C cmconcl.config

This file has three editable sections:

  • Cluster information

  • Recovery groups

  • Monitoring definitions

Customize each section according to your needs. The following are some guidelines for editing each section.

Editing Section 1—Cluster Information

Enter cluster-level information as follows in this section of the file:

  1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. You can choose any name, but it cannot easily be changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster”.

    Continentalclusters provides an optional maintenance feature for recovery groups. This feature is enabled by configuring an absolute path to a file system for the CONTINENTAL_CLUSTER_STATE_DIR parameter. If this feature is not required, this parameter can be omitted.

  2. Enter the name of the first cluster after the first CLUSTER_NAME keyword followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.

  3. Enter the domain name of the cluster’s nodes following the CLUSTER_DOMAIN keyword.

  4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword.

    The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds (the configuration file template allows 30 seconds to 5 minutes), but the optimal setting depends on your system’s performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If you observe this during testing, use a larger value.

    It is suggested that you use the name “ccmonpkg” for all Continentalclusters monitors. Create this package on each cluster containing a recovery package. If you do not want to monitor a cluster that does not contain a recovery package, delete or comment out the MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines for that cluster. For mutual recovery, create the monitor package on both the first and second clusters.

    NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, you might monitor such a cluster to check the status of the data replication technology being used.
  5. Repeat steps 2 through 4 for the other participating cluster or clusters.
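
MONITOR_INTERVAL values are written as minutes and/or seconds, and the template comments give a range of 30 seconds to 5 minutes. This Python sketch converts a value to seconds and applies that range. The accepted unit spellings (MINUTE, MINUTES, SECOND, SECONDS) are an assumption based on the template examples, and the function names are hypothetical.

```python
"""Sketch: convert a MONITOR_INTERVAL value to seconds and range-check it."""

MIN_SECONDS, MAX_SECONDS = 30, 5 * 60   # limits quoted in the template comments
UNITS = {"MINUTE": 60, "SECOND": 1}


def interval_seconds(spec):
    """Parse values such as '60 SECONDS' or '1 MINUTE 30 SECONDS'."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2:
        raise ValueError(f"expected <number> <unit> pairs, got {spec!r}")
    total = 0
    for number, unit in zip(tokens[0::2], tokens[1::2]):
        unit = unit.upper()
        if unit.endswith("S"):
            unit = unit[:-1]          # accept MINUTES/SECONDS as well
        if not number.isdigit() or unit not in UNITS:
            raise ValueError(f"bad interval component {number} {unit}")
        total += int(number) * UNITS[unit]
    return total


def check_interval(spec):
    """Return the interval in seconds, or raise if it is outside the range."""
    seconds = interval_seconds(spec)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"{spec!r} is {seconds}s; allowed range is 30s to 5 minutes")
    return seconds


if __name__ == "__main__":
    print(check_interval("1 MINUTE 30 SECONDS"))
```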

NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times.

A printout of Section 1 of the Continentalclusters ASCII configuration file follows.

###############################################################################
####
####               CONTINENTAL CLUSTER CONFIGURATION FILE
####
####
####     This file contains Continentalclusters configuration data.
####     The file is divided into three sections, as follows:
####
####       1. Cluster Information
####       2. Recovery Groups
####       3. Events, Alerts, Alarms, and Notifications
####
####     For complete details about how to set the parameters in
####     this file, consult the cmqueryconcl(1m) manpage or your manual.
####
###############################################################################
####     Section 1.  Cluster Information
####
####     This section contains the name of the continental cluster, the name
####     of the state directory, followed by the names of member clusters and
####     all their nodes. The continental cluster name can be any string
####     you choose, up to 40 characters in length. The continentalclusters
####     state directory must be a string containing the directory location.
####     The state directory must always be an absolute path. The state
####     directory should be created on a shared disk in the recovery
####     cluster. This parameter is optional if the maintenance mode feature
####     for recovery groups is not required. This parameter is mandatory
####     if the maintenance mode feature for recovery groups is required.
####     Each member cluster name must be the same as it appears in the
####     MC/ServiceGuard cluster configuration ASCII file for that cluster.
####     In addition to the cluster name, include a domain name for the
####     nodes in the cluster. Node names must be the same as those that
####     appear in the cluster configuration ASCII file. A minimum of two
####     member clusters must be specified. You may configure one
####     cluster to serve as recovery cluster for one or more other
####     clusters.
####
####     In the space below, enter the continental cluster name,
####     then enter a cluster name for each member cluster, followed
####     by the names of all the nodes in that cluster. Following
####     the node names, enter the name of a monitor package
####     that will run the continental cluster monitoring software
####     on that cluster. It is strongly recommended that you use the
####     same name for the monitoring package on all clusters;
####     "ccmonpkg" is suggested. Monitoring of the recovery cluster
####     by the primary cluster is optional. If you do not wish to
####     monitor the recovery cluster, you must delete or comment out the
####     MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines that follow the
####     name of the primary cluster.
####
####     After the monitor package name, enter a monitor interval,
####     specifying a number of minutes and/or seconds. The default is 60
####     seconds, the minimum is 30 seconds, and the maximum is 5 minutes.
####
####     Example:
####
####     CONTINENTAL_CLUSTER_NAME       ccluster1
####     CONTINENTAL_CLUSTER_STATE_DIR  /opt/cmconcl/statedir
####
####     CLUSTER_NAME                   westcoast
####            CLUSTER_DOMAIN          westnet.myco.com
####            NODE_NAME               system1
####            NODE_NAME               system2
####            MONITOR_PACKAGE_NAME    ccmonpkg
####            MONITOR_INTERVAL        1 MINUTE 30 SECONDS
####
####     CLUSTER_NAME                   eastcoast
####            CLUSTER_DOMAIN          eastnet.myco.com
####            NODE_NAME               system3
####            NODE_NAME               system4
####            MONITOR_PACKAGE_NAME    ccmonpkg
####            MONITOR_INTERVAL        1 MINUTE 30 SECONDS
####

CONTINENTAL_CLUSTER_NAME        ccluster1
CONTINENTAL_CLUSTER_STATE_DIR

CLUSTER_NAME
        CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS

CLUSTER_NAME
        CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS
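
When several continental clusters share a layout, Section 1 can be generated from data. This Python sketch is hypothetical: it emits the keywords shown in the template (CONTINENTAL_CLUSTER_NAME, CONTINENTAL_CLUSTER_STATE_DIR, CLUSTER_NAME, CLUSTER_DOMAIN, NODE_NAME, MONITOR_PACKAGE_NAME, MONITOR_INTERVAL) and enforces three rules quoted in the template comments: a name of at most 40 characters, at least two member clusters, and an absolute state directory path. The class and function names are illustrative only.

```python
"""Sketch: render Section 1 (cluster information) from Python data."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    domain: str
    nodes: List[str]
    monitor_package: Optional[str] = "ccmonpkg"   # None: no monitor on this cluster
    monitor_interval: str = "60 SECONDS"


def render_section1(cc_name, clusters, state_dir=None):
    """Return Section 1 text for the given continental cluster."""
    if len(cc_name) > 40:
        raise ValueError("continental cluster name is limited to 40 characters")
    if len(clusters) < 2:
        raise ValueError("at least two member clusters are required")
    if state_dir is not None and not state_dir.startswith("/"):
        raise ValueError("CONTINENTAL_CLUSTER_STATE_DIR must be an absolute path")
    out = [f"CONTINENTAL_CLUSTER_NAME        {cc_name}"]
    if state_dir:
        out.append(f"CONTINENTAL_CLUSTER_STATE_DIR   {state_dir}")
    for c in clusters:
        out += ["", f"CLUSTER_NAME        {c.name}",
                f"        CLUSTER_DOMAIN          {c.domain}"]
        out += [f"        NODE_NAME               {n}" for n in c.nodes]
        if c.monitor_package:
            out.append(f"        MONITOR_PACKAGE_NAME    {c.monitor_package}")
            out.append(f"        MONITOR_INTERVAL        {c.monitor_interval}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(render_section1("ccluster1", [
        Cluster("westcoast", "westnet.myco.com", ["system1", "system2"]),
        Cluster("eastcoast", "eastnet.myco.com", ["system3", "system4"]),
    ], "/opt/cmconcl/statedir"), end="")
```

Setting monitor_package to None omits both monitor lines for that cluster, matching the instruction to delete or comment them out when a cluster is not monitored.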

Editing Section 2—Recovery Groups

In this section of the file, define recovery groups, which are sets of Serviceguard packages that are ready to recover applications in case of cluster failure. Create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster.

Examples of recovery groups are shown graphically in Figure 2-7 “Sample Continentalclusters Recovery Groups” and Figure 2-8 “Sample Bi-directional Recovery Groups”.

Figure 2-7 Sample Continentalclusters Recovery Groups

Figure 2-8 Sample Bi-directional Recovery Groups

Enter data in Section 2 as follows:

  1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.

  2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    PRIMARY_PACKAGE LAcluster/custpkg

  3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.

  4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    RECOVERY_PACKAGE NYcluster/custpkg_bak

  5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if using a logical data replication method that requires a data receiver package.

  6. Optionally, enter a rehearsal package definition consisting of the cluster name, a slash (/), and the rehearsal package name after the REHEARSAL_PACKAGE keyword. This is only required for performing a rehearsal operation at the recovery cluster.

  7. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.
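
The rules for recovery groups can be checked before the file is applied. This Python sketch is hypothetical: it checks the cluster/package form of each name, that the primary and recovery packages are on different clusters, that data receiver and rehearsal packages are on the recovery cluster, that no primary package appears in two groups, and the 29-group limit quoted in the Section 2 template comments. It then renders a group in the keyword order used by the template example. Malformed names raise ValueError rather than being reported as problems.

```python
"""Sketch: build and check RECOVERY_GROUP entries for Section 2."""
from dataclasses import dataclass
from typing import Optional

MAX_GROUPS = 29   # limit quoted in the Section 2 template comments


@dataclass
class RecoveryGroup:
    name: str
    primary: str                         # "cluster/package"
    recovery: str
    data_sender: Optional[str] = None
    data_receiver: Optional[str] = None
    rehearsal: Optional[str] = None


def cluster_of(qualified):
    """Return the cluster part of 'cluster/package', validating the form."""
    cluster, sep, package = qualified.partition("/")
    if not cluster or not sep or not package or "/" in package:
        raise ValueError(f"{qualified!r} is not of the form cluster/package")
    return cluster


def check_groups(groups):
    """Return a list of rule violations across all recovery groups."""
    problems = []
    if len(groups) > MAX_GROUPS:
        problems.append(f"{len(groups)} recovery groups; the limit is {MAX_GROUPS}")
    seen_primaries = set()
    for g in groups:
        if g.primary in seen_primaries:
            problems.append(f"{g.name}: {g.primary} already belongs to another group")
        seen_primaries.add(g.primary)
        recovery_cluster = cluster_of(g.recovery)
        if cluster_of(g.primary) == recovery_cluster:
            problems.append(f"{g.name}: primary and recovery are on the same cluster")
        for label, pkg in (("data receiver", g.data_receiver), ("rehearsal", g.rehearsal)):
            if pkg and cluster_of(pkg) != recovery_cluster:
                problems.append(f"{g.name}: {label} package must be on {recovery_cluster}")
        if g.data_sender:
            cluster_of(g.data_sender)    # format check only
    return problems


def render(group):
    """Return the group's keyword lines, skipping optional packages not set."""
    rows = [("RECOVERY_GROUP_NAME", group.name), ("PRIMARY_PACKAGE", group.primary),
            ("DATA_SENDER_PACKAGE", group.data_sender), ("RECOVERY_PACKAGE", group.recovery),
            ("DATA_RECEIVER_PACKAGE", group.data_receiver), ("REHEARSAL_PACKAGE", group.rehearsal)]
    return "\n".join(f"{k:<26} {v}" for k, v in rows if v) + "\n"


if __name__ == "__main__":
    g = RecoveryGroup("custgroup", "LAcluster/custpkg", "NYcluster/custpkg_bak")
    print(check_groups([g]) or render(g), end="")
```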

A printout of Section 2 of the Continentalclusters ASCII configuration file follows.

###############################################################################
####
####     Section 2.  Recovery Groups
####
####     This section defines recovery groups--sets of ServiceGuard
####     packages that are ready to recover applications in case of
####     cluster failure. Recovery groups allow one cluster in the
####     continental cluster configuration to back up another member
####     cluster's packages. You create a separate recovery group
####     for each ServiceGuard package that will be started on the
####     recovery cluster when the cmrecovercl(1m) command is issued.
####
####     A recovery group consists of a primary package running on
####     one cluster and a recovery package that is ready to run on a
####     different cluster. In some cases, a data receiver package runs
####     on the same cluster as the recovery package, and in some cases,
####     a data sender package runs on the same cluster as the primary
####     package. For rehearsal operations, a rehearsal package forms
####     part of the recovery group. The rehearsal package is always
####     configured in the recovery cluster.
####
####     During normal operation, the primary package is running an
####     application program on the primary cluster, and the recovery
####     package, which is configured to run the same application, is
####     idle on the recovery cluster. If the primary package performs
####     disk I/O, the data that is written to disk is replicated
####     and made available for possible use on the recovery cluster.
####     For some data replication techniques, this involves the use of
####     a data receiver package running on the recovery cluster.
####     In the event of a major failure on the primary cluster, the
####     user issues the cmrecovercl(1m) command to halt any data
####     receiver packages and start up all the recovery packages
####     that exist on the recovery cluster.
####
####     During rehearsal operation, before starting the rehearsal
####     packages, care should be taken that the replication between the
####     primary and the recovery sites is suspended. For some data
####     replication techniques which involve the use of a data receiver
####     package, rehearsal operations must be commenced only after
####     shutting down the data receiver package at the recovery
####     cluster. Rehearsal packages are started using the
####     cmrecovercl -r command.
####
####     Enter the name of each package recovery group together with
####     the fully qualified names of the primary and recovery
####     packages. If appropriate, enter the fully qualified name
####     of a data receiver package. Note that the data receiver
####     package must be on the same cluster as the recovery package.
####     The primary package name includes the primary cluster name
####     followed by a slash ("/") followed by the package name on
####     the primary cluster. The recovery package name includes
####     the recovery cluster name, followed by a slash ("/")
####     followed by the package name on the recovery cluster.
####     The data receiver package name includes the recovery cluster
####     name, followed by a slash ("/") followed by the name of
####     the data receiver package on the recovery cluster. The
####     rehearsal package name includes the recovery cluster name,
####     followed by a slash ("/") followed by the name of the
####     rehearsal package on the recovery cluster.
####
####     Up to 29 recovery groups can be entered.
####
####     Example:
####
####     RECOVERY_GROUP_NAME        nfsgroup
####     PRIMARY_PACKAGE            westcoast/nfspkg
####     DATA_SENDER_PACKAGE        westcoast/nfssenderpkg
####     RECOVERY_PACKAGE           eastcoast/nfsbackuppkg
####     DATA_RECEIVER_PACKAGE      eastcoast/nfsreplicapkg
####     REHEARSAL_PACKAGE          eastcoast/nfsrehearsalpkg
####
####     RECOVERY_GROUP_NAME        hpgroup
####     PRIMARY_PACKAGE            westcoast/hppkg
####     DATA_SENDER_PACKAGE        westcoast/hpsenderpkg
####     RECOVERY_PACKAGE           eastcoast/hpbackuppkg
####     DATA_RECEIVER_PACKAGE      eastcoast/nfsreplicapkg
####     REHEARSAL_PACKAGE          eastcoast/hprehearsalpkg

Editing Section 3—Monitoring Definitions

Finally, enter monitoring definitions that define cluster events and set times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events—Unreachable, Down, Up, and Error.

Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.

  1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash (“/”) and the name of the status—Unreachable, Down, Up, or Error. Example:

    CLUSTER_EVENT LAcluster/UNREACHABLE

  2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. Create as many alerts as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.

  3. If the event is for a cluster in an Unreachable or Down condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. Create as many alarms as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types).

  4. If using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.
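
The guidelines above translate into a few mechanical checks. This Python sketch (hypothetical names, not a Continentalclusters API) checks the event status, that alarms are used only for UNREACHABLE or DOWN events as the Section 3 template comments state, that each alarm time is later than the last alert, and the 170-character message limit. Elapsed times are modeled as seconds; the on-disk syntax of CLUSTER_ALERT and CLUSTER_ALARM times is not modeled, and the destination strings are illustrative.

```python
"""Sketch: check a Section 3 monitoring definition against the guidelines."""
from dataclasses import dataclass, field
from typing import List

STATUSES = {"UNREACHABLE", "DOWN", "UP", "ERROR"}
ALARM_STATUSES = {"UNREACHABLE", "DOWN"}   # per the template comments
MAX_MESSAGE = 170                          # characters per notification message


@dataclass
class Notice:
    after_seconds: int      # elapsed time since the event was observed
    destination: str        # e.g. "CONSOLE" or "EMAIL admin@myco.com" (hypothetical)
    message: str


@dataclass
class MonitoringDefinition:
    event: str                                   # "cluster/STATUS"
    alerts: List[Notice] = field(default_factory=list)
    alarms: List[Notice] = field(default_factory=list)


def check_definition(d):
    """Return a list of problems found in one monitoring definition."""
    problems = []
    cluster, sep, status = d.event.partition("/")
    status = status.upper()
    if not cluster or not sep or status not in STATUSES:
        problems.append(f"{d.event!r}: expected cluster/UNREACHABLE|DOWN|UP|ERROR")
    if d.alarms and status not in ALARM_STATUSES:
        problems.append(f"{d.event}: alarms are only allowed for UNREACHABLE or DOWN")
    last_alert = max((a.after_seconds for a in d.alerts), default=-1)
    for alarm in d.alarms:
        if alarm.after_seconds <= last_alert:
            problems.append(f"{d.event}: alarm at {alarm.after_seconds}s is not after "
                            f"the last alert at {last_alert}s")
    for n in d.alerts + d.alarms:
        if len(n.message) > MAX_MESSAGE:
            problems.append(f"{d.event}: message longer than {MAX_MESSAGE} characters")
    return problems


if __name__ == "__main__":
    d = MonitoringDefinition(
        "LAcluster/UNREACHABLE",
        alerts=[Notice(0, "CONSOLE", "LAcluster unreachable")],
        alarms=[Notice(600, "CONSOLE", "LAcluster still unreachable; consider cmrecovercl")],
    )
    print(check_definition(d) or "definition looks consistent")
```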

A printout of Section 3 of the Continentalclusters ASCII configuration file follows.

###############################################################################
####
####     Section 3.  Monitoring Definitions
####
####     This section of the file contains monitoring definitions.
####     Well planned monitoring definitions will help in making the
####     decision whether or not to issue the cmrecovercl(1m) command.
####     Each monitoring definition specifies a cluster event along with
####     the messages that should be sent to system administrators
####     or other IT staff. All messages are appended to the default log
####     /var/opt/resmon/log/cc/eventlog as well as to the destination you
####     specify below.
####
####     A cluster event takes place when a monitor that is located on
####     one cluster detects a significant change in the condition
####     of another cluster. The monitored cluster conditions are:
####
####  UNREACHABLE    - the cluster is unreachable. This will
####                   occur when the communication link to the
####                   cluster has gone down, as in a WAN failure,
####                   or when all nodes in the cluster have
####                   failed.
####
####         DOWN    - the cluster is down but nodes are responding.
####                   This will occur when the cluster is halted,
####                   but some or all of the member nodes are booted
####                   and communicating with the monitoring cluster.
####
####         UP      - the cluster is up.
####
####         ERROR   - there is a mismatch of cluster versions or
####                   a security error.
####
####     A change from one of these conditions to another one is a
####     cluster event. You can define alert or alarm states based on the
####     length of time since the cluster event was observed. Some events
####     are noteworthy at the time they occur, and some are noteworthy
####     when they persist over time. Setting the elapsed time to zero
####     results in a message being sent as soon as the event takes place.
####     Setting the elapsed time to 5 minutes results in a message
####     being sent when the condition has persisted for 5 minutes.
####
####     An alert is intended as informational only. Alerts may be sent
####     for any type of cluster condition. For an alert, a notification
####     is sent to a system administrator or other destination. Alerts
####     are not intended to indicate the need for recovery. The
####     cmrecovercl(1m) command is disabled.
####
####     An alarm is an indication that a condition exists that may
####     require recovery. For an alarm, a notification is sent, and
####     in addition, the cmrecovercl(1m) command is enabled for immediate
####     execution, allowing the administrator to carry out cluster
####     recovery. An alarm can only be defined for an UNREACHABLE or
####     DOWN condition in the monitored cluster.
####
####     A notification defines a message that is appended to the
####     log file /var/opt/resmon/log/cc/eventlog and sent to other
####     specified destinations, including email addresses, SNMP traps,
####     the system console, or the syslog file. The message string in
####     a notification can be no more than 170 characters. Enter
####     notifications in one of the following forms:
####
####         NOTIFICATION CONSOLE
####             <message>
####                    Message written to the console.
####
####         NOTIFICATION EMAIL   <address>
####             <message>
####                    Message emailed to a fully
####                    qualified email address.
####
####         NOTIFICATION OPC   <level>
####             <message>
####                    The <message> is sent to
####                    OpenView IT/Operations.
####                    The value of <level> may be 8 (normal),
####                    16 (warning), 64 (minor), 128 (major),
####                    32 (critical).
####
####         NOTIFICATION SNMP  <level>
####             <message>
####                    The <message> is sent as an SNMP trap.
####                    The value of <level> may be 1 (normal),
####                    2 (warning), 3 (minor), 4 (major),
####                    5 (critical).
####
####         NOTIFICATION SYSLOG
####             <message>
####                    A notice of the event is appended to the
####                    syslog file.
####
####         NOTIFICATION TCP   <nodename>:<portnumber>
####             <message>
####                    Message is sent to a TCP port on the
####                    specified node.
####	####                                                                       ####	####         NOTIFICATION TEXTLOG  <pathname>                              ####	####             <message>                                                 ####	####                    A notice of the event is written to a user-        ####	####                    specified log file.  <pathname> must be a full     ####	####                    path for the user-specified file.The user     	    ####
####			 specified file must be under /var/opt/resmon/log directory.				            ####	####                                                                       ####	####         NOTIFICATION UDP   <nodename>:<portnumber>                    ####	####             <message>                                                 ####	####                    Message is sent to a UDP port on the               ####	####                    specified node.                                    ####	####                                                                       ####	####     For the cluster event, enter a cluster name followed by           ####
#### a slash ("/") and a cluster condition (UP, DOWN, UNREACHABLE, ####
#### ERROR) that may be detected by a monitor program. ######## #### #### Each cluster event must be paired with a monitoring cluster. #### #### Include the name of the cluster on which the monitoring #### #### will take place. Events can be monitored from either the #### #### primary cluster or the recovery cluster. #### #### #### #### Alerts, alarms, and notifications have the following syntax. #### #### #### #### CLUSTER_ALERT <min> MINUTES <sec> SECONDS #### #### Delay before the software issues #### #### an alert notification about the #### #### cluster event. #### #### #### #### CLUSTER_ALARM <min> MINUTES <sec> SECONDS #### #### Delay before the software issues #### #### an alarm notification about the #### #### cluster event and enables the cmrecovercl(1m) #### #### command for immediate execution. #### #### #### #### NOTIFICATION <type> #### #### <message> #### #### A string value which is sent from the #### #### monitoring cluster for a given event #### #### to a specified destination. The <message>, #### #### which can be no more than 170 characters, #### #### is also appended to the #### #### /var/opt/resmon/log/cc/eventlog #### #### file on the monitoring node in the cluster #### #### where the event was detected. #### #### #### #### Example: #### #### #### #### CLUSTER_EVENT westcoast/UNREACHABLE #### #### MONITORING_CLUSTER eastcoast #### #### CLUSTER_ALERT 5 MINUTES #### #### NOTIFICATION EMAIL admin@primary.site #### #### "westcoast status unknown for 5 min. Call secondary site." ######## NOTIFICATION EMAIL admin@secondary.site #### #### "Call primary admin. (555) 555-6666." ######## ######## CLUSTER_ALERT 10 MINUTES #### #### NOTIFICATION EMAIL admin@primary.site #### #### "westcoast status unknown for 10 min.Call secondary site." ######## NOTIFICATION EMAIL admin@secondary.site #### #### "Call primary admin. (555) 555-6666." ######## NOTIFICATION CONSOLE #### #### "Cluster ALERT: westcoast not responding." 
######## #### #### CLUSTER_ALARM 15 MINUTES #### #### NOTIFICATION EMAIL admin@primary.site #### #### "westcoast status unknown for 15 min. Takeover advised." ######## NOTIFICATION EMAIL admin@secondary.site #### #### "westcoast still not responding. Use cmrecovercl command." #### #### NOTIFICATION CONSOLE #### #### "Cluster ALARM: Issue cmrecovercl command to take over "westcoast." #### #### #### #### CLUSTER_EVENT westcoast/UP #### #### MONITORING_CLUSTER eastcoast #### #### CLUSTER_ALERT 0 MINUTES #### #### NOTIFICATION EMAIL admin@secondary.site #### #### "Cluster westcoast is up." ######## #### #### CLUSTER_EVENT westcoast/DOWN #### #### MONITORING_CLUSTER eastcoast #### #### CLUSTER_ALERT 0 MINUTES #### #### NOTIFICATION EMAIL admin@secondary.site #### #### "Cluster westcoast is down." ######## #### #### CLUSTER_EVENT westcoast/ERROR #### #### MONITORING_CLUSTER eastcoast #### #### CLUSTER_ALERT 0 MINUTES #### #### NOTIFICATION EMAIL admin@secondary.site #### #### "Error in monitoring cluster westcoast." ######## ####CLUSTER_EVENT <cluster_name>/UNREACHABLE
 MONITORING_CLUSTER        CLUSTER_ALERT                NOTIFICATION                NOTIFICATION        CLUSTER_ALERT                NOTIFICATION                NOTIFICATION        CLUSTER_ALARM                NOTIFICATION                NOTIFICATIONCLUSTER_EVENT   <cluster_name>/DOWN        MONITORING_CLUSTER        CLUSTER_ALERT                NOTIFICATION                NOTIFICATION        CLUSTER_ALERT                NOTIFICATION                NOTIFICATION        CLUSTER_ALARM                NOTIFICATION                NOTIFICATIONCLUSTER_EVENT   <cluster_name>/UP        MONITORING_CLUSTER        CLUSTER_ALERT                NOTIFICATIONCLUSTER_EVENT   <cluster_name>/ERROR        MONITORING_CLUSTER        CLUSTER_ALERT                NOTIFICATION
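Two rules in the Section 3 template are easy to check before you run cmcheckconcl. A CLUSTER_ALARM is allowed only for UNREACHABLE or DOWN events, and a notification message can be no more than 170 characters. The following Python sketch checks both rules. It is a hypothetical pre-check and is not part of Continentalclusters. It assumes a simplified layout: one directive per line, and each message in double quotes on its own line. The function name check_monitoring_definitions is illustrative, and cmcheckconcl remains the authoritative check.

```python
MAX_MESSAGE_LEN = 170                       # stated limit for a notification message
ALARM_CONDITIONS = {"UNREACHABLE", "DOWN"}  # only conditions that may carry an alarm


def check_monitoring_definitions(text: str) -> list:
    """Return problems found in a simplified Section 3 block.

    Hypothetical pre-check: assumes one directive per line and each
    notification message as a double-quoted string on its own line.
    """
    problems = []
    condition = None  # condition of the CLUSTER_EVENT currently being read
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith('"'):
            # Message line: measure the text between the quotes.
            message = line[1:-1] if len(line) > 1 and line.endswith('"') else line[1:]
            if len(message) > MAX_MESSAGE_LEN:
                problems.append(
                    f"line {lineno}: message has {len(message)} characters "
                    f"(limit {MAX_MESSAGE_LEN})")
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "CLUSTER_EVENT":
            # Expected form: <cluster_name>/<condition>
            _, slash, condition = rest.strip().partition("/")
            if not slash:
                condition = None
                problems.append(f"line {lineno}: expected <cluster_name>/<condition>")
        elif keyword == "CLUSTER_ALARM" and condition not in ALARM_CONDITIONS:
            problems.append(f"line {lineno}: CLUSTER_ALARM not allowed for {condition} events")
    return problems


sample = """
CLUSTER_EVENT westcoast/UP
    MONITORING_CLUSTER eastcoast
    CLUSTER_ALARM 0 MINUTES
        NOTIFICATION CONSOLE
            "Cluster westcoast is up."
"""
print(check_monitoring_definitions(sample))
# ['line 4: CLUSTER_ALARM not allowed for UP events']
```

Reporting line numbers makes each problem easy to find in a long configuration file. The sketch deliberately ignores the directives it does not understand.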

The TEXTLOG notification file must be placed under the /var/opt/resmon/log directory. If you specify any other location for logging, the cmcheckconcl and cmapplyconcl commands report the following error message:

The target after textlog “ ” is not valid. 
Please specify a file under /var/opt/resmon/log directory 

If you upgraded Continentalclusters but are still using the old configuration file, the textlog location is still specified as /var/adm/cmconcl. As a result, the following error message appears:

The file path “s” specified for textlog is invalid. 

The destination file must be under /var/opt/resmon/log directory. Please change the path and restart the ccmon package.

IMPORTANT: For TEXTLOG notification, the destination log file must be in the /var/opt/resmon/log directory. If the destination file is not available in this directory, Continentalclusters will not work properly.
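You can check the TEXTLOG location rule before running cmcheckconcl. This Python sketch encodes only the rule stated above: the path must be a full path to a file under /var/opt/resmon/log. Note the following limits and assumptions:

- textlog_path_ok is a hypothetical helper.
- It collapses ".." components, but it does not follow symbolic links or check that the file exists.
- The second sample path is a hypothetical file in the pre-upgrade /var/adm/cmconcl location.

cmcheckconcl and cmapplyconcl remain the authoritative checks.

```python
import posixpath

TEXTLOG_ROOT = "/var/opt/resmon/log"  # required parent directory for TEXTLOG files


def textlog_path_ok(path: str) -> bool:
    """Return True if path is a full path to a file below TEXTLOG_ROOT."""
    if not path.startswith("/"):
        return False  # <pathname> must be a full path
    normalized = posixpath.normpath(path)  # collapses "." and ".." components
    return normalized.startswith(TEXTLOG_ROOT + "/")


for candidate in ("/var/opt/resmon/log/CCTextlog", "/var/adm/cmconcl/textlog"):
    print(candidate, textlog_path_ok(candidate))  # True, then False
```

The check requires a trailing "/" after the root directory. This rejects the directory itself and also look-alike siblings such as /var/opt/resmon/logfoo.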

Selecting Notification Intervals

The monitor interval determines the amount of time between distinct attempts by the monitor to obtain the status of a cluster. The intervals associated with notifications need to be chosen to work in combination with the monitor interval to give a realistic picture of cluster events.

Some combinations are not useful. For example, notification intervals that are smaller than the monitor interval do not make sense, and should be avoided. In the following example, the cluster event will always result in two alerts followed by an alarm. No change of state could possibly be detected at the one-minute, two-minute and three-minute intervals, because the monitor does not check for changes until the monitor interval (5 minutes) has been reached.

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 5 MINUTES
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 1 MINUTE
NOTIFICATION CONSOLE
"1 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 2 MINUTES
NOTIFICATION CONSOLE
"2 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 3 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 3 Minutes: Recovery Enabled"

The following sequence could provide meaningful notifications, since a change of state is possible between notification intervals:

MONITOR_PACKAGE_NAME ccmonpkg 
MONITOR_INTERVAL 1 MINUTE
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 3 MINUTES
NOTIFICATION CONSOLE
"3 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 5 MINUTES
NOTIFICATION CONSOLE
"5 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 10 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 10 Minutes: Recovery Enabled"
NOTE: The notification intervals should be multiples of the monitor interval.
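The Python sketch below illustrates both the timing argument and the multiples rule. It rests on a simplified model, which is an assumption and not a description of how the monitor daemon is implemented:

- The event is detected by a poll at time 0.
- The monitor then polls every MONITOR_INTERVAL.
- A notification delay can therefore be acted on only at the first poll at or after that delay.

It also assumes a single MONITOR_INTERVAL, whereas a real file defines one per monitoring cluster. The names to_seconds, first_poll, and review_intervals are hypothetical.

```python
import re

# Seconds per unit keyword; singular forms appear in the examples above.
_UNIT_SECONDS = {"MINUTE": 60, "MINUTES": 60, "SECOND": 1, "SECONDS": 1}
_DURATION = re.compile(r"(\d+)\s+(MINUTES?|SECONDS?)\b")


def to_seconds(spec: str) -> int:
    """Convert '5 MINUTES', '1 MINUTE' or '1 MINUTES 30 SECONDS' to seconds."""
    parts = _DURATION.findall(spec.upper())
    if not parts:
        raise ValueError(f"no duration found in {spec!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in parts)


def first_poll(delay: int, interval: int) -> int:
    """Earliest poll time (seconds) at which a delay can be acted on,
    assuming the event was detected by a poll at time 0."""
    return -(-delay // interval) * interval  # integer ceiling to a multiple


def review_intervals(config: str) -> list:
    """Return (directive, first_poll_seconds, is_multiple) for each
    CLUSTER_ALERT/CLUSTER_ALARM, using the single MONITOR_INTERVAL found."""
    interval, delays = None, []
    for line in config.splitlines():
        words = line.split(maxsplit=1)
        if len(words) != 2:
            continue  # blank lines and "..." placeholders
        keyword, rest = words[0], words[1].strip()
        if keyword == "MONITOR_INTERVAL":
            interval = to_seconds(rest)
        elif keyword in ("CLUSTER_ALERT", "CLUSTER_ALARM"):
            delays.append(f"{keyword} {rest}")
    if interval is None:
        raise ValueError("MONITOR_INTERVAL not found")
    report = []
    for directive in delays:
        delay = to_seconds(directive)
        report.append((directive, first_poll(delay, interval), delay % interval == 0))
    return report


example = """\
MONITOR_INTERVAL 5 MINUTES
CLUSTER_ALERT 1 MINUTE
CLUSTER_ALERT 2 MINUTES
CLUSTER_ALARM 3 MINUTES
"""
for directive, poll, aligned in review_intervals(example):
    print(f"{directive}: first acted on at {poll}s, multiple of interval: {aligned}")
```

Consider the first example above under this model. All three delays land on the same 300-second poll and none is a multiple of the interval, so no change of state can be observed between them. With the second example (MONITOR_INTERVAL 1 MINUTE), each delay lands on its own poll.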

The following is a sample Continentalclusters configuration file with two recovery pairs. Both cluster1 and cluster2 are configured to have cluster3 as their recovery cluster, for packages pkg1 and pkg2 respectively. cluster3 is configured to have cluster1 as its recovery cluster for pkg3.

#Section 1.  Cluster Information
CONTINENTAL_CLUSTER_NAME sampleCluster
CONTINENTAL_CLUSTER_STATE_DIR /opt/cmconcl/statedir

CLUSTER_NAME cluster1
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node11
NODE_NAME node12
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS

CLUSTER_NAME cluster2
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node21
NODE_NAME node22

CLUSTER_NAME cluster3
CLUSTER_DOMAIN cup.hp.com
NODE_NAME node31
NODE_NAME node32
MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 60 SECONDS

RECOVERY_GROUP_NAME ccRG1
PRIMARY_PACKAGE cluster1/pkg1
RECOVERY_PACKAGE cluster3/pkg1’
REHEARSAL_PACKAGE cluster3/pkg4’

RECOVERY_GROUP_NAME ccRG2
PRIMARY_PACKAGE cluster2/pkg2
RECOVERY_PACKAGE cluster3/pkg2’

RECOVERY_GROUP_NAME ccRG3
RECOVERY_PACKAGE cluster3/pkg3
DATA_RECEIVER_PACKAGE cluster1/pkg3’

# Section 3. Monitoring Definitions

CLUSTER_EVENT cluster1/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 DOWN alarm"

CLUSTER_EVENT cluster2/DOWN
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 DOWN alarm"

CLUSTER_EVENT cluster3/DOWN
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/logging
"DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 DOWN alert"
CLUSTER_ALARM 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 DOWN alarm"

CLUSTER_EVENT cluster1/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster1 UP alert"

CLUSTER_EVENT cluster2/UP
MONITORING_CLUSTER cluster3
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster2 UP alert"

CLUSTER_EVENT cluster3/UP
MONITORING_CLUSTER cluster1
CLUSTER_ALERT 0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog
"DRT: (Ora-test) UP alert"
NOTIFICATION SYSLOG
"DRT: (Ora-test) cluster3 UP alert"

Checking and Applying the Continentalclusters Configuration

After editing the configuration file on any of the participating clusters in the continental cluster, halt any monitor packages that are running. Then use the following steps to apply the configuration to all nodes in the continental cluster.

  1. Verify the content of the file.

    # cmcheckconcl -v -C cmconcl.config

    This command will verify that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.

  2. Distribute the Continentalclusters configuration information to all nodes in the continental cluster.

    # cmapplyconcl -v -C cmconcl.config

    Configuration data is copied to all nodes in all the participating clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in all clusters.

  3. After the configuration is applied, make a backup copy of the configuration ASCII file and save it on the other cluster.

NOTE: If any problems occur during the execution of cmapplyconcl, repeat the command as often as necessary. Issuing the command will delete the existing Continentalclusters configuration and apply the new one.
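The halt, check, and apply sequence can be scripted. The following Python sketch is hypothetical, and it makes these assumptions:

- The function names are illustrative.
- The monitor package uses the default name ccmonpkg.
- Serviceguard's cmhaltpkg command is used to halt it.
- Only the monitor package on the local cluster is halted, so monitor packages on other clusters must still be halted separately.

The command runner is a parameter, so you can exercise the sequence without a cluster.

```python
import subprocess
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run one command, inheriting the terminal, and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def apply_concl_config(config_file: str,
                       monitor_pkg: str = "ccmonpkg",
                       run: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Halt the local monitor package, verify the file, then apply it.

    Returns True only if cmapplyconcl exits with status 0. Using cmhaltpkg
    and the default package name are assumptions about your site.
    """
    # The monitor package must not be running while the configuration is applied.
    run(["cmhaltpkg", monitor_pkg])  # status ignored: it may already be halted
    if run(["cmcheckconcl", "-v", "-C", config_file]) != 0:
        return False  # never apply a file that failed verification
    return run(["cmapplyconcl", "-v", "-C", config_file]) == 0
```

On a configured node, apply_concl_config("cmconcl.config") runs the three commands in order. If it returns False, you can correct the file and call it again, since reissuing cmapplyconcl replaces the existing configuration. Backing up the file to the other cluster is left as a manual step.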

When configuration is finished, your systems should have sets of files similar to those shown in Figure 2-9 “Continentalclusters Configuration Files”.

Figure 2-9 Continentalclusters Configuration Files


Starting the Continentalclusters Monitor Package

Starting the monitoring package enables all Continentalclusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly.

If using physical data replication, make sure that it is operational.

On each monitoring cluster start the monitor package.

# cmmodpkg -e ccmonpkg

After the monitor package is started, a log file /var/adm/cmconcl/sentryd.log will be created on the node where the package is running to record the Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.
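A minimal Python sketch can handle the periodic archiving recommended above. It compresses a timestamped copy of sentryd.log into an archive directory. The sketch assumes only that the log is a regular file. This document reads the log with cmreadlog and does not say whether truncating it in place is safe, so the sketch leaves the original untouched. archive_log and the archive directory are hypothetical.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_log(log_path: str, archive_dir: str) -> Path:
    """Write a gzip-compressed, timestamped copy of log_path into archive_dir.

    The original log is left untouched; clean it up only in the way your
    Continentalclusters release supports.
    """
    src = Path(log_path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.gz"
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream the copy instead of reading it all
    return dest
```

For example, on the node running the monitor package you might call archive_log("/var/adm/cmconcl/sentryd.log", "/var/adm/cmconcl/archive"). The archive path here is an arbitrary choice.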

Validating the Configuration

The following table shows the status of Continentalclusters packages in a recovery pair when each cluster is running normally and no recovery has taken place.

Table 2-6 Status of Continentalclusters Packages Before Recovery

                                  Primary Cluster                           Recovery Cluster
Data Replication Method           Primary  Data Sender  Optional Monitor    Recovery  Data Receiver  Required Monitor
                                  Package  Package      Package             Package   Package        Package
--------------------------------  -------  -----------  ------------------  --------  -------------  ------------------
Physical: Symmetrix               Running  Not used     Running (optional)  Halted    Not used       Running (required)
Physical: XP Series               Running  Not used     Running (optional)  Halted    Not used       Running (required)
Physical: EVA Series              Running  Not used     Running (optional)  Halted    Not used       Running (required)
Logical: Oracle Standby Database  Running  Not used     Running (optional)  Halted    Running        Running (required)
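Table 2-6 can also be expressed as data and compared against what you observe with cmviewcl -v on each cluster. This Python sketch makes the following assumptions:

- The role names, the lower-case state strings, and state_mismatches are all hypothetical.
- You fill in observed yourself, for example from cmviewcl -v output. The sketch does not parse cmviewcl output.
- A primary-cluster monitor that is absent is accepted, because the table marks it as optional.

```python
# Expected package states from Table 2-6. None means "Not used".
# Role names are hypothetical labels for the table's six columns.
_PHYSICAL = {
    "primary": "running",           # Primary Package
    "data_sender": None,            # Data Sender Package
    "primary_monitor": "running",   # Optional Monitor Package (primary cluster)
    "recovery": "halted",           # Recovery Package
    "data_receiver": None,          # Data Receiver Package
    "recovery_monitor": "running",  # Required Monitor Package (recovery cluster)
}
EXPECTED = {
    "symmetrix": _PHYSICAL,
    "xp": _PHYSICAL,
    "eva": _PHYSICAL,
    "oracle_standby": {**_PHYSICAL, "data_receiver": "running"},
}
OPTIONAL_ROLES = {"primary_monitor"}


def state_mismatches(method: str, observed: dict) -> list:
    """Compare observed package states with Table 2-6 for one recovery pair."""
    problems = []
    for role, want in EXPECTED[method].items():
        if want is None:
            continue  # package is not used with this replication method
        got = observed.get(role)
        if got is None and role in OPTIONAL_ROLES:
            continue  # optional monitor not configured on the primary cluster
        if got != want:
            problems.append(f"{role}: expected {want}, found {got}")
    return problems


print(state_mismatches("oracle_standby",
                       {"primary": "running", "recovery": "halted",
                        "recovery_monitor": "running"}))
# ['data_receiver: expected running, found None']
```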

 

Use the following steps to ensure the components are functioning correctly:

  1. Make sure all daemons are running.

    # ps -ef | grep cmcl

    Two important Continentalclusters daemons are cmclsentryd and cmclrmond.

  2. Check the cluster configuration on each cluster using the cmviewcl -v command.

    1. Ensure that each primary package is running correctly.

    2. Ensure that the data sender packages (if any are used for logical data replication) are running correctly.

    3. Ensure that the data receiver packages (if any are used for logical data replication) are running correctly.

    4. Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.

  3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.

  4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.

  5. Use the following command to verify the correct operation of the Continentalclusters daemon:

    # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log

  6. Make sure the Continentalclusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.

  7. Change each cluster’s state to verify that the monitor running on the monitoring cluster detects the change in status and sends a notification.

  8. View the status of the Continentalclusters primary and recovery clusters, including configured event data.

    # cmviewconcl -v
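Step 1 can be automated on a node with a small Python sketch that looks for the two daemons named above in ps -ef output. The names REQUIRED_DAEMONS, missing_daemons, and check_local_daemons are hypothetical. The sketch does not know which nodes are expected to run each daemon, so interpret the result in light of where the monitor package is running.

```python
import subprocess

# The two daemons named in step 1.
REQUIRED_DAEMONS = ("cmclsentryd", "cmclrmond")


def missing_daemons(ps_output: str) -> list:
    """Return the daemons from REQUIRED_DAEMONS that do not appear in ps_output."""
    return [name for name in REQUIRED_DAEMONS if name not in ps_output]


def check_local_daemons() -> list:
    """Run 'ps -ef' on this node and return the daemons that are not running."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return missing_daemons(result.stdout)
```

The parsing is kept separate from the ps call so that missing_daemons can be checked against captured output. On a cluster node, check_local_daemons() returns an empty list when both daemons are present.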

CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.

Chapters 3, 4 and 5 contain additional suggestions on testing the data replication and package configuration.

Documenting the Recovery Procedure

Once everything is configured and the Continentalclusters monitor is running, define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 2-10 “Recovery Checklist” is an example of how to document the recovery procedure.

Figure 2-10 Recovery Checklist


Reviewing the Recovery Procedure

Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.

© Hewlett-Packard Development Company, L.P.