If necessary, use the swinstall command to install the ContinentalClusters product on all nodes in both clusters. Then create the ContinentalClusters configuration using the following steps (each step is described in detail in the sections that follow):

1. Prepare the security files.
2. Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster by creating a monitor package on that cluster.
3. Edit the ContinentalClusters configuration file on a node of your choice in either cluster.
4. Check and apply the ContinentalClusters configuration.
5. Start each ContinentalClusters monitor package on its cluster.
6. Validate the configuration.
7. Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.
8. Test recovery procedures.

Preparing Security Files

Configuring a continental cluster requires root access to all the nodes in both clusters in the configuration. Before doing the ContinentalClusters configuration, edit the /.rhosts file on all the nodes of both clusters to include entries that will allow access by the node on which you will run the cmapplyconcl command. Here is a sample entry in the /.rhosts file that allows the root user on system3 to run the cmapplyconcl command on the node where the /.rhosts file contains the entry:

    system3.westcoast.myco.com root

After the cmapplyconcl command has been run successfully, you can remove this entry from the /.rhosts file if you wish. Remember, however, that the entry must be present in the /.rhosts file when you use cmapplyconcl at a later time.

NOTE: The cmclnodelist file does not provide the required type of access for the cmapplyconcl command.

You must also create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages to obtain information from other nodes about the health of each cluster. The file must contain entries that allow access to all nodes in the continental cluster by the nodes where monitors are running. You define the order of security checking by creating entries of the following types:

- order deny,allow
  If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked.
- deny from
  Lists all the nodes that are denied access. Permissible entries are:
  - all
    All hosts are denied access.
  - domain
    Hosts whose names match, or end in, this string are denied access (for example, hp.com).
  - hostname
    The named host (for example, kitcat.myco.com) is denied access.
  - IP address
    Either a full IP address or a partial IP address of 1 to 3 bytes (for subnet restriction) may be given.
  - network/netmask
    This pair of addresses allows more precise restriction of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnnCIDR
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. "CIDR" stands for classless interdomain routing, a type of routing supported by the Border Gateway Protocol (BGP).
- allow from
  Lists all the nodes that are allowed access. Permissible entries are:
  - all
    All hosts are allowed access.
  - domain
    Hosts whose names match, or end in, this string are allowed access (for example, hp.com).
  - hostname
    The named host (for example, kitcat.myco.com) is allowed access.
  - IP address
    Either a full IP address or a partial IP address of 1 to 3 bytes (for subnet inclusion) may be given.
  - network/netmask
    This pair of addresses allows more precise inclusion of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnnCIDR
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. "CIDR" stands for classless interdomain routing, a type of routing supported by the Border Gateway Protocol (BGP).

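The relationship between the network/netmask and network/nnnCIDR forms can be illustrated with a short Python sketch built on the standard ipaddress module. It only demonstrates the arithmetic (nnn high-order 1 bits forming a netmask) and is not a description of how the cmomhosts file is actually parsed; the function names and the sample host 10.163.5.9 are hypothetical.

```python
import ipaddress


def cidr_to_netmask(prefix_len: int) -> str:
    """Return the dotted-decimal netmask made of prefix_len high-order 1 bits."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix_len}").netmask)


def same_network(host: str, network: str) -> bool:
    """True if host lies inside network (address/prefix or address/netmask form)."""
    # strict=False allows the network part to carry host bits,
    # as in 10.163.121.23/16.
    return ipaddress.ip_address(host) in ipaddress.ip_network(network, strict=False)


print(cidr_to_netmask(16))                                      # 255.255.0.0
print(same_network("10.163.5.9", "10.163.121.23/16"))           # True
print(same_network("10.163.5.9", "10.163.121.23/255.255.0.0"))  # True
```

Both forms describe the same set of hosts here: a prefix length of 16 corresponds to the netmask 255.255.0.0.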
The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

    order allow,deny
    allow from lanode1.myco.com
    allow from lanode2.myco.com
    allow from nynode1.myco.com
    allow from nynode2.myco.com
    allow from 10.177.242.12

If the file is installed on all nodes in the continental cluster, these entries will allow the monitors running on lanode1, lanode2, nynode1, and nynode2 to obtain information about the health of all nodes in the configuration.

Creating the Monitor Package

The ContinentalClusters monitoring software is configured as an MC/ServiceGuard package so that it remains highly available. The following steps should be carried out on the recovery cluster and repeated on the primary cluster if you want to monitor the recovery site from the primary site:

1. On the node where you are doing the configuration, create a directory for the monitor package:

       # mkdir /etc/cmcluster/ccmonpkg

2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:

       # cp /opt/cmconcl/scripts/* /etc/cmcluster/ccmonpkg

   ccmonpkg.config is the ASCII package configuration file template for the ContinentalClusters monitoring application.

3. Edit the package configuration file (suggested name of /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:
   - Add the names of all nodes in the cluster on which the monitor may run.
   - AUTO_RUN (PKG_SWITCHING_ENABLED used prior to ServiceGuard 11.12) should be set to YES so that the monitor package will fail over between local nodes. (Note, however, that for all primary and recovery packages, AUTO_RUN is always set to NO.)

4. Use the cmcheckconf command to validate the package:

       # cmcheckconf -P ccmonpkg.config

5. Use the cmapplyconf command to add the package to the MC/ServiceGuard configuration:

       # cmapplyconf -P ccmonpkg.config

6. Copy the package control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure this file is executable.

The following sample package configuration file (comments have been left out) shows a typical package configuration for a ContinentalClusters monitor package:

    PACKAGE_NAME                ccmonpkg
    FAILOVER_POLICY             CONFIGURED_NODE
    FAILBACK_POLICY             MANUAL
    NODE_NAME                   LAnode1
    NODE_NAME                   LAnode2
    RUN_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    RUN_SCRIPT_TIMEOUT          NO_TIMEOUT
    HALT_SCRIPT                 /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    HALT_SCRIPT_TIMEOUT         NO_TIMEOUT
    SERVICE_NAME                ccmonpkg.srv
    SERVICE_FAIL_FAST_ENABLED   NO
    SERVICE_HALT_TIMEOUT        300
    AUTO_RUN                    YES
    NET_SWITCHING_ENABLED       YES
    NODE_FAIL_FAST_ENABLED      NO

Editing the ContinentalClusters Configuration File

First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (You can choose a different name if you wish.) Example:

    # cd /etc/cmcluster
    # cmqueryconcl -C cmconcl.config

This file has three editable sections. Customize each section according to your needs. The following are some guidelines for editing each section.

Editing Section 1: Cluster Information

Enter cluster-level information as follows in this section of the file:

1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. This can be any name you choose, but it cannot be changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster”.

2. Enter the name of the first cluster after the first CLUSTER_NAME keyword, followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.

3. Enter the domain name of the cluster's nodes following the CLUSTER_DOMAIN keyword.

4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword.

   The monitor interval defines how long it can take for ContinentalClusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system's performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If you observe this during testing, use a larger value.

   It is suggested that you use the name "ccmonpkg" for all ContinentalClusters monitors. Create this package on each cluster containing a recovery package. If you do not wish to monitor a cluster that does not contain a recovery package, you must delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters.

   NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, you might set up monitoring of such a cluster so you can check the status of the data replication technology being used.

5. Repeat steps 2 through 4 for the alternate cluster.

NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times.

A printout of Section 1 of the ContinentalClusters ASCII configuration file follows.

    ###############################################################################
    ####
    ####           CONTINENTAL CLUSTER CONFIGURATION FILE
    ####
    ####
    #### This file contains ContinentalClusters configuration data.
    #### The file is divided into three sections, as follows:
    ####
    ####     1. Cluster Information
    ####     2. Recovery Groups
    ####     3. Events, Alerts, Alarms, and Notifications
    ####
    #### For complete details about how to set the parameters in
    #### this file, consult the cmqueryconcl(1m) manpage or your manual.
    ####
    ###############################################################################
    ####
    #### Section 1. Cluster Information
    ####
    #### This section contains the name of the continental cluster
    #### followed by the names of member clusters and all their nodes.
    #### The continental cluster name can be any string you choose, up
    #### to 40 characters in length. Each member cluster name must be
    #### the same as it appears in the MC/ServiceGuard cluster
    #### configuration ASCII file for that cluster. In addition to the cluster
    #### name, include a domain name for the nodes in the cluster. Node
    #### names must be the same as those that appear in the cluster
    #### configuration ASCII file.
    ####
    #### In the space below, enter the continental cluster name,
    #### then enter a cluster name and domain for each member cluster,
    #### and the names of all the nodes in that cluster. Following
    #### the node names, enter the name of a monitor package
    #### that will run the continental cluster monitoring software
    #### on that cluster. It is strongly recommended that you use the
    #### same name for the monitoring package on all clusters;
    #### "ccmonpkg" is suggested. Monitoring of the recovery cluster
    #### by the primary cluster is optional. If you do not wish to
    #### monitor the recovery cluster, you must delete or comment out the
    #### MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines that follow the
    #### name of the primary cluster.
    ####
    #### After the monitor package name, enter a monitor interval,
    #### specifying a number of minutes and/or seconds. The default is 60
    #### seconds, the minimum is 30 seconds, and the maximum is 5 minutes.
    ####
    #### Example:
    ####
    ####   CONTINENTAL_CLUSTER_NAME ccluster1
    ####
    ####     CLUSTER_NAME westcoast
    ####     CLUSTER_DOMAIN westnet.myco.com
    ####       NODE_NAME system1
    ####       NODE_NAME system2
    ####       MONITOR_PACKAGE_NAME ccmonpkg
    ####       MONITOR_INTERVAL 1 MINUTE 30 SECONDS
    ####
    ####     CLUSTER_NAME eastcoast
    ####     CLUSTER_DOMAIN eastnet.myco.com
    ####       NODE_NAME system3
    ####       NODE_NAME system4
    ####       MONITOR_PACKAGE_NAME ccmonpkg
    ####       MONITOR_INTERVAL 1 MINUTE 30 SECONDS
    ####

    CONTINENTAL_CLUSTER_NAME ccluster1

      CLUSTER_NAME
      CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME ccmonpkg
        MONITOR_INTERVAL 60 SECONDS

      CLUSTER_NAME
      CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME ccmonpkg
        MONITOR_INTERVAL 60 SECONDS

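The limits quoted in the template comments (default 60 seconds, minimum 30 seconds, maximum 5 minutes) can be checked before running cmcheckconcl. The following Python sketch is a hypothetical helper, not part of the ContinentalClusters product: it converts a MONITOR_INTERVAL value such as "1 MINUTE 30 SECONDS" to seconds and tests it against those limits. The accepted unit spellings are an assumption based on the examples in the file.

```python
import re

MIN_SECONDS = 30    # minimum stated in the template comments
MAX_SECONDS = 300   # maximum stated in the template comments (5 minutes)

# Unit spellings are assumed from the examples in the template file.
_UNIT_SECONDS = {"MINUTE": 60, "MINUTES": 60, "SECOND": 1, "SECONDS": 1}
_PAIR = re.compile(r"(\d+)\s+([A-Za-z]+)")


def interval_to_seconds(value: str) -> int:
    """Convert text such as '1 MINUTE 30 SECONDS' to a number of seconds."""
    pairs = _PAIR.findall(value)
    leftover = _PAIR.sub("", value).strip()
    if not pairs or leftover:
        raise ValueError(f"unrecognized interval: {value!r}")
    total = 0
    for number, unit in pairs:
        if unit.upper() not in _UNIT_SECONDS:
            raise ValueError(f"unknown unit {unit!r} in {value!r}")
        total += int(number) * _UNIT_SECONDS[unit.upper()]
    return total


def interval_in_range(value: str) -> bool:
    """True if the interval lies within the documented 30 s to 5 min range."""
    return MIN_SECONDS <= interval_to_seconds(value) <= MAX_SECONDS


print(interval_to_seconds("1 MINUTE 30 SECONDS"))  # 90
print(interval_in_range("60 SECONDS"))             # True
print(interval_in_range("10 MINUTES"))             # False
```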
Editing Section 2: Recovery Groups

In this section of the file, you define recovery groups, which are sets of ServiceGuard packages that are ready to recover applications in case of cluster failure. You create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster. Examples of recovery groups are shown graphically in Figure 5-6 “Sample ContinentalClusters Recovery Groups” and Figure 5-7 “Sample Bi-directional Recovery Groups”.

Enter data in Section 2 as follows:

1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.

2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

       PRIMARY_PACKAGE LAcluster/custpkg

3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.

4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

       RECOVERY_PACKAGE NYcluster/custpkg_bak

5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data receiver package.

6. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.

A printout of Section 2 of the ContinentalClusters ASCII configuration file follows.

    ###############################################################################
    ####
    #### Section 2. Recovery Groups
    ####
    #### This section defines recovery groups--sets of ServiceGuard
    #### packages that are ready to recover applications in case of
    #### cluster failure. Recovery groups allow one cluster in the
    #### continental cluster configuration to back up another member
    #### cluster's packages. You create a separate recovery group
    #### for each ServiceGuard package that will be started on the
    #### recovery cluster when the cmrecovercl(1m) command is issued.
    ####
    #### A recovery group consists of a primary package running on
    #### one cluster, and a recovery package that is ready to run on a
    #### different cluster. In some cases, a data receiver package runs
    #### on the same cluster as the recovery package, and in some cases,
    #### a data sender package runs on the same cluster as the primary
    #### package.
    ####
    #### During normal operation, the primary package is running an
    #### application program on the primary cluster, and the recovery
    #### package, which is configured to run the same application, is
    #### idle on the recovery cluster. If the primary package performs
    #### disk I/O, the data that is written to disk is replicated
    #### and made available for possible use on the recovery cluster.
    #### For some data replication techniques, this involves the use of
    #### a data receiver package running on the recovery cluster.
    #### In the event of a major failure on the primary cluster, the
    #### user issues the cmrecovercl(1m) command to halt any data
    #### receiver packages and start up all the recovery packages
    #### that exist on the recovery cluster.
    ####
    #### Enter the name of each package recovery group together with
    #### the fully qualified names of the primary and recovery
    #### packages. If appropriate, enter the fully qualified name
    #### of a data receiver package. Note that the data receiver
    #### package must be on the same cluster as the recovery package.
    ####
    #### The primary package name includes the primary cluster name
    #### followed by a slash ("/") followed by the package name on
    #### the primary cluster. The recovery package name includes
    #### the recovery cluster name, followed by a slash ("/")
    #### followed by the package name on the recovery cluster.
    #### The data receiver package name includes the recovery cluster
    #### name, followed by a slash ("/") followed by the name of
    #### the data receiver package on the recovery cluster.
    ####
    #### Up to 29 recovery groups can be entered.
    ####
    #### Example:
    ####
    ####   RECOVERY_GROUP_NAME nfsgroup
    ####     PRIMARY_PACKAGE westcoast/nfspkg
    ####     DATA_SENDER_PACKAGE westcoast/nfssenderpkg
    ####     RECOVERY_PACKAGE eastcoast/nfsbackuppkg
    ####     DATA_RECEIVER_PACKAGE eastcoast/nfsreceiverpkg
    ####
    ####   RECOVERY_GROUP_NAME hpgroup
    ####     PRIMARY_PACKAGE westcoast/hppkg
    ####     DATA_SENDER_PACKAGE westcoast/hpsenderpkg
    ####     RECOVERY_PACKAGE eastcoast/hpbackuppkg
    ####     DATA_RECEIVER_PACKAGE eastcoast/hpreceiverpkg
    ####

    RECOVERY_GROUP_NAME
      PRIMARY_PACKAGE
    # DATA_SENDER_PACKAGE
      RECOVERY_PACKAGE
    # DATA_RECEIVER_PACKAGE

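The placement rules stated in the Section 2 comments (primary and recovery packages on different clusters, data sender with the primary package, data receiver with the recovery package) lend themselves to a quick pre-check. The Python sketch below is illustrative only; the RecoveryGroup class and the helper names are hypothetical and are not part of the product, which performs its own validation through cmcheckconcl.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RecoveryGroup:
    """One recovery group; package names use the 'cluster/package' form."""
    name: str
    primary: str
    recovery: str
    data_sender: Optional[str] = None
    data_receiver: Optional[str] = None


def split_name(qualified: str) -> Tuple[str, str]:
    """Split 'cluster/package' into its two parts, rejecting malformed names."""
    cluster, sep, package = qualified.partition("/")
    if not sep or not cluster or not package or "/" in package:
        raise ValueError(f"expected cluster/package, got {qualified!r}")
    return cluster, package


def check_group(group: RecoveryGroup) -> List[str]:
    """Return a list of rule violations (empty if the group looks consistent)."""
    problems = []
    primary_cluster, _ = split_name(group.primary)
    recovery_cluster, _ = split_name(group.recovery)
    if primary_cluster == recovery_cluster:
        problems.append("primary and recovery packages are on the same cluster")
    if group.data_sender and split_name(group.data_sender)[0] != primary_cluster:
        problems.append("data sender package is not on the primary cluster")
    if group.data_receiver and split_name(group.data_receiver)[0] != recovery_cluster:
        problems.append("data receiver package is not on the recovery cluster")
    return problems


# The nfsgroup example from the template comments passes all checks.
nfsgroup = RecoveryGroup(
    name="nfsgroup",
    primary="westcoast/nfspkg",
    recovery="eastcoast/nfsbackuppkg",
    data_sender="westcoast/nfssenderpkg",
    data_receiver="eastcoast/nfsreceiverpkg",
)
print(check_group(nfsgroup))  # []
```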
Editing Section 3: Monitoring Definitions

Finally, you enter monitoring definitions, which define cluster events and set times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events: Unreachable, Down, Up, and Error. Although it is impossible to make specific recommendations for every ContinentalClusters environment, here are a few general guidelines about notifications.

- Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash ("/"), and the name of the status (Unreachable, Down, Up, or Error). Example:

      CLUSTER_EVENT LAcluster/UNREACHABLE

- Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. You can create as many alerts as needed, and you can send as many notifications as you wish to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.

- If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. You can create as many alarms as needed, and you can send as many notifications as you wish to different destinations (see the comments in the file excerpt below for a list of destination types).

- If you are using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.

A printout of Section 3 of the ContinentalClusters ASCII configuration file follows.

    ###############################################################################
    ####
    #### Section 3. Monitoring Definitions
    ####
    #### This section of the file contains monitoring definitions.
    #### Well planned monitoring definitions will help in making the
    #### decision whether or not to issue the cmrecovercl(1m) command.
    #### Each monitoring definition specifies a cluster event along with
    #### the messages that should be sent to system administrators
    #### or other IT staff. All messages are appended to the default log
    #### /etc/cmcluster/cmconcl/eventlog as well as being sent to the
    #### destination you specify below.
    ####
    #### A cluster event takes place when a monitor that is located on
    #### one cluster detects a significant change in the condition
    #### of another cluster. The monitored cluster conditions are:
    ####
    ####   UNREACHABLE - the cluster is unreachable. This will
    ####                 occur when the communication link to the
    ####                 cluster has gone down, as in a WAN failure,
    ####                 or when the all nodes in the cluster have
    ####                 failed.
    ####
    ####   DOWN        - the cluster is down but nodes are responding.
    ####                 This will occur when the cluster is halted,
    ####                 but some or all of the member nodes are booted
    ####                 and communicating with the monitoring cluster.
    ####
    ####   UP          - the cluster is up.
    ####
    ####   ERROR       - there is a mismatch of cluster versions or
    ####                 a security error.
    ####
    #### A change from one of these conditions to another one is a
    #### cluster event. You can define alert or alarm states based on the
    #### length of time since the cluster event was observed. Some events
    #### are noteworthy at the time they occur, and some are noteworthy
    #### when they persist over time. Setting the elapsed time to zero
    #### results in a message being sent as soon as the event takes place.
    #### Setting the elaspsed time to 5 minutes results in a message
    #### being sent when the condition has persisted for 5 minutes.
    ####
    #### An alert is intended as informational only. Alerts may be sent
    #### for any type of cluster condition. For an alert, a notification
    #### is sent to a system administrator or other destination. Alerts
    #### are not intended to indicate the need for recovery. The
    #### cmrecovercl(1m) command is disabled.
    ####
    #### An alarm is an indication that a condition exists that may
    #### require recovery. For an alarm, a notification is sent, and
    #### in addition, the cmrecovercl(1m) command is enabled for immediate
    #### execution, allowing the administrator to carry out cluster
    #### recovery. An alarm can only be defined for an UNREACHABLE or
    #### DOWN condition in the monitored cluster.
    ####
    #### A notification defines a message that is appended to the
    #### log file /etc/cmcluster/cmconcl/eventlog and sent to other
    #### specified destinations, including email addresses, SNMP traps,
    #### the system console, or the syslog file. The message string in
    #### a notification is entered in double quotes on a separate line;
    #### it can be no more than 170 characters long. Enter notifications
    #### in one of the following forms:
    ####
    ####   NOTIFICATION CONSOLE
    ####   <message>
    ####           Message written to the console.
    ####
    ####   NOTIFICATION EMAIL <address>
    ####   <message>
    ####           Message emailed to a fully
    ####           qualified email address.
    ####
    ####   NOTIFICATION OPC <level>
    ####   <message>
    ####           The message is sent to
    ####           OpenView IT/Operations).
    ####           The value of <level> may be 8 (normal),
    ####           16 (warning), 64 (minor), 128 (major),
    ####           32 (critical).
    ####
    ####   NOTIFICATION SNMP <level>
    ####   <message>
    ####           The message is sent as an SNMP trap.
    ####           The value of <level> may be 1 (normal),
    ####           2 (warning), 3 (minor), 4 (major),
    ####           5 (critical).
    ####
    ####   NOTIFICATION SYSLOG
    ####   <message>
    ####           A notice of the event is appended to the
    ####           syslog file.
    ####
    ####   NOTIFICATION TCP <nodename>:<portnumber>
    ####   <message>
    ####           Message is sent to a TCP port on the
    ####           specified node.
    ####
    ####   NOTIFICATION TEXTLOG <pathname>
    ####   <message>
    ####           A notice of the event is written to a user-
    ####           specified log file. <pathname> must be a full
    ####           path for the user-specified file.
    ####
    ####   NOTIFICATION UDP <nodename>:<portnumber>
    ####   <message>
    ####           Message is sent to a UDP port on the
    ####           specified node.
    ####
    #### For the cluster event, enter a cluster name followed by
    #### a slash ("/") and a cluster condition (UP, DOWN, UNREACHABLE,
    #### ERROR) that may be detected by a monitor program.
    ####
    #### Each cluster event must be paired with a monitoring cluster.
    #### Include the name of the cluster on which the monitoring
    #### will take place. Events can be monitored from either the
    #### primary cluster or the recovery cluster.
    ####
    #### Alerts, alarms, and notifications have the following syntax.
    ####
    ####   CLUSTER_ALERT <min> MINUTES <sec> SECONDS
    ####           Delay before the software issues
    ####           an alert notification about the
    ####           cluster event.
    ####
    ####   CLUSTER_ALARM <min> MINUTES <sec> SECONDS
    ####           Delay before the software issues
    ####           an alarm notification about the
    ####           cluster event and enables the cmrecovercl(1m)
    ####           command for immediate execution.
    ####
    ####   NOTIFICATION <type>
    ####   <message>
    ####           A string value which is sent from the
    ####           monitoring cluster for a given event
    ####           to a specified destination. The <message>,
    ####           which can be no more than 170 characters,
    ####           is also appended to the
    ####           /etc/cmcluster/cmconcl/eventlog
    ####           file on the monitoring node in the cluster
    ####           where the event was detected.
    ####
    #### Example:
    ####
    ####   CLUSTER_EVENT westcoast/UNREACHABLE
    ####     MONITORING_CLUSTER eastcoast
    ####     CLUSTER_ALERT 5 MINUTES
    ####       NOTIFICATION EMAIL admin@primary.site
    ####       "westcoast unreachable for 5 min. Call secondary site."
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "Call primary admin. (555) 555-6666."
    ####
    ####     CLUSTER_ALERT 10 MINUTES
    ####       NOTIFICATION EMAIL admin@primary.site
    ####       "westcoast unreachable for 10 min. Call secondary site."
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "Call primary admin. (555) 555-6666."
    ####       NOTIFICATION CONSOLE
    ####       "Cluster ALERT: westcoast not responding."
    ####
    ####     CLUSTER_ALARM 15 MINUTES
    ####       NOTIFICATION EMAIL admin@primary.site
    ####       "westcoast unreachable for 15 min. Takeover advised."
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "westcoast still not responding. Use cmrecovercl command."
    ####       NOTIFICATION CONSOLE
    ####       "Cluster ALARM: Issue cmrecovercl command to take over westcoast."
    ####
    ####   CLUSTER_EVENT westcoast/UP
    ####     MONITORING_CLUSTER eastcoast
    ####     CLUSTER_ALERT 0 MINUTES
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "Cluster westcoast is up."
    ####
    ####   CLUSTER_EVENT westcoast/DOWN
    ####     MONITORING_CLUSTER eastcoast
    ####     CLUSTER_ALERT 0 MINUTES
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "Cluster westcoast is down."
    ####
    ####   CLUSTER_EVENT westcoast/ERROR
    ####     MONITORING_CLUSTER eastcoast
    ####     CLUSTER_ALERT 0 MINUTES
    ####       NOTIFICATION EMAIL admin@secondary.site
    ####       "Error in monitoring cluster westcoast."
    ####

    CLUSTER_EVENT <cluster_name>/UNREACHABLE
      MONITORING_CLUSTER
      CLUSTER_ALERT
        NOTIFICATION
        NOTIFICATION
      CLUSTER_ALERT
        NOTIFICATION
        NOTIFICATION
      CLUSTER_ALARM
        NOTIFICATION
        NOTIFICATION

    CLUSTER_EVENT <cluster_name>/DOWN
      MONITORING_CLUSTER
      CLUSTER_ALERT
        NOTIFICATION
        NOTIFICATION
      CLUSTER_ALERT
        NOTIFICATION
        NOTIFICATION
      CLUSTER_ALARM
        NOTIFICATION
        NOTIFICATION

    CLUSTER_EVENT <cluster_name>/UP
      MONITORING_CLUSTER
      CLUSTER_ALERT
        NOTIFICATION

    CLUSTER_EVENT <cluster_name>/ERROR
      MONITORING_CLUSTER
      CLUSTER_ALERT
        NOTIFICATION

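Two constraints from the Section 3 comments are easy to overlook when writing monitoring definitions: an alarm can only be defined for an UNREACHABLE or DOWN condition, and a notification message can be no more than 170 characters. The Python sketch below checks both for a single event. It is a hypothetical helper, not a product tool, and it assumes the 170-character limit applies to the text inside the double quotes.

```python
from typing import List

CONDITIONS = {"UP", "DOWN", "UNREACHABLE", "ERROR"}
ALARM_CONDITIONS = {"UNREACHABLE", "DOWN"}   # per the Section 3 comments
MAX_MESSAGE_LENGTH = 170                     # assumed to exclude the quotes


def check_event(event: str, has_alarm: bool, messages: List[str]) -> List[str]:
    """Return rule violations for one CLUSTER_EVENT definition.

    event     -- text following CLUSTER_EVENT, e.g. 'westcoast/UNREACHABLE'
    has_alarm -- True if the event defines at least one CLUSTER_ALARM
    messages  -- notification message strings, without surrounding quotes
    """
    problems = []
    cluster, sep, condition = event.partition("/")
    if not sep or not cluster or condition.upper() not in CONDITIONS:
        return [f"malformed cluster event: {event!r}"]
    if has_alarm and condition.upper() not in ALARM_CONDITIONS:
        problems.append(f"CLUSTER_ALARM cannot be defined for {condition.upper()}")
    for message in messages:
        if len(message) > MAX_MESSAGE_LENGTH:
            problems.append(f"message exceeds {MAX_MESSAGE_LENGTH} characters")
    return problems


print(check_event("westcoast/UNREACHABLE", True,
                  ["westcoast unreachable for 15 min. Takeover advised."]))  # []
print(check_event("westcoast/UP", True, ["Cluster westcoast is up."]))
# ['CLUSTER_ALARM cannot be defined for UP']
```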
Selecting Notification Intervals

The monitor interval determines the amount of time between distinct attempts by the monitor to obtain the status of a cluster. The intervals associated with notifications need to be chosen to work in combination with the monitor interval to give a realistic picture of cluster events. Some combinations are not useful. For example, notification intervals that are smaller than the monitor interval do not make sense, and should be avoided. In the following example, the cluster event will always result in two alerts followed by an alarm. No change of state could possibly be detected at the one-minute, two-minute, and three-minute intervals, because the monitor does not check for changes until the monitor interval (5 minutes) has been reached.

    MONITOR_PACKAGE_NAME ccmonpkg
    MONITOR_INTERVAL 5 MINUTES
    ...
    CLUSTER_EVENT LACluster/UNREACHABLE
      CLUSTER_ALERT 1 MINUTE
        NOTIFICATION CONSOLE
        "1 Minute Alert: LACluster Unreachable"
      CLUSTER_ALERT 2 MINUTES
        NOTIFICATION CONSOLE
        "2 Minute Alert: LACluster Still Unreachable"
      CLUSTER_ALARM 3 MINUTES
        NOTIFICATION CONSOLE
        "ALARM: LACluster Unreachable after 3 Minutes: Recovery Enabled"

The following sequence could provide meaningful notifications, since a change of state is possible between notification intervals:

    MONITOR_PACKAGE_NAME ccmonpkg
    MONITOR_INTERVAL 1 MINUTE
    ...
    CLUSTER_EVENT LACluster/UNREACHABLE
      CLUSTER_ALERT 3 MINUTES
        NOTIFICATION CONSOLE
        "3 Minute Alert: LACluster Unreachable"
      CLUSTER_ALERT 5 MINUTES
        NOTIFICATION CONSOLE
        "5 Minute Alert: LACluster Still Unreachable"
      CLUSTER_ALARM 10 MINUTES
        NOTIFICATION CONSOLE
        "ALARM: LACluster Unreachable after 10 Minutes: Recovery Enabled"

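The difference between the two sequences can be expressed as a simple arithmetic check: a notification delay is only meaningful if it falls on a point where the monitor actually samples the cluster. The Python sketch below flags delays that are not whole multiples of the monitor interval. The function name is hypothetical, all values are in seconds, and a delay of zero (notify as soon as the event is detected) is treated as valid.

```python
from typing import List


def misaligned_delays(monitor_interval: int, delays: List[int]) -> List[int]:
    """Return the delays (seconds) that are not multiples of the monitor interval."""
    if monitor_interval <= 0:
        raise ValueError("monitor interval must be positive")
    return [delay for delay in delays if delay % monitor_interval != 0]


# First sequence: 5-minute monitor interval, delays of 1, 2, and 3 minutes.
print(misaligned_delays(300, [60, 120, 180]))   # [60, 120, 180]

# Second sequence: 1-minute monitor interval, delays of 3, 5, and 10 minutes.
print(misaligned_delays(60, [180, 300, 600]))   # []
```

In the first case every delay is flagged, which matches the observation that no change of state can be detected before the 5-minute monitor interval elapses.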
A rule of thumb is that the notification intervals should be multiples of the monitor interval.

Checking and Applying the ContinentalClusters Configuration

After editing the configuration file on the primary cluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.

1. Use the following command to verify the content of the file:

       # cmcheckconcl -v -C cmconcl.config

   This command will verify that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.

2. Use the following command to distribute the ContinentalClusters configuration information to all nodes in the continental cluster:

       # cmapplyconcl -v -C cmconcl.config

   Configuration data is copied to all nodes in both clusters. This data includes a set of managed object files that are copied to the /var/adm/cmconcl/instances directory on every node in both clusters. All nodes must be booted when the command is issued, although the MC/ServiceGuard cluster may or may not be running.

3. Be sure to make a backup copy of the configuration ASCII file after it is applied.

When configuration is finished, your systems should have sets of files similar to those shown in Figure 5-8 “ContinentalClusters Configuration Files”.

Starting the ContinentalClusters Monitor Package

Starting the monitoring package enables all ContinentalClusters functionality. Before you do this, ensure that the primary packages you wish to protect are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working correctly. If you are using physical data replication, make sure that it is operational.

On each monitoring cluster, use the following command to start the monitor package:

    # cmmodpkg -e ccmonpkg

Validating the Configuration

The following table shows the status of ContinentalClusters packages when each cluster is running normally and no recovery has taken place.

Table 5-5 Status of ContinentalClusters Packages Before Recovery

| Data Replication Method | Primary Cluster: Primary Package | Primary Cluster: Data Sender Package | Primary Cluster: Optional Monitor Package | Recovery Cluster: Recovery Package | Recovery Cluster: Data Receiver Package | Recovery Cluster: Required Monitor Package |
|---|---|---|---|---|---|---|
| Physical: Symmetrix | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical: XP Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Logical: Oracle Standby Database | Running | Not used | Running (optional) | Halted | Running | Running (required) |

Use the following steps to make sure the components are functioning correctly:

1. Use the following command to make sure all daemons are running:

       # ps -ef | grep cmcl

   Two important ContinentalClusters daemons are cmclsentryd and cmclrmond.

2. Check the cluster configuration on each cluster using the cmviewcl -v command.
   - Ensure that each primary package is running correctly.
   - Ensure that data sender packages (if any are used for logical data replication) are running correctly.
   - Ensure that data receiver packages (if any are used for logical data replication) are running correctly.
   - Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.

3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.

4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.

5. Use the following command to verify the correct operation of the ContinentalClusters daemon:

       # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log

6. Make sure the ContinentalClusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.

7. Change each cluster's state to test that the monitor running on the monitoring cluster will detect the change in status and send notification.

8. View the status of the ContinentalClusters primary and recovery clusters, including configured event data:

CAUTION: You should never issue the cmrunpkg command for a recovery package when ContinentalClusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.

Chapters 6 and 7 contain additional suggestions on testing the data replication and package configuration.

Documenting the Recovery Procedure

Once everything is configured and the ContinentalClusters monitor is running, you must define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 5-9 “Recovery Checklist” is an example of how you might document the recovery procedure.

Reviewing the Recovery Procedure

Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the ContinentalClusters configuration file.