Configuring OPS Clusters with ServiceGuard OPS Edition, Chapter 5: Building an OPS Cluster Configuration

Configuring the Cluster


This section describes how to define the basic cluster configuration. To do this in SAM, read the next section. If you want to use ServiceGuard commands, skip ahead to the section "Using ServiceGuard Commands to Configure the Cluster."

Using SAM to Configure the Cluster

To configure a high availability cluster, use the following steps on the configuration node (ftsys9):

  1. In SAM, select Clusters->High Availability Clusters.

  2. Choose the Cluster Configuration option. SAM displays a Cluster Configuration screen. If no clusters have yet been configured, the list is empty. If one or more HA clusters are already configured on your local network, you will see them listed.

  3. Select the Actions menu, and choose Create Cluster Configuration. A Step menu displays.

  4. Choose each required step in sequence, filling in the dialog boxes with required information, or accepting the default values shown. For information about each step, choose Help.

  5. When you have finished all steps, confirm your choices at the Step Menu screen. This action creates the cluster configuration file and then copies the file to all the nodes in the cluster. When the file copying is finished, you return to the Cluster Configuration screen.

  6. Exit from the Cluster Configuration screen, returning to the High Availability Clusters menu.

NOTE: In addition to creating and distributing a binary cluster configuration file, SAM creates an ASCII cluster configuration file, named /etc/cmcluster/cmcl.config. This file is available as a record of the choices entered in SAM.

Skip ahead to the section "Installing Oracle Parallel Server."

Using ServiceGuard Commands to Configure the Cluster

Use the cmquerycl command to specify a set of nodes to be included in the cluster and to generate a template for the cluster configuration file. Here is an example of the command as issued from node ftsys9:

# cmquerycl -v -C /etc/cmcluster/cmcl.config \
-n ftsys9 -n ftsys10

The example creates an ASCII template file in the default cluster configuration directory, /etc/cmcluster. The ASCII file is partially filled in with the names and characteristics of cluster components on the two nodes ftsys9 and ftsys10. Edit the filled-in cluster characteristics as needed to define the desired cluster. It is strongly recommended that you edit the file to send heartbeat over all possible networks.

NOTE: In a larger or more complex configuration with many nodes, networks or disks connected to the cluster, the cmquerycl command may require several minutes to complete. In order to speed up the configuration process, you can direct the command to return selected information only by using the -k and -w options:

-k eliminates some disk probing, and does not return information about potential cluster lock volume groups and lock physical volumes.

-w local lets you specify local network probing, in which LAN connectivity is verified between interfaces within each node only.

-w full lets you specify full network probing, in which actual connectivity is verified among all LAN interfaces on all nodes in the cluster.

For complete details, refer to the man page on cmquerycl(1m).
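If you script cluster setup, it can help to assemble the cmquerycl invocation from the options above rather than typing it by hand. The following is a minimal Python sketch; the function name build_cmquerycl_args is hypothetical, it uses only the -v, -C, -n, -k and -w options described in this section, and it assumes cmquerycl accepts the options in the order shown, as in the example command earlier.

```python
from typing import List, Optional, Sequence


def build_cmquerycl_args(
    nodes: Sequence[str],
    config_file: Optional[str] = None,
    skip_lock_probe: bool = False,
    network_probe: Optional[str] = None,
    verbose: bool = True,
) -> List[str]:
    """Build an argument vector for cmquerycl (hypothetical helper)."""
    if not nodes:
        raise ValueError("specify at least one node with -n")
    if network_probe not in (None, "local", "full"):
        raise ValueError("network_probe must be None, 'local' or 'full'")
    args = ["cmquerycl"]
    if verbose:
        args.append("-v")
    if config_file:
        args += ["-C", config_file]      # write the ASCII template here
    if skip_lock_probe:
        args.append("-k")                # skip probing for lock VGs/PVs
    if network_probe:
        args += ["-w", network_probe]    # 'local' or 'full' LAN probing
    for node in nodes:
        args += ["-n", node]             # one -n per cluster node
    return args
```

On an HP-UX node, the resulting list could be passed to subprocess.run(args, check=True); called with ["ftsys9", "ftsys10"] and /etc/cmcluster/cmcl.config it reproduces the example command above.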

Cluster Configuration Template File

The following is an example of an ASCII configuration file generated with the cmquerycl command using the -w full option.

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to ****
# ***** set them, consult the cmquerycl(1m) manpage or your manual. ****
# **********************************************************************

# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.

CLUSTER_NAME lpcluster

# Cluster Lock Device Parameters. This is the volume group that
# holds the cluster lock which is used to break a cluster formation
# tie. This volume group should not be used by any other cluster
# as cluster lock device.

FIRST_CLUSTER_LOCK_VG /dev/vg01

# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.

NODE_NAME ftsys9
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.171.32
NETWORK_INTERFACE lan3
HEARTBEAT_IP 192.6.7.3
NETWORK_INTERFACE lan4
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.6.143.10
FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0

# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Primary Network Interfaces on Bridged Net 1: lan0.
# Warning: There are no standby network interfaces on bridged net 1.
# Primary Network Interfaces on Bridged Net 2: lan3.
# Possible standby Network Interfaces on Bridged Net 2: lan4.
# Primary Network Interfaces on Bridged Net 3: lan1.
# Warning: There are no standby network interfaces on bridged net 3.

# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
# This default setting yields the fastest cluster reformations.
# However, the use of the default value increases the potential
# for spurious reformations due to momentary system hangs or
# network load spikes.
# For a significant portion of installations, a setting of
# 5000000 to 8000000 (5 to 8 seconds) is more appropriate.
# The maximum recommended value for NODE_TIMEOUT is 30000000
# (30 seconds).


HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000

# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

# Package Configuration Parameters.
# Enter the maximum number of packages which will be configured in the cluster.
# You can not add packages beyond this limit.
# This parameter is required.

MAX_CONFIGURED_PACKAGES 10

# List of cluster aware Volume Groups. These volume groups
# will be used by clustered applications via the vgchange -a e command.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02

VOLUME_GROUP /dev/vg01
VOLUME_GROUP /dev/vg02

# List of OPS Volume Groups.
# Formerly known as DLM Volume Groups, these volume groups
# will be used by OPS cluster applications via
# the vgchange -a s command. (Note: the name DLM_VOLUME_GROUP
# is also still supported for compatibility with earlier versions.)
# For example:
# OPS_VOLUME_GROUP /dev/vgdatabase
# OPS_VOLUME_GROUP /dev/vg02

OPS_VOLUME_GROUP /dev/vg_ops

The man page for the cmquerycl command lists the definitions of all the parameters that appear in this file. Many are also described in Chapter 4, "Planning and Documenting an OPS Cluster." Modify the /etc/cmcluster/cmcl.config file to your requirements, using the data on the cluster configuration worksheet.

In the file, keywords are separated from definitions by white space. Comments are permitted, and must be preceded by a pound sign (#) in the far left column. See the man page for the cmquerycl command for more details.
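These two rules (white space between keyword and value, a pound sign in the far left column for comments) are enough to read the file programmatically. Below is a minimal Python parser sketch with a hypothetical function name. It assumes that NETWORK_INTERFACE, HEARTBEAT_IP, FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV and SERIAL_DEVICE_FILE belong to the most recent NODE_NAME, as in the template above, and that every other keyword is cluster-level; cmquerycl(1m) remains the authoritative description of the format.

```python
from typing import Dict, List, Tuple

# Assumption: keywords that attach to the preceding NODE_NAME, taken from the
# template above. Every other keyword is treated as cluster-level.
NODE_KEYWORDS = {
    "NETWORK_INTERFACE", "HEARTBEAT_IP", "FIRST_CLUSTER_LOCK_PV",
    "SECOND_CLUSTER_LOCK_PV", "SERIAL_DEVICE_FILE",
}

Entries = Dict[str, List[str]]


def parse_cluster_config(text: str) -> Tuple[Entries, Dict[str, Entries]]:
    """Return (cluster_entries, per_node_entries) for a cluster ASCII file."""
    cluster: Entries = {}
    nodes: Dict[str, Entries] = {}
    current = None  # node whose block we are currently inside
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip() or raw.startswith("#"):
            continue  # blank line, or comment with # in the far left column
        parts = raw.split(None, 1)  # keyword, then value after white space
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "NODE_NAME":
            current = value
            nodes.setdefault(current, {})
        elif keyword in NODE_KEYWORDS:
            if current is None:
                raise ValueError(f"line {lineno}: {keyword} before any NODE_NAME")
            nodes[current].setdefault(keyword, []).append(value)
        else:
            current = None  # a cluster-level keyword ends the node block
            cluster.setdefault(keyword, []).append(value)
    return cluster, nodes
```

Keeping repeated keywords as lists matters because VOLUME_GROUP, NETWORK_INTERFACE and HEARTBEAT_IP legitimately appear more than once.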

Identifying Non-OPS Volume Groups for Packages

The file includes entries for all package volume groups that are to be defined as cluster-aware, that is, those which can be accessed by packages running on different nodes in the cluster at different times. A separate VOLUME_GROUP line should appear for each volume group that will be activated by any package running in the cluster. To leave a volume group unmarked, remove the volume group name from the ASCII file.

NOTE: If a volume group is not cluster-aware, then it cannot be activated by a package control script.

Identifying LVM Volume Groups to be Used by OPS

The template file also includes entries for all LVM volume groups used by the Oracle Parallel Server that are accessed concurrently by the different nodes in the cluster. These volume groups are activated by the vgchange -a s command in the control script that activates each OPS instance. A separate OPS_VOLUME_GROUP line should appear for each LVM volume group that will be activated in shared mode. Volume groups that will be used by Oracle Parallel Server must be labeled OPS_VOLUME_GROUP.

NOTE: It is important that only LVM volume groups used by OPS be listed with the OPS_VOLUME_GROUP parameter, since these volume groups will be marked for activation in shared mode. LVM volume groups used by other packages should be listed with the VOLUME_GROUP parameter described above, and CVM disk groups should be identified in STORAGE_GROUP entries in the package configuration ASCII file. You may need to change the default assignments in order to get this correct.
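To make the distinction concrete, the sketch below maps each volume group listed in the file to the activation command this section associates with it: vgchange -a e for VOLUME_GROUP entries and vgchange -a s for OPS_VOLUME_GROUP entries. Rejecting a volume group that appears in both lists is our own consistency check, inferred from the NOTE above rather than a documented ServiceGuard rule. The function name is hypothetical, and the commands are only returned as strings, not executed.

```python
from typing import Dict, List


def activation_commands(volume_groups: List[str],
                        ops_volume_groups: List[str]) -> Dict[str, str]:
    """Map each listed VG to the vgchange command that activates it."""
    both = sorted(set(volume_groups) & set(ops_volume_groups))
    if both:
        # Assumed rule: a VG is either package-exclusive or OPS-shared, not both.
        raise ValueError(f"listed as VOLUME_GROUP and OPS_VOLUME_GROUP: {both}")
    commands: Dict[str, str] = {}
    for vg in volume_groups:
        commands[vg] = f"vgchange -a e {vg}"  # exclusive activation by a package
    for vg in ops_volume_groups:
        commands[vg] = f"vgchange -a s {vg}"  # shared activation for OPS instances
    return commands
```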

Redeploying Previously Configured Volume Groups

In configuring a new cluster, if you are using volume groups that were used in a previous cluster configuration, you should ensure that they are not currently cluster aware (marked with a cluster id). You can use the following command to remove the cluster id if necessary:

# vgchange -c n vg_name

where vg_name is the name of the volume group.

Identifying Heartbeat Subnets

The cluster ASCII file includes entries for IP addresses on the heartbeat subnet. It is recommended that you use a dedicated heartbeat subnet, but it is possible to configure heartbeat on other subnets as well, including the data subnet.

NOTE: If you are using VERITAS CVM disk groups, you can configure only a single heartbeat subnet, which should be a dedicated subnet. Each system on this subnet must have standby LANs configured, to ensure that there is a highly available heartbeat path.

Specifying a Lock Disk

A cluster lock is required for two-node clusters like the one in this example. The lock must be accessible to all nodes and must be powered separately from the nodes. Enter the lock disk information in the cluster ASCII file following the cluster name. The lock disk must be in an LVM volume group that is accessible to all the nodes in the cluster.

The default FIRST_CLUSTER_LOCK_VG and FIRST_CLUSTER_LOCK_PV supplied in the ASCII template created with cmquerycl are the volume group and physical volume name of a disk chosen based on minimum failover time calculations. You should ensure that this disk meets your power wiring requirements. If necessary, choose a disk powered by a circuit which powers fewer than half the nodes in the cluster.
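The power-wiring rule can be checked mechanically once you know which circuit feeds each node and the lock disk. In this minimal sketch the circuit labels and the function name are hypothetical; the check uses integer arithmetic so that "fewer than half" is exact. For a two-node cluster the rule reduces to a circuit that powers neither node, which matches the requirement that the lock be powered separately from the nodes.

```python
from typing import Dict


def lock_disk_power_ok(lock_circuit: str, node_circuits: Dict[str, str]) -> bool:
    """True if the lock disk's circuit powers fewer than half the cluster nodes.

    node_circuits maps node name -> power circuit label (labels are hypothetical).
    """
    if not node_circuits:
        raise ValueError("no nodes given")
    shared = sum(1 for circuit in node_circuits.values() if circuit == lock_circuit)
    return 2 * shared < len(node_circuits)  # shared < n/2 without floats
```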

To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster:

# cmquerycl -v -n ftsys9 -n ftsys10 

The output of the command lists the disks connected to each node together with the re-formation time associated with each.

NOTE: You should not configure a second lock volume group or physical volume unless your configuration specifically requires it. See the discussion "Dual Cluster Lock" in the section "Cluster Lock" in Chapter 3.

If your configuration requires you to configure a second cluster lock, enter the following parameters in the cluster configuration file:

SECOND_CLUSTER_LOCK_VG /dev/volume-group
SECOND_CLUSTER_LOCK_PV /dev/dsk/block-special-file

where /dev/volume-group is the name of the second volume group and block-special-file is the physical volume name of a lock disk in the chosen volume group. These lines should be added for each node.

Specifying a Quorum Server

To specify a quorum server instead of a lock disk, use the -q option of the cmquerycl command, followed by the host name of the quorum server. For example:

# cmquerycl -n node1 -n node2 -q lp-qs

The cluster ASCII file that is generated in this case contains parameters for defining the quorum server. This portion of the file is shown below:

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# ServiceGuard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server interval is calculated from the
# ServiceGuard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000

Enter the QS_HOST, QS_POLLING_INTERVAL and, if desired, a QS_TIMEOUT_EXTENSION.
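Because all quorum server timing values are in microseconds, it is easy to drop a factor of 1,000,000. The sketch below, with a hypothetical function name, renders the three parameters from values given in seconds; called with the values from the commented example (host qshost, 120 seconds, 2 seconds) it reproduces those three lines.

```python
def quorum_server_fragment(host: str, polling_interval_s: float,
                           timeout_extension_s: float = 0.0) -> str:
    """Render QS_* cluster parameters; input times are seconds, output microseconds."""
    def to_us(seconds: float) -> int:
        return int(round(seconds * 1_000_000))

    if polling_interval_s <= 0 or timeout_extension_s < 0:
        raise ValueError("polling interval must be > 0 and extension >= 0")
    lines = [f"QS_HOST {host}",
             f"QS_POLLING_INTERVAL {to_us(polling_interval_s)}"]
    if timeout_extension_s:
        # QS_TIMEOUT_EXTENSION is optional, so omit it when no extension is wanted.
        lines.append(f"QS_TIMEOUT_EXTENSION {to_us(timeout_extension_s)}")
    return "\n".join(lines)
```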

Specifying Maximum Number of Configured Packages

ServiceGuard OPS Edition preallocates memory and threads at cluster startup time. It calculates these values based on the number of packages specified in the MAX_CONFIGURED_PACKAGES parameter in the cluster configuration file. This value must be equal to or greater than the number of packages currently configured in the cluster. The default is 0, which means that you must enter a value if you wish to use packages. The absolute maximum number of packages per cluster is 60. ServiceGuard reserves approximately 6MB of memory, plus about 80KB for each package. If you will be using VERITAS CVM disk storage, be sure to count the CVM-VxVM-PKG in the total when selecting a value for MAX_CONFIGURED_PACKAGES.

NOTE: Remember to tune HP-UX kernel parameters on each node to ensure that they are set high enough for the largest number of packages that will ever run concurrently on that node.
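The sizing guidance above can be expressed as a small check. This sketch (hypothetical function and constant names) treats 6MB as 6144KB, which is only an approximation of the figure quoted in the text, and counts CVM-VxVM-PKG as one extra package when CVM storage is used. It does not model the HP-UX kernel parameters mentioned in the NOTE.

```python
ABSOLUTE_MAX_PACKAGES = 60
BASE_RESERVE_KB = 6 * 1024  # "approximately 6MB" (approximation)
PER_PACKAGE_KB = 80         # "about 80KB ... for each package"


def check_max_configured_packages(max_configured: int, configured_packages: int,
                                  uses_cvm: bool = False) -> int:
    """Validate MAX_CONFIGURED_PACKAGES; return the approximate reserve in KB.

    configured_packages excludes CVM-VxVM-PKG; it is added when uses_cvm is True.
    """
    required = configured_packages + (1 if uses_cvm else 0)
    if max_configured < required:
        raise ValueError(f"MAX_CONFIGURED_PACKAGES must be at least {required}")
    if max_configured > ABSOLUTE_MAX_PACKAGES:
        raise ValueError(f"at most {ABSOLUTE_MAX_PACKAGES} packages per cluster")
    # Memory is preallocated for the configured maximum, not the current count.
    return BASE_RESERVE_KB + PER_PACKAGE_KB * max_configured
```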

Modifying Cluster Timing Parameters

The cmquerycl command supplies default cluster timing parameters for HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact the cluster's reformation and failover times. It is useful to modify these parameters if the cluster is reforming occasionally due to heavy system load or heavy network traffic.

The default value of 2 seconds for NODE_TIMEOUT leads to a best-case failover time of 30 seconds. If NODE_TIMEOUT is changed to 10 seconds, the cluster manager waits 5 times longer before timing out a node, and the failover time increases by a factor of 5, to approximately 150 seconds. NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL. A good rule of thumb is to have at least two or three heartbeats within one NODE_TIMEOUT.
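The timing constraints and the scaling described above can be checked before you edit the file. In this sketch (hypothetical function name) the failover estimate simply scales the example of about 30 seconds at a 2-second NODE_TIMEOUT in proportion to NODE_TIMEOUT; that is a rough planning figure derived from the example, not a formula ServiceGuard uses. The one-second minimum for HEARTBEAT_INTERVAL comes from the verification list in this section.

```python
from typing import Dict


def check_timing(heartbeat_interval_us: int, node_timeout_us: int) -> Dict[str, float]:
    """Check HEARTBEAT_INTERVAL/NODE_TIMEOUT (microseconds) and estimate failover."""
    if heartbeat_interval_us < 1_000_000:
        raise ValueError("HEARTBEAT_INTERVAL must be at least one second")
    if node_timeout_us < 2 * heartbeat_interval_us:
        raise ValueError("NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL")
    return {
        # Heartbeats that fit within one NODE_TIMEOUT (aim for two or three).
        "heartbeats_per_timeout": node_timeout_us / heartbeat_interval_us,
        # Rough estimate: about 30 s failover per 2 s of NODE_TIMEOUT, scaled linearly.
        "estimated_failover_s": 30.0 * node_timeout_us / 2_000_000,
    }
```

With HEARTBEAT_INTERVAL 1000000 and NODE_TIMEOUT 10000000 this gives ten heartbeats per timeout and an estimate of about 150 seconds, matching the example above.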

Identifying Serial Heartbeat Connections

If you are using a serial (RS232) line as a heartbeat connection, use the SERIAL_DEVICE_FILE parameter and enter the device file name that corresponds to the serial port you are using on each node. Be sure that the serial cable is securely attached during and after configuration.

Verifying the Cluster Configuration

SAM automatically checks the configuration you enter and reports any errors. If you have edited an ASCII cluster configuration file, use the following command to verify the content of the file:

# cmcheckconf -v -C /etc/cmcluster/cmcl.config

Both this command and the automatic verification in SAM check the following:

  • Network addresses and connections.

  • Cluster lock connectivity.

  • Validity of configuration parameters for the cluster and packages.

  • Uniqueness of names.

  • Existence and permission of scripts specified in the command line.

  • Whether all specified nodes are on the same heartbeat subnet.

  • Whether you have specified the wrong configuration filename.

  • Whether all nodes can be accessed.

  • No more than one each of CLUSTER_NAME, HEARTBEAT_INTERVAL, and AUTO_START_TIMEOUT is specified.

  • The value for package run and halt script timeouts is less than 4294 seconds.

  • The value for HEARTBEAT_INTERVAL is at least one second.

  • The value for NODE_TIMEOUT is at least twice the value of HEARTBEAT_INTERVAL.

  • The value of AUTO_START_TIMEOUT is greater than or equal to 0.

  • Heartbeat network minimum requirement. The cluster must have one heartbeat LAN configured with a standby, two heartbeat LANs, one heartbeat LAN and an RS232 connection, or one heartbeat network with no local LAN switch, but with a primary LAN that is configured as a link aggregate of at least two interfaces.

  • There is only one heartbeat subnet configured if you are using CVM disk storage.

  • At least one NODE_NAME is specified.

  • Each node is connected to each heartbeat network.

  • All heartbeat networks are of the same type of LAN.

  • The network interface device files specified are valid LAN device files.

  • If RS232 is used, it is configured on a two-node cluster, and there is no more than one serial (RS232) port connection per node.

  • VOLUME_GROUP and OPS_VOLUME_GROUP entries are not currently marked as cluster-aware.
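For a quick pre-check before running cmcheckconf, a few of the simpler rules in this list can be tested directly against the ASCII file. The sketch below covers only the single-occurrence keywords, the NODE_NAME requirement, and the numeric limits on HEARTBEAT_INTERVAL, NODE_TIMEOUT and AUTO_START_TIMEOUT; it does not probe networks, disks or nodes and is not a substitute for cmcheckconf. The function names are hypothetical, and a non-integer timing value raises ValueError.

```python
from typing import Dict, List, Optional


def _entries(text: str) -> Dict[str, List[str]]:
    """Collect keyword -> values, skipping blanks and far-left # comments."""
    entries: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        parts = raw.split(None, 1)
        entries.setdefault(parts[0], []).append(parts[1].strip() if len(parts) > 1 else "")
    return entries


def check_cluster_config(text: str) -> List[str]:
    """Return a list of rule violations (empty if none were found)."""
    e = _entries(text)
    errors: List[str] = []
    for key in ("CLUSTER_NAME", "HEARTBEAT_INTERVAL", "AUTO_START_TIMEOUT"):
        if len(e.get(key, [])) > 1:
            errors.append(f"{key} specified more than once")
    if not e.get("NODE_NAME"):
        errors.append("no NODE_NAME specified")

    def first_int(key: str) -> Optional[int]:
        values = e.get(key)
        return int(values[0]) if values else None

    hb = first_int("HEARTBEAT_INTERVAL")
    nt = first_int("NODE_TIMEOUT")
    ast = first_int("AUTO_START_TIMEOUT")
    if hb is not None and hb < 1_000_000:
        errors.append("HEARTBEAT_INTERVAL is less than one second")
    if hb is not None and nt is not None and nt < 2 * hb:
        errors.append("NODE_TIMEOUT is less than twice HEARTBEAT_INTERVAL")
    if ast is not None and ast < 0:
        errors.append("AUTO_START_TIMEOUT is negative")
    return errors
```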

If the cluster is online, SAM (or the cmcheckconf command) also verifies that all the conditions for the specific change in configuration have been met.

Distributing the Binary Configuration File

After specifying all cluster parameters, you use SAM or HP-UX commands to apply the configuration. This action distributes the binary configuration file to all the nodes in the cluster. We recommend doing this separately before you configure packages (described in the next chapter, Chapter 6, "Configuring Packages and Their Services"). In this way, you can verify the cluster lock, heartbeat networks, and other cluster-level operations by using the cmviewcl command on the running cluster. Before distributing the configuration, ensure that your security files permit copying among the cluster nodes. See the section "Preparing Your Systems" at the beginning of this chapter.

Distributing the Configuration File with SAM

When you have finished entering parameters in the Cluster Configuration subarea in SAM, you are asked to verify the copying of the files to all the nodes in the cluster. When you respond OK to the verification prompt, ServiceGuard copies the binary configuration file and the ASCII configuration file to all the nodes in the cluster.

Distributing the Configuration File with HP-UX Commands

Use the following steps to generate the binary configuration file and distribute the configuration to all nodes in the cluster:

  • Activate the cluster lock volume group so that the lock disk can be initialized:

    # vgchange -a y /dev/vglock  
  • Generate the binary configuration file and distribute it across the nodes.

    # cmapplyconf -v -C /etc/cmcluster/cmcl.config
  • Deactivate the cluster lock volume group.

    # vgchange -a n /dev/vglock  

The cmapplyconf command creates a binary version of the cluster configuration file and distributes it to all nodes in the cluster. This action ensures that the contents of the file are consistent across all nodes. Note that the cmapplyconf command does not distribute the ASCII configuration file.

CAUTION: The cluster lock volume group must be activated only on the node from which you issue the cmapplyconf command, so that the lock disk can be initialized. If you attempt to configure a cluster either using SAM or by issuing the cmapplyconf command on one node while the lock volume group is active on another, different node, the cluster lock will be left in an unknown state. Therefore, you must ensure that when you configure the cluster, the cluster lock volume group is active only on the configuration node and deactivated on all other nodes.

Be sure to deactivate the cluster lock volume group on the configuration node after cmapplyconf is executed.
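If you script these three steps, make sure the lock volume group is deactivated even when cmapplyconf fails. This minimal Python sketch uses try/finally for that; the function names are hypothetical, /dev/vglock is the example name used above, and the run parameter lets you substitute a dry-run recorder for subprocess.run. It must be run on the configuration node only, with the lock volume group deactivated on every other node, as the CAUTION requires.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(list(cmd), check=True)


def apply_cluster_config(config_file: str, lock_vg: str,
                         run: Runner = run_checked) -> None:
    """Activate the lock VG, apply the configuration, then deactivate the VG."""
    run(["vgchange", "-a", "y", lock_vg])  # so the lock disk can be initialized
    try:
        run(["cmapplyconf", "-v", "-C", config_file])
    finally:
        run(["vgchange", "-a", "n", lock_vg])  # runs even if cmapplyconf fails
```

For example, apply_cluster_config("/etc/cmcluster/cmcl.config", "/dev/vglock") on the configuration node; passing run=print instead lists the commands without executing them.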

Storing Volume Group and Cluster Lock Configuration Data

After creating the cluster configuration with SAM or with HP-UX commands, make a backup copy of the volume group configuration data by using the vgcfgbackup command for each cluster-aware volume group you have created. If a disk in a volume group must be replaced, you can then restore the disk's metadata by using the vgcfgrestore command. The procedure is described in the section "Replacing Disks" in Chapter 8, "Troubleshooting Your Cluster."

Be sure to use vgcfgbackup for all cluster and OPS volume groups, including the cluster lock volume group, when you first create the cluster. Use the command again for each new cluster-aware volume group that you add with SAM or with HP-UX commands.
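One way to make sure no volume group is missed is to derive the list from the cluster-level entries of the ASCII file. The sketch below (hypothetical function name) takes a keyword-to-values mapping such as a simple parser would produce, collects the lock, package and OPS volume groups once each, and returns one vgcfgbackup command per group; it does not run them.

```python
from typing import Dict, List

# Cluster-level keywords whose values name volume groups to back up.
VG_KEYWORDS = ("FIRST_CLUSTER_LOCK_VG", "SECOND_CLUSTER_LOCK_VG",
               "VOLUME_GROUP", "OPS_VOLUME_GROUP")


def vgcfgbackup_commands(cluster_entries: Dict[str, List[str]]) -> List[List[str]]:
    """Return one vgcfgbackup command per distinct VG, grouped by VG_KEYWORDS order."""
    ordered: List[str] = []
    for keyword in VG_KEYWORDS:
        for vg in cluster_entries.get(keyword, []):
            if vg not in ordered:  # the lock VG may also be listed as a VOLUME_GROUP
                ordered.append(vg)
    return [["vgcfgbackup", vg] for vg in ordered]
```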

NOTE: If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk. Failure to do this might result in a failure of the entire cluster if all redundant copies of the lock disk have failed and if replacement mechanisms or LUNs have not had the lock configuration restored. (If the cluster lock disk is configured in a disk array, RAID protection provides a redundant copy of the cluster lock data. MirrorDisk/UX does not mirror cluster lock information.)

Checking Cluster Operation with ServiceGuard Manager

ServiceGuard Manager lets you see all the nodes and packages within a cluster and displays their current status. Refer to the section "Using ServiceGuard Manager" in Chapter 7. The following checks are suggested using ServiceGuard Manager:

  • Ensure that all configured nodes are running.

  • Check that all configured packages are running, and running on the correct nodes.

  • Ensure that the settings on the property sheets for cluster, nodes, and packages are correct.

When you are sure the cluster is correctly configured, save a copy of the configuration data in a file for archival purposes. The data in this file can be compared with later versions of the cluster to understand the changes that are made over time.

Checking Cluster Operation with ServiceGuard Commands

ServiceGuard also provides several commands for manual control of the cluster:

  • cmrunnode is used to start a node.

  • cmhaltnode is used to manually stop a running node. (This command is also used by shutdown(1m).)

  • cmruncl is used to manually start a stopped cluster.

  • cmhaltcl is used to manually stop a cluster.

You can use these commands to test cluster operation, as in the following:

  1. If the cluster is not already online, run the cluster, as follows:

    # cmruncl -v  
  2. When the cluster has started, use the following command to ensure that cluster components are operating correctly:

    # cmviewcl -v  

    Make sure that all nodes and networks are functioning as expected. For information about using cmviewcl, refer to the chapter "Cluster and Package Maintenance."

  3. Use the following sequence of commands to verify that nodes leave and enter the cluster as expected:

    • On a cluster node, issue the cmhaltnode command.

    • Use the cmviewcl command to verify that the node has left the cluster.

    • Issue the cmrunnode command.

    • Use the cmviewcl command again to verify that the node has returned to operation.

  4. Use the following command to bring down the cluster:

    # cmhaltcl -v -f  
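The test sequence above can be written down as a dry-run plan that an operator works through, or that a script passes to a command runner. This sketch (hypothetical function name) only builds the command lists. It assumes it is executed on the cluster node under test and that cmhaltnode and cmrunnode act on the local node when no node name is given, as in the steps above (check cmhaltnode(1m) and cmrunnode(1m) to confirm); the cmviewcl checks still require a person to read the output.

```python
from typing import List


def cluster_test_plan(cluster_online: bool) -> List[List[str]]:
    """Return the manual test sequence as command lists (nothing is executed)."""
    plan: List[List[str]] = []
    if not cluster_online:
        plan.append(["cmruncl", "-v"])      # step 1: start the cluster
    plan += [
        ["cmviewcl", "-v"],                 # step 2: check nodes and networks
        ["cmhaltnode"],                     # step 3: take this node out
        ["cmviewcl"],                       #   confirm it left the cluster
        ["cmrunnode"],                      #   bring it back
        ["cmviewcl"],                       #   confirm it rejoined
        ["cmhaltcl", "-v", "-f"],           # step 4: bring down the cluster
    ]
    return plan
```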

Additional cluster testing is described in the chapter "Troubleshooting Your Cluster." Refer to the appendix "ServiceGuard OPS Edition Commands" for a complete list of ServiceGuard commands.

© Hewlett-Packard Development Company, L.P.