Managing Serviceguard Twelfth Edition > Chapter 5 Building an HA Cluster Configuration

Configuring the Cluster


This section describes how to define the basic cluster configuration. To do this in Serviceguard Manager, the graphical user interface, read the next section. If you want to use Serviceguard commands, skip ahead to the section entitled “Using Serviceguard Commands to Configure the Cluster.”

Using Serviceguard Manager to Configure the Cluster

Create a session on Serviceguard Manager. Select the option for discovering unused nodes. On the map or tree, from the list of unused nodes, select the one where you want to start the cluster. From the Actions menu, choose Configuring.

After you give the node’s root password, the Configuration screen will open, and you will be guided through the process. Each tab contains related information. Serviceguard Manager discovers much of the information, so you can choose from available options, such as lists of volume groups, networks, and nodes.

There is online Help available at each step to help you make decisions.

Configure your volume groups before configuring the cluster. If you are using a quorum server as the cluster lock, have it running before configuring the cluster.

When you have finished entering the information, click Apply. If there are errors, they are displayed in a log window. If not, the log displays a “successful” message, and the binary configuration is automatically distributed to the nodes.

After a Refresh, the new cluster configuration and status information appear in the tree, map and Properties.

To modify or delete the configuration, select the cluster on the tree or map, and choose Configuring from the Actions menu.

Using Serviceguard Commands to Configure the Cluster

Use the cmquerycl command to specify a set of nodes to be included in the cluster and to generate a template for the cluster configuration file. Node names must be 31 bytes or less. Here is an example of the command:

cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10 

The example creates an ASCII template file in the default cluster configuration directory, /etc/cmcluster. The ASCII file is partially filled in with the names and characteristics of cluster components on the two nodes  ftsys9 and ftsys10. Do not include the domain name when specifying the node name; for example, specify ftsys9 and not ftsys9.cup.hp.com. Edit the filled-in cluster characteristics as needed to define the desired cluster. It is strongly recommended that you edit the file to send heartbeat over all possible networks, as shown in the following example.

NOTE: In a larger or more complex configuration with many nodes, networks or disks connected to the cluster, the cmquerycl command may require several minutes to complete. In order to speed up the configuration process, you can direct the command to return selected information only by using the -k and -w options:

-k eliminates some disk probing, and does not return information about potential cluster lock volume groups and lock physical volumes.

-w local lets you specify local network probing, in which LAN connectivity is verified between interfaces within each node only.

-w full lets you specify full network probing, in which actual connectivity is verified among all LAN interfaces on all nodes in the cluster. This is the default.

-w none skips network querying. If you have recently checked the networks, this option will save time.

For complete details, refer to the man page on cmquerycl(1m).
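The option rules above can be collected into a small helper that assembles the cmquerycl argument list before anything is run. The following Python sketch is illustrative only: the function name build_cmquerycl_args and its parameters are hypothetical, and it uses only the options described in this section (-v, -C, -n, -k, and -w with local, full, or none). It also applies the node-name rules stated earlier (31 bytes or less, no domain name).

```python
def build_cmquerycl_args(nodes, config_path=None, probe=None, skip_disk_probe=False):
    """Return the argument list for a cmquerycl call; nothing is executed here."""
    if not nodes:
        raise ValueError("at least one node name is required")
    for name in nodes:
        # The text says to give the short node name, not the domain name.
        if "." in name:
            raise ValueError(f"use the short node name, not a domain name: {name}")
        # Node names must be 31 bytes or less.
        if len(name.encode()) > 31:
            raise ValueError(f"node name exceeds 31 bytes: {name}")
    if probe not in (None, "local", "full", "none"):
        raise ValueError(f"unknown -w value: {probe}")

    args = ["cmquerycl", "-v"]
    if skip_disk_probe:
        args.append("-k")          # eliminates some disk probing
    if probe is not None:
        args += ["-w", probe]      # network probing level; full is the default
    if config_path is not None:
        args += ["-C", config_path]
    for name in nodes:
        args += ["-n", name]
    return args


if __name__ == "__main__":
    print(" ".join(build_cmquerycl_args(
        ["ftsys9", "ftsys10"],
        config_path="/etc/cmcluster/clust1.config",
        probe="local",
        skip_disk_probe=True,
    )))
```

Building the list separately from running it makes it easy to review the exact command before it is issued on a large cluster, where a full probe may take several minutes.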

Cluster Configuration Template File

The following is an example of an ASCII configuration file generated with the cmquerycl command using the -w full option:

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to *******
# ***** set them, consult the Serviceguard manual. *********************
# **********************************************************************
# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.

CLUSTER_NAME cluster1

# Cluster Lock Parameters
# The cluster lock is used as a tie-breaker for situations
# in which a running cluster fails, and then two equal-sized
# sub-clusters are both trying to form a new cluster. The
# cluster lock may be configured using either a lock disk
# or a quorum server.
#
# You can use either the quorum server or the lock disk as
# a cluster lock but not both in the same cluster.
#
# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For
# a cluster of three or four nodes, a cluster lock is strongly
# recommended. For a cluster of more than four nodes, a
# cluster lock is recommended. If you decide to configure
# a lock for a cluster of more than four nodes, it must be
# a quorum server.

# Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and
# FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.
# The FIRST_CLUSTER_LOCK_VG is the LVM volume group that
# holds the cluster lock. This volume group should not be
# used by any other cluster as a cluster lock device.

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# Serviceguard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server timeout is calculated from the
# Serviceguard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# The value of QS_TIMEOUT_EXTENSION will directly effect the amount
# of time it takes for cluster reformation in the event of failure.
# For example, if QS_TIMEOUT_EXTENSION is set to 10 seconds, the cluster
# reformation will take 10 seconds longer than if the QS_TIMEOUT_EXTENSION
# was set to 0. This delay applies even if there is no delay in
# contacting the Quorum Server. The recommended value for
# QS_TIMEOUT_EXTENSION is 0, which is used as the default
# and the maximum supported value is 30000000 (5 minutes).
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000

QS_HOST sysman5
QS_POLLING_INTERVAL 300000000
# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.
# NODE_NAME is the specified nodename in the cluster.
# It must match the hostname and both cannot contain full domain name.
# Each NETWORK_INTERFACE, if configured with IPv4 address,
# must have ONLY one IPv4 address entry with it which could
# be either HEARTBEAT_IP or STATIONARY_IP.
# Each NETWORK_INTERFACE, if configured with IPv6 address(es)
# can have multiple IPv6 address entries(up to a maximum of 2,
# only one IPv6 address entry belonging to site-local scope
# and only one belonging to global scope) which must be all
# STATIONARY_IP. They cannot be HEARTBEAT_IP.

NODE_NAME fresno
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.168.91
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.

NODE_NAME lodi
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.168.94
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.

# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
# This default setting yields the fastest cluster reformations.
# However, the use of the default value increases the potential
# for spurious reformations due to momentary system hangs or
# network load spikes.
# For a significant portion of installations, a setting of
# 5000000 to 8000000 (5 to 8 seconds) is more appropriate.
# The maximum value recommended for NODE_TIMEOUT is 30000000
# (30 seconds).

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000

# The FAILOVER_OPTIMIZATION parameter enables Failover Optimization,
# which reduces the time Serviceguard takes for failover. (Failover
# Optimization cannot, however, change the time an application
# needs to shut down or restart.)
#
# There are four requirements:
# * The Serviceguard Extension for Faster Failover product
# (SGeFF) must be installed on all cluster nodes.
# * Only one or two node clusters are supported.
# * A quorum server must be configured as the tie-breaker.
# * The cluster must have more than one heartbeat subnet,
# and neither can be a serial line (RS232).
#
# Other considerations are listed in the SGeFF Release Notes
# and the Serviceguard manual.
#
# You must halt the cluster to change FAILOVER_OPTIMIZATION
# parameter.
#
# To enable Failover Optimization, set FAILOVER_OPTIMIZATION
# to TWO_NODE.
# The default is NONE.
#
# FAILOVER_OPTIMIZATION <NONE/TWO_NODE>

FAILOVER_OPTIMIZATION NONE

# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

# Network Monitor Configuration Parameters.
# The NETWORK_FAILURE_DETECTION parameter determines how LAN card failures are
# detected. If set to INONLY_OR_INOUT, a LAN card will be considered down
# when its inbound message count stops increasing or when both inbound and
# outbound message counts stop increasing.
# If set to INOUT, both the inbound and outbound message counts must
# stop increasing before the card is considered down.
NETWORK_FAILURE_DETECTION INOUT

# Package Configuration Parameters.
# Enter the maximum number of packages which will be configured in the cluster.
# You can not add packages beyond this limit.
# This parameter is required.
MAX_CONFIGURED_PACKAGES 150


# Access Control Policy Parameters.
#
# Three entries set the access control policy for the cluster:
# First line must be USER_NAME, second USER_HOST, and third USER_ROLE.
# Enter a value after each.
#
# 1. USER_NAME can either be ANY_USER, or a maximum of
# 8 login names from the /etc/passwd file on user host.
# 2. USER_HOST is where the user can issue Serviceguard commands.
# If using Serviceguard Manager, it is the COM server.
# Choose one of these three values: ANY_SERVICEGUARD_NODE, or
# (any) CLUSTER_MEMBER_NODE, or a specific node. For node,
# use the official hostname from domain name server, and not
# an IP addresses or fully qualified name.
# 3. USER_ROLE must be one of these three values:
# * MONITOR: read-only capabilities for the cluster and packages
# * PACKAGE_ADMIN: MONITOR, plus administrative commands for packages
# in the cluster
# * FULL_ADMIN: MONITOR and PACKAGE_ADMIN plus the administrative
# commands for the cluster.
#
# Access control policy does not set a role for configuration
# capability. To configure, a user must log on to one of the
# cluster’s nodes as root (UID=0). Access control
# policy cannot limit root users’ access.
#
# MONITOR and FULL_ADMIN can only be set in the cluster configuration file,
# and they apply to the entire cluster. PACKAGE_ADMIN can be set in the
# cluster or a package configuration file. If set in the cluster
# configuration file, PACKAGE_ADMIN applies to all configured packages.
# If set in a package configuration file, PACKAGE_ADMIN applies to that
# package only.
#
# Conflicting or redundant policies will cause an error while applying
# the configuration, and stop the process. The maximum number of access
# policies that can be configured in the cluster is 200.
#
#
# Example: to configure a role for user john from node noir to
# administer a cluster and all its packages, enter:
# USER_NAME john
# USER_HOST noir
# USER_ROLE FULL_ADMIN

USER_NAME root
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE full_admin


# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.
# Neither CVM or VxVM Disk Groups should be used here.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02


# List of OPS Volume Groups.
# Formerly known as DLM Volume Groups, these volume groups
# will be used by OPS or RAC cluster applications via
# the vgchange -a s command. (Note: the name DLM_VOLUME_GROUP
# is also still supported for compatibility with earlier versions.)
# For example:
# OPS_VOLUME_GROUP /dev/vgdatabase
# OPS_VOLUME_GROUP /dev/vg02


The man page for the cmquerycl command lists the definitions of all the parameters that appear in this file. Many are also described in the “Planning” chapter. Modify your /etc/cmcluster/clust1.config file to your requirements, using the data on the cluster worksheet.

In the file, keywords are separated from definitions by white space. Comments are permitted, and must be preceded by a pound sign (#) in the far left column. See the man page for the cmquerycl command for more details.
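As an illustration of the file format just described, the following Python sketch reads the text of a cluster configuration file into a list of keyword and value pairs. The function name parse_cluster_config is hypothetical, and the sketch assumes only the rules stated above: a keyword is separated from its definition by white space, and a comment line has a pound sign in the far left column. It is not a substitute for cmcheckconf.

```python
def parse_cluster_config(text):
    """Return (keyword, value) pairs, in file order, from cluster config text."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and comments ('#' in the far left column).
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split(None, 1)   # split keyword from definition on white space
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries


if __name__ == "__main__":
    sample = (
        "# Enter a name for this cluster.\n"
        "CLUSTER_NAME cluster1\n"
        "\n"
        "NODE_NAME fresno\n"
        "NETWORK_INTERFACE lan0\n"
        "HEARTBEAT_IP 15.13.168.91\n"
    )
    for keyword, value in parse_cluster_config(sample):
        print(keyword, value)
```

The pairs are kept in file order rather than in a dictionary because keywords such as NODE_NAME and NETWORK_INTERFACE repeat, and each NETWORK_INTERFACE line belongs to the NODE_NAME that precedes it.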

Specifying a Lock Disk

A cluster lock is required for two node clusters like the one in this example. The lock must be accessible to all nodes and must be powered separately from the nodes. Refer to the section “Cluster Lock” in Chapter 3 for additional information. Enter the lock disk information following the cluster name. The lock disk must be in an LVM volume group that is accessible to all the nodes in the cluster.

The default FIRST_CLUSTER_LOCK_VG and FIRST_CLUSTER_LOCK_PV supplied in the ASCII template created with cmquerycl are the volume group and physical volume name of a disk chosen based on minimum failover time calculations. You should ensure that this disk meets your power wiring requirements. If necessary, choose a disk powered by a circuit which powers fewer than half the nodes in the cluster.

To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster. The output of the command lists the disks connected to each node together with the re-formation time associated with each.

Do not include the node’s entire domain name; for example, specify ftsys9 not ftsys9.cup.hp.com:
cmquerycl -v -n ftsys9 -n ftsys10 

cmquerycl will not print out the reformation time for a volume group that currently belongs to a cluster. If you want cmquerycl to print the reformation time for a volume group, run vgchange -c n <vg name> to clear the cluster ID from the volume group. After you are done, do not forget to run vgchange -c y <vg name> to re-write the cluster ID back to the volume group.

NOTE: You should not configure a second lock volume group or physical volume unless your configuration specifically requires it. See the discussion “Dual Cluster Lock” in the section “Cluster Lock” in Chapter 3.

If your configuration requires you to configure a second cluster lock, enter the following parameters in the cluster configuration file:

SECOND_CLUSTER_LOCK_VG /dev/volume-group
SECOND_CLUSTER_LOCK_PV /dev/dsk/block-special-file

where the /dev/volume-group is the name of the second volume group and block-special-file is the physical volume name of a lock disk in the chosen volume group. These lines should be added for each node.

Specifying a Quorum Server

To specify a quorum server instead of a lock disk, use the -q option of the cmquerycl command, specifying the quorum server host. Example:

# cmquerycl -n ftsys9 -n ftsys10 -q qshost

The cluster ASCII file that is generated in this case contains parameters for defining the quorum server. This portion of the file is shown below:

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# Serviceguard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server timeout is calculated from the
# Serviceguard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000

Enter the QS_HOST, QS_POLLING_INTERVAL and, if desired, a QS_TIMEOUT_EXTENSION.
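Because the quorum server parameters are given in microseconds, it is easy to drop or add a zero when typing them. The following Python sketch converts values in seconds and produces the lines to enter in the cluster configuration file. The function name quorum_server_lines is hypothetical, and omitting QS_TIMEOUT_EXTENSION when it is 0 is a choice made for this sketch (the template describes the parameter as optional with a default of 0).

```python
USEC_PER_SEC = 1_000_000


def quorum_server_lines(host, polling_seconds, timeout_extension_seconds=0):
    """Return the quorum server lines for the cluster configuration file."""
    if polling_seconds <= 0:
        raise ValueError("polling interval must be positive")
    if timeout_extension_seconds < 0:
        raise ValueError("timeout extension cannot be negative")
    lines = [
        f"QS_HOST {host}",
        f"QS_POLLING_INTERVAL {round(polling_seconds * USEC_PER_SEC)}",
    ]
    # QS_TIMEOUT_EXTENSION is optional; leave it out when no extension is wanted.
    if timeout_extension_seconds:
        lines.append(
            f"QS_TIMEOUT_EXTENSION {round(timeout_extension_seconds * USEC_PER_SEC)}"
        )
    return lines


if __name__ == "__main__":
    # The example from the template: 120-second polling, 2-second extension.
    print("\n".join(quorum_server_lines("qshost", 120, 2)))
```

With the values from the template example (host qshost, 120 seconds, 2 seconds), the sketch yields the same three lines shown in the commented example above.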

Identifying Heartbeat Subnets

The cluster ASCII file includes entries for IP addresses on the heartbeat subnet. It is recommended that you use a dedicated heartbeat subnet, but it is possible to configure heartbeat on other subnets as well, including the data subnet.

The heartbeat must be on an IPv4 subnet and must employ IPv4 addresses. IPv6 heartbeat is not supported.

NOTE: If you are using Version 3.5 VERITAS CVM disk groups, you can configure only a single heartbeat subnet, which should be a dedicated subnet. Each system on this subnet must have standby LANs configured, to ensure that there is a highly available heartbeat path. (Version 4.1 configurations can have multiple heartbeats.)

Specifying Maximum Number of Configured Packages

The MAX_CONFIGURED_PACKAGES parameter specifies the maximum number of packages that can be configured in the cluster.

The parameter value must be equal to or greater than the number of packages currently configured in the cluster. The count includes all types of packages: failover, multi-node, and system multi-node.

For Serviceguard A.11.17, the default is 150, which is the maximum allowable number of all packages per cluster.

NOTE: Remember to tune HP-UX kernel parameters on each node to ensure that they are set high enough for the largest number of packages that will ever run concurrently on that node.

Modifying Cluster Timing Parameters

The cmquerycl command supplies default cluster timing parameters for HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact the cluster’s reformation and failover times. It is useful to modify these parameters if the cluster is reforming occasionally due to heavy system load or heavy network traffic.

The default value of 2 seconds for NODE_TIMEOUT leads to a best case failover time of 30 seconds. If NODE_TIMEOUT is changed to 10 seconds, which means that the cluster manager waits 5 times longer to time out a node, the failover time increases by a factor of 5, to approximately 150 seconds. NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL. A good rule of thumb is to have at least two or three heartbeats within one NODE_TIMEOUT.
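The arithmetic in this paragraph can be sketched in a few lines of Python. The function names are hypothetical. The failover estimate simply scales the 30-second best case in proportion to NODE_TIMEOUT, which is an extrapolation from the two figures quoted above (2 seconds gives 30 seconds, 10 seconds gives about 150 seconds); it is a rough guide, not a formula documented by Serviceguard.

```python
DEFAULT_NODE_TIMEOUT_USEC = 2_000_000      # default NODE_TIMEOUT (2 seconds)
BEST_CASE_FAILOVER_AT_DEFAULT_SEC = 30     # best case quoted for the default


def timing_problems(heartbeat_interval_usec, node_timeout_usec):
    """Return a list of problems with the two timing values (empty if none)."""
    problems = []
    if node_timeout_usec < 2 * heartbeat_interval_usec:
        problems.append("NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL")
    return problems


def heartbeats_per_timeout(heartbeat_interval_usec, node_timeout_usec):
    """Number of heartbeats sent within one NODE_TIMEOUT period."""
    return node_timeout_usec / heartbeat_interval_usec


def estimated_failover_seconds(node_timeout_usec):
    """Proportional estimate of best case failover time (assumption, see text)."""
    return (BEST_CASE_FAILOVER_AT_DEFAULT_SEC
            * node_timeout_usec / DEFAULT_NODE_TIMEOUT_USEC)


if __name__ == "__main__":
    heartbeat, timeout = 1_000_000, 10_000_000
    print(timing_problems(heartbeat, timeout))
    print(heartbeats_per_timeout(heartbeat, timeout))
    print(estimated_failover_seconds(timeout))
```

Working in microseconds throughout matches the units used in the cluster configuration file and avoids conversion mistakes when copying values into it.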

Identifying Serial Heartbeat Connections

If you are using a serial (RS232) line as a heartbeat connection, use the SERIAL_DEVICE_FILE parameter and enter the device file name that corresponds to the serial port you are using on each node. Be sure that the serial cable is securely attached during and after configuration.

Optimization

Serviceguard Extension for Faster Failover (SGeFF) is a separately purchased product. If it is installed, the configuration file will display the parameter to enable it.

SGeFF reduces the time it takes Serviceguard to process a failover. It cannot, however, change the time it takes for packages and applications to gracefully shut down and restart.

SGeFF has requirements for cluster configuration, as outlined in the cluster configuration template file.

For more information, see the Serviceguard Extension for Faster Failover Release Notes posted on http://www.docs.hp.com/hpux/ha.

Access Control Policies

Beginning with Serviceguard Version A.11.16, Access Control Policies allow non-root users to use common administrative commands.

Non-root users of Serviceguard Manager, the graphical user interface, need to have a configured access policy to view and to administer Serviceguard clusters and packages. In new configurations, it is a good idea to immediately configure at least one monitor access policy.

Check spelling when entering text, especially when typing wildcards, such as ANY_USER and CLUSTER_MEMBER_NODE. If they are misspelled, Serviceguard will assume they are specific users or nodes, and the access policy you configure may not be the one you intended.
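Since a misspelled wildcard is silently treated as an ordinary user or node name, a spelling check before applying the configuration can help. The following Python sketch flags USER_NAME and USER_HOST values that closely resemble a wildcard without matching it exactly. The function name suspect_wildcard and the similarity cutoff of 0.8 are assumptions made for this sketch. Whether Serviceguard treats these values as case-sensitive is not stated here, so a value that differs only in case is also flagged for review.

```python
import difflib

# Wildcard values named in the cluster configuration template.
WILDCARDS = {
    "USER_NAME": ["ANY_USER"],
    "USER_HOST": ["ANY_SERVICEGUARD_NODE", "CLUSTER_MEMBER_NODE"],
}


def suspect_wildcard(keyword, value, cutoff=0.8):
    """Return the wildcard that value resembles, or None if it looks fine."""
    candidates = WILDCARDS.get(keyword, [])
    if value in candidates:
        return None                      # spelled exactly right
    close = difflib.get_close_matches(value.upper(), candidates, n=1, cutoff=cutoff)
    return close[0] if close else None   # near miss: probably a typing error


if __name__ == "__main__":
    for keyword, value in [("USER_NAME", "ANY_USR"), ("USER_HOST", "noir")]:
        hint = suspect_wildcard(keyword, value)
        if hint:
            print(f"{keyword} {value}: did you mean {hint}?")
```

A flagged value is only a hint to look again; a legitimate login or host name that happens to resemble a wildcard would also be reported.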

A root user on the cluster can create or modify access policies while the cluster is running.

For more information, see “Access Roles” and “Editing Security Files”.

Adding Volume Groups

Add any LVM volume groups you have configured to the ASCII cluster configuration file, with a separate VOLUME_GROUP parameter for each cluster-aware volume group that will be used in the cluster. These volume groups will be initialized with the cluster ID when the cmapplyconf command is used. In addition, you should add the appropriate volume group, logical volume and filesystem information to each package control script that activates a volume group. This process is described in Chapter 6.

NOTE: If you are using CVM disk groups, they should be configured after cluster configuration is done, using the procedures described in “Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM)”. VERITAS disk groups are added to the package configuration file, as described in Chapter 6.

Verifying the Cluster Configuration

In Serviceguard Manager, click the Check button to verify the configuration.

If you have edited an ASCII cluster configuration file using the command line, use the following command to verify the content of the file:

cmcheckconf -k -v -C /etc/cmcluster/clust1.config 

Both methods check the following:

  • Network addresses and connections.

  • Cluster lock connectivity (if you are configuring a lock disk).

  • Validity of configuration parameters for the cluster and packages.

  • Uniqueness of names.

  • Existence and permission of scripts specified in the command line.

  • All nodes specified are in the same heartbeat subnet.

  • The configuration filename specified is correct.

  • All nodes can be accessed.

  • No more than one CLUSTER_NAME, HEARTBEAT_INTERVAL, and AUTO_START_TIMEOUT are specified.

  • The value for package run and halt script timeouts is less than 4294 seconds.

  • The value for NODE_TIMEOUT is at least twice the value of HEARTBEAT_INTERVAL.

  • The value for AUTO_START_TIMEOUT variables is >=0.

  • Heartbeat network minimum requirement. The cluster must have one heartbeat LAN configured with a standby, two heartbeat LANs, one heartbeat LAN and an RS232 connection, or one heartbeat network with no local LAN switch, but with a primary LAN that is configured as a link aggregate of at least two interfaces.

  • At least one NODE_NAME is specified.

  • Each node is connected to each heartbeat network.

  • All heartbeat networks are of the same type of LAN.

  • The network interface device files specified are valid LAN device files.

  • If a serial (RS-232) heartbeat is configured, there are no more than two nodes in the cluster, and no more than one serial (RS232) port connection per node.

  • VOLUME_GROUP entries are not currently marked as cluster-aware.

  • There is only one heartbeat subnet configured if you are using CVM 3.5 disk storage.

If the cluster is online, the check also verifies that all the conditions for the specific change in configuration have been met.

NOTE: Using the -k option means that cmcheckconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmcheckconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command.
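A few of the checks in the list above depend only on the contents of the ASCII file, so they can be tried locally before cmcheckconf is run. The following Python sketch covers three of them: single-use keywords, at least one NODE_NAME, and a non-negative AUTO_START_TIMEOUT. The function name precheck and the input format, a list of (keyword, value) string pairs in file order, are assumptions made for this sketch. It does not test networks, disks, or node access, and it does not replace cmcheckconf.

```python
from collections import Counter

# Keywords that may be specified no more than once, per the list above.
SINGLE_USE = ("CLUSTER_NAME", "HEARTBEAT_INTERVAL", "AUTO_START_TIMEOUT")


def precheck(entries):
    """Return a list of problems found in (keyword, value) pairs."""
    counts = Counter(keyword for keyword, _ in entries)
    problems = []
    for keyword in SINGLE_USE:
        if counts[keyword] > 1:
            problems.append(f"{keyword} is specified more than once")
    if counts["NODE_NAME"] == 0:
        problems.append("no NODE_NAME is specified")
    for keyword, value in entries:
        if keyword == "AUTO_START_TIMEOUT" and int(value) < 0:
            problems.append("AUTO_START_TIMEOUT must be >= 0")
    return problems


if __name__ == "__main__":
    entries = [
        ("CLUSTER_NAME", "cluster1"),
        ("NODE_NAME", "fresno"),
        ("NODE_NAME", "lodi"),
        ("HEARTBEAT_INTERVAL", "1000000"),
        ("AUTO_START_TIMEOUT", "600000000"),
    ]
    print(precheck(entries) or "no problems found")
```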

Distributing the Binary Configuration File

After specifying all cluster parameters, you apply the configuration. This action distributes the binary configuration file to all the nodes in the cluster. We recommend doing this separately before you configure packages (described in the next chapter). In this way, you can verify the cluster lock, heartbeat networks, and other cluster-level operations by using the cmviewcl command on the running cluster. Before distributing the configuration, ensure that your security files permit copying among the cluster nodes. See “Preparing Your Systems” at the beginning of this chapter.

Distributing the Binary File with Serviceguard Manager

When you have finished entering the information, click Apply.

Distributing the Binary File on the Command Line

Use the following steps to generate the binary configuration file and distribute the configuration to all nodes in the cluster:

  • Activate the cluster lock volume group so that the lock disk can be initialized:

    vgchange -a y /dev/vglock  
  • Generate the binary configuration file and distribute it:

    cmapplyconf -k -v -C /etc/cmcluster/clust1.config   

    or

    cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii

    NOTE: Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command.
  • Deactivate the cluster lock volume group.

    vgchange -a n /dev/vglock  

The cmapplyconf command creates a binary version of the cluster configuration file and distributes it to all nodes in the cluster. This action ensures that the contents of the file are consistent across all nodes. Note that the cmapplyconf command does not distribute the ASCII configuration file.

NOTE: The apply will not complete unless the cluster lock volume group is activated on exactly one node before applying. There is one exception to this rule: when a cluster lock has already been configured on the same physical volume and volume group.

After the configuration is applied, the cluster lock volume group must be deactivated.
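The three steps above (activate the lock volume group, apply, deactivate) can be wrapped so that the deactivation is not forgotten. The following Python sketch is a minimal illustration: the function name apply_cluster_config and the run parameter are hypothetical, and the commands are exactly those shown in the steps. Deactivating the volume group even when cmapplyconf fails is a choice made for this sketch, not a documented requirement.

```python
import subprocess


def apply_cluster_config(config_path, lock_vg, run=subprocess.run):
    """Activate the lock volume group, apply the configuration, deactivate."""
    # Activate the cluster lock volume group so the lock disk can be initialized.
    run(["vgchange", "-a", "y", lock_vg], check=True)
    try:
        # Generate the binary configuration file and distribute it.
        run(["cmapplyconf", "-k", "-v", "-C", config_path], check=True)
    finally:
        # Deactivate the lock volume group whether or not the apply succeeded
        # (assumption of this sketch).
        run(["vgchange", "-a", "n", lock_vg], check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(args, check):
        print(" ".join(args))

    apply_cluster_config("/etc/cmcluster/clust1.config", "/dev/vglock", run=show)
```

Passing a different run function, as in the dry run above, lets you review the command sequence on a system where Serviceguard is not installed.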

Storing Volume Group and Cluster Lock Configuration Data

After configuring the cluster, create a backup copy of the LVM volume group configuration by using the vgcfgbackup command for each volume group you have created. If a disk in a volume group must be replaced, you can then restore the disk's metadata by using the vgcfgrestore command. The procedure is described under “Replacing Disks” in the “Troubleshooting” chapter.

Be sure to use vgcfgbackup for all volume groups, especially the cluster lock volume group.

NOTE: You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using SAM or using HP-UX commands.

If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk. Failure to do this might result in a failure of the entire cluster if all redundant copies of the lock disk have failed and if replacement mechanisms or LUNs have not had the lock configuration restored. (If the cluster lock disk is configured in a disk array, RAID protection provides a redundant copy of the cluster lock data. Mirrordisk/UX does not mirror cluster lock information.)

© Hewlett-Packard Development Company, L.P.