Managing Serviceguard, Version A.11.16, Eleventh Edition, Second Printing > Chapter 5: Building an HA Cluster Configuration

Preparing Your Systems

Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration and NTP (network time protocol) configuration.

Understanding Where Files Are Located

Serviceguard uses a special file, /etc/cmcluster.conf, to define the locations for configuration and log files within the HP-UX filesystem. The following locations are defined in the file:

###################### cmcluster.conf########################
#
# Highly Available Cluster file locations
#
# This file must not be edited
#############################################################

SGCONF=/etc/cmcluster
SGSBIN=/usr/sbin
SGLBIN=/usr/lbin
SGLIB=/usr/lib
SGCMOM=/opt/cmom
SGRUN=/var/adm/cmcluster
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGCMOMLOG=/var/adm/syslog/cmom
NOTE: If these variables are not defined on your system, then include the file /etc/cmcluster.conf in your login profile for user root.

Throughout this book, system filenames are usually given with one of these location prefixes. Thus, references to $SGCONF/<FileName> can be resolved by supplying the definition of the prefix that is found in this file. For example, if SGCONF is defined as /etc/cmcluster/conf, then the complete pathname for file $SGCONF/cmclconfig would be /etc/cmcluster/conf/cmclconfig.

NOTE: Do not edit the /etc/cmcluster.conf configuration file.
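Because paths in this book are written relative to these prefixes, the lookup can be sketched in Python. This is a minimal, read-only sketch that never writes to the file. parse_cmcluster_conf and resolve are hypothetical helper names, and the parser assumes the simple NAME=value layout shown above.

```python
from string import Template


def parse_cmcluster_conf(text):
    """Collect NAME=value definitions, skipping comment and blank lines."""
    locations = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        locations[name.strip()] = value.strip()
    return locations


def resolve(path, locations):
    """Expand $SGCONF-style prefixes; undefined prefixes are left untouched."""
    return Template(path).safe_substitute(locations)
```

With the definitions listed above, resolve("$SGCONF/cmclconfig", parse_cmcluster_conf(text)) gives /etc/cmcluster/cmclconfig. Using safe_substitute rather than substitute means an undefined prefix stays visible in the result instead of raising an exception.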

Editing Security Files

Serviceguard daemons grant access to commands by matching incoming hostname and username against defined access control policies. To understand how to properly configure these policies, administrators need to understand how Serviceguard handles hostnames, IP addresses, usernames and the relevant configuration files.

For redundancy, Serviceguard uses all available IPv4 networks for communication. If a Serviceguard node can communicate with another node over a given interface, the access control policy needs to include the primary IP address of that interface.

IP Address Resolution

Access control policies for Serviceguard are name-based. IP addresses for incoming connections must be resolved into hostnames to match against access control policies.

Communication between two Serviceguard nodes could be received over any of their shared networks. Therefore, the primary addresses of both nodes on each of those networks need to be identified.

Serviceguard supports using aliases. An IP address may resolve to multiple hostnames; one of those should match the name defined in the policy.

Configuring IP Address Resolution

Serviceguard uses the operating system's built-in name resolution services. For the cluster to function properly, it is recommended that you define name resolution in each node's /etc/hosts file first, rather than relying on DNS or NIS services.

For example, consider a two-node cluster (gryf and sly) with two private subnets and a public subnet. The cluster nodes will grant permission to a non-cluster node (bit) that does not share the private subnets. The /etc/hosts file on both cluster nodes should contain:

15.145.162.131   gryf.uksr.hp.com   gryf
10.8.0.131       gryf.uksr.hp.com   gryf
10.8.1.131       gryf.uksr.hp.com   gryf
15.145.162.132   sly.uksr.hp.com    sly
10.8.0.132       sly.uksr.hp.com    sly
10.8.1.132       sly.uksr.hp.com    sly
15.145.162.150   bit.uksr.hp.com    bit

NOTE: If you use fully qualified domain names (FQDNs), Serviceguard recognizes only the hostname portion. For example, two nodes gryf.uksr.hp.com and gryf.cup.hp.com could not be in the same cluster, because both would be treated as the same host, gryf.
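To check a hosts file against this hostname-portion rule, a sketch like the following may help. It assumes the usual /etc/hosts layout (an address followed by one or more names) and truncates each name at its first dot, which is how the note above describes Serviceguard's matching. names_by_ip and short_name_collisions are hypothetical helpers, not Serviceguard commands.

```python
from collections import defaultdict


def short_name(name):
    """Keep only the hostname portion of a (possibly fully qualified) name."""
    return name.split(".", 1)[0]


def names_by_ip(hosts_text):
    """Map each address in /etc/hosts text to the short names it resolves to."""
    mapping = defaultdict(set)
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        mapping[ip].update(short_name(n) for n in names)
    return dict(mapping)


def short_name_collisions(fqdns):
    """Return short names that more than one distinct FQDN would collapse into."""
    seen = defaultdict(set)
    for fqdn in fqdns:
        seen[short_name(fqdn)].add(fqdn)
    return {short: sorted(full) for short, full in seen.items() if len(full) > 1}
```

For the example in the note, short_name_collisions(["gryf.uksr.hp.com", "gryf.cup.hp.com"]) reports both names under gryf, which signals that the two hosts could not share a cluster.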

Serviceguard also supports domain name aliases. If other applications require different interfaces to have a unique primary hostname, the Serviceguard hostname can be one of the aliases. For example:

15.145.162.131   gryf.uksr.hp.com   gryf node1
10.8.0.131       gryf.uksr.hp.com   gryf
10.8.1.131       gryf.uksr.hp.com   gryf
15.145.162.132   sly.uksr.hp.com    sly node2
10.8.0.132       sly.uksr.hp.com    sly
10.8.1.132       sly.uksr.hp.com    sly

In this configuration, the private subnets' primary name is unique. By providing the alias, Serviceguard can still associate this IP address with the proper node and match it in an access control policy.

The name service switch policy should be configured to consult the /etc/hosts file before other sources such as DNS, NIS, or LDAP. Ensure that the /etc/nsswitch.conf file on all cluster nodes lists 'files' first, followed by other services. For example:

For DNS, enter (one line):

hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]

For NIS, enter (one line):

hosts: files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]
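You can check that 'files' comes first with a few lines of code. hosts_sources and files_first are hypothetical helpers. The sketch discards any bracketed [STATUS=action] criteria and returns the remaining lookup sources on the hosts: line in order.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order, without [criteria]."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            spec = line[len("hosts:"):]
            spec = re.sub(r"\[[^\]]*\]", " ", spec)  # drop [NOTFOUND=...] blocks
            return spec.split()
    return []


def files_first(nsswitch_text):
    """True when /etc/hosts ('files') is the first source consulted."""
    sources = hosts_sources(nsswitch_text)
    return bool(sources) and sources[0] == "files"
```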

Username Validation

Serviceguard relies on the ident service of the client node to verify the username of the incoming network connection. If the Serviceguard daemon is unable to connect to the client's ident daemon, permission will be denied.

Root on a node is defined as any user who has the UID of 0. For a user to be identified as root on a remote system, the “root” user entry in /etc/passwd for the local system must come before any other user who may also be UID 0. The ident daemon will return the username for the first UID match. For Serviceguard to consider a remote user as a root user on that remote node, the ident service must return the username as “root”.
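The following sketch illustrates the first-match rule by scanning /etc/passwd text the way the paragraph describes. It models the documented behavior only and is not how identd is implemented. username_for_uid and root_listed_first are hypothetical names.

```python
def username_for_uid(passwd_text, uid):
    """Return the first account, in file order, whose UID field equals uid."""
    for raw in passwd_text.splitlines():
        fields = raw.split(":")  # name:password:uid:gid:...
        if len(fields) >= 3 and fields[2].strip() == str(uid):
            return fields[0]
    return None


def root_listed_first(passwd_text):
    """True when the first UID 0 entry is named 'root', as Serviceguard expects."""
    return username_for_uid(passwd_text, 0) == "root"
```

If a second UID 0 account appears above root in the file, root_listed_first returns False. That is the situation in which a remote root user would not be recognized as root.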

It is possible to configure Serviceguard not to use the ident service; however, this configuration is not recommended. Consult the white paper “Securing Serviceguard” for more information.

To disable the use of identd, add the -i option to the tcp hacl-cfg and hacl-probe inetd configurations.

For example, on HP-UX with Serviceguard A.11.16:

  1. Change the cmclconfd entry in /etc/inetd.conf to appear as:

    hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd \
        cmclconfd -c -i

  2. Change the cmomd entry in /etc/inetd.conf to appear as:

    hacl-probe stream tcp nowait root \
        /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -i -f \
        /var/opt/cmom/cmomd.log -r /var/opt/cmom

  3. Restart inetd: /etc/init.d/inetd restart.
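After editing, you can confirm that the -i option is present with a sketch like the one below. It assumes the standard inetd.conf field order (service, socket type, protocol, wait status, user, server path, then the program's argument list) and joins lines that end in a backslash, as in the entries shown above. ident_disabled is a hypothetical helper.

```python
def inetd_entries(text):
    """Yield the token list of each inetd.conf entry, joining backslash-continued lines."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "  # continuation: keep accumulating
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)
    for line in logical:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line.split()


def ident_disabled(text, service):
    """True if the entry for `service` passes -i among the program's arguments."""
    for tokens in inetd_entries(text):
        # tokens: service socket protocol wait user server-path argv0 args...
        if tokens[0] == service:
            return "-i" in tokens[7:]
    return False
```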

Access Roles

Serviceguard has two levels of access:

  • Root Access: Users who have been authorized for root access have total control over the configuration of the cluster and packages.

  • Non-root Access: Non-root users can be assigned one of four roles:

    • Monitor: These users have read-only access to the cluster and its packages. Command line users can issue these commands: cmviewcl, cmquerycl, cmgetconf, and cmviewconf. Serviceguard Manager users can see status and configuration information on the map, tree and properties.

    • (one) Package Admin: Applies only to a specific package. On the command line, these users can issue the following commands for the specified package: cmrunpkg, cmhaltpkg, cmmodnet, cmrunserv, cmhaltserv, cmstartres, cmstopres, and cmmodpkg. Serviceguard Manager users can see these Admin menu options for their specific package: Run Package, Halt Package, Move Package, and Enable or Disable Switching. Package Admins cannot configure or create packages. Package Admin includes the privileges of the Monitor role.

    • (all) Package Admin: Applies to all packages in the cluster. The commands are the same as for the role above. Package Admin includes the privileges of the Monitor role.

    • Full Admin: These users can administer the cluster. On the command line, these users can issue these commands in their cluster: cmruncl, cmhaltcl, cmrunnode, and cmhaltnode. Full Admins cannot configure or create a cluster. In Serviceguard Manager, they can see the Admin menu for their cluster and any packages in their cluster. Full Admin includes the privileges of the Package Admin role.
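The role hierarchy above can be modeled as nested command sets. This is a hypothetical model written for illustration. It uses the command lists from this section, but the role labels are invented, and root access (which is not one of the four roles) is left out.

```python
ROLES = {"monitor", "package_admin_one", "package_admin_all", "full_admin"}
MONITOR_CMDS = {"cmviewcl", "cmquerycl", "cmgetconf", "cmviewconf"}
PACKAGE_CMDS = {"cmrunpkg", "cmhaltpkg", "cmmodnet", "cmrunserv",
                "cmhaltserv", "cmstartres", "cmstopres", "cmmodpkg"}
CLUSTER_CMDS = {"cmruncl", "cmhaltcl", "cmrunnode", "cmhaltnode"}


def allowed(role, command, package=None, role_package=None):
    """Decide whether a non-root role may run `command` (hypothetical model).

    package is the package the command targets; role_package is the single
    package a "package_admin_one" user was granted.
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if command in MONITOR_CMDS:
        return True  # every non-root role includes Monitor privileges
    if command in PACKAGE_CMDS:
        if role in ("package_admin_all", "full_admin"):
            return True
        # A one-package admin may act only on the package it was granted.
        return role == "package_admin_one" and package is not None and package == role_package
    if command in CLUSTER_CMDS:
        return role == "full_admin"
    return False  # anything else (e.g. configuration) needs root access in this model
```

Because each role's check falls through to the broader sets, the "includes the privileges of" relationships in the list above hold by construction.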

If you upgrade a cluster to Version A.11.16, the cmclnodelist entries are automatically converted into Access Control Policies in the cluster configuration file. All non-root user-hostname pairs are given the role of Monitor (view only).

Setting access control policies uses different mechanisms depending on the state of the node. Nodes not configured into a cluster use different security configurations than nodes in a cluster. The following two sections discuss how to configure these access control policies.

Setting Controls for an Unconfigured Node

Serviceguard access control policies define what a remote node can do to the local node. A new install of Serviceguard will not have any access control policies defined. To enable this node to be included in a cluster, a policy must be defined to allow access for root from the other potential cluster nodes. For Serviceguard Manager, policies must be defined to allow remote COM servers to Monitor or configure the node. These policies will only be in effect while a node is not configured into a cluster.

Unconfigured nodes may authorize two levels of access to remote users: root and non-root. Users with root access may use any cluster configuration commands. Users with non-root access are assigned the Monitor role, giving them read-only access to the node's configuration.

When a Serviceguard node is not configured in a cluster it relies on one of two possible security mechanisms for authorizing remote users:

  • If the file $SGCONF/cmclnodelist exists, Serviceguard will use its contents to authorize remote users.

  • The host equivalency files used by r-commands, ~/.rhosts and /etc/hosts.equiv (hostsequiv).

The use of cmclnodelist is strongly recommended.

Serviceguard checks for the existence of $SGCONF/cmclnodelist before attempting to access hostsequiv. If the file exists, Serviceguard does not check other authorization mechanisms. With regard to Serviceguard, using either cmclnodelist or hostsequiv provides the same level of security. Administrators may prefer cmclnodelist over hostsequiv in installations that need to limit r-command access.
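The lookup order just described can be expressed as a small function. This sketch illustrates the documented behavior and is not Serviceguard's own code. authorization_sources is a hypothetical name, and the exists parameter exists only so the logic can be tried without a real $SGCONF directory.

```python
import os

HOSTSEQUIV = ["~/.rhosts", "/etc/hosts.equiv"]


def authorization_sources(sgconf, exists=os.path.exists):
    """Return the files an unconfigured node consults, in the order described.

    If $SGCONF/cmclnodelist exists it is used exclusively; otherwise the
    r-command host-equivalency files are consulted.
    """
    nodelist = os.path.join(sgconf, "cmclnodelist")
    if exists(nodelist):
        return [nodelist]
    return list(HOSTSEQUIV)
```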

For backwards compatibility, a node in an unconfigured state may define access control policies based on IP address. If name services are not configured as specified above, the primary IP address on each interface Serviceguard uses for communication must have its own policy. Once a node is configured into a cluster, IP addresses can no longer be used for these policies.

Using the cmclnodelist File

The cmclnodelist file is not created by default in new installations. If administrators wish to create this "bootstrap" file, they should add a comment such as the following:

###########################################################
# Do Not Edit This File
# This is only a temporary file to bootstrap an unconfigured
# node with Serviceguard version A.11.16
# Once a cluster is created, Serviceguard will not consult
# this file.
###########################################################

The format for entries in the cmclnodelist file is as follows:

[hostname or ip address] [user] [#Comment]

For example:

Table 5-1 cmclnodelist Example

gryf   root    # Cluster 1, Node 1
gryf   user1   # Cluster 1, Node 1
sly    root    # Cluster 1, Node 2
sly    user1   # Cluster 1, Node 2
bit    root    # Administration/COM Server

In this example, root on the nodes gryf, sly, and bit all have root access to the node with this file. The non-root user “user1” has the Monitor role from nodes gryf and sly.

Serviceguard also accepts the use of a “+” in the cmclnodelist file which indicates that any root user on any node may configure this node and any non-root user has the Monitor role.
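The following sketch shows how entries like those in Table 5-1 could translate into access levels on an unconfigured node. It follows the rules in this section: root from a listed host gets root access, a listed non-root user gets the Monitor role, and "+" admits everyone at the matching level. Hostnames are compared by their first label, as described earlier, and IP addresses are compared exactly. Several points are assumptions: "+" is taken to appear on a line by itself, entries without a user field are ignored, and the function names are hypothetical.

```python
import ipaddress


def _norm(host):
    """IP addresses compare as-is; hostnames compare by their first label."""
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        return host.split(".", 1)[0]


def parse_cmclnodelist(text):
    """Return (host, user) pairs; comments are stripped (assumed format)."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if fields == ["+"]:
            entries.append(("+", None))  # assumption: "+" stands alone on its line
        elif len(fields) >= 2:
            entries.append((fields[0], fields[1]))
        # single-field lines other than "+" are ignored in this sketch
    return entries


def access_level(entries, remote_host, remote_user):
    """Return "root", "monitor", or None for remote_user connecting from remote_host."""
    granted = "root" if remote_user == "root" else "monitor"
    for host, user in entries:
        if host == "+":
            return granted
        if _norm(host) == _norm(remote_host) and user == remote_user:
            return granted
    return None
```

With Table 5-1 loaded, user1 connecting from sly.uksr.hp.com gets "monitor", root from bit gets "root", and user1 from bit gets None.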

Using Equivalent Hosts

For installations that use hostsequiv, the primary IP addresses or hostnames for each node in the cluster need to be authorized. For more information on using hostsequiv, see the hosts.equiv(4) man page or the HP-UX guide, “Managing Systems and Workgroups”.

Though hostsequiv allows defining any user on any node as equivalent to root, Serviceguard will not grant root access to any user who is not root on the remote node. Such a user is granted only non-root access.

Defining Name Resolution Services

It is important to understand how Serviceguard uses name resolution services. When you employ any user-level Serviceguard command (including cmviewcl), the command uses name lookup to obtain the addresses of all the cluster nodes. If name services are not available, the command could hang or return an unexpected networking error message. In Serviceguard Manager, cluster or package operations also will return an error if name services are not available.

NOTE: If such a hang or error occurs, Serviceguard and all protected applications will continue working even though the command you issued does not. That is, only the Serviceguard configuration commands and Serviceguard Manager functions are impacted, not the cluster daemon or package services.

To avoid this problem, you can use the /etc/hosts file on all cluster nodes in addition to DNS or NIS. It is also recommended to make DNS highly available either by using multiple DNS servers or by configuring DNS into a Serviceguard package.

To do this, add one of the following lines in the /etc/nsswitch.conf file:

  • for DNS, enter (one line):

    hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
  • for NIS, enter (one line):

    hosts: files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]

A workaround for the problem that still retains the ability to use conventional name lookup is to configure the /etc/nsswitch.conf file to search the /etc/hosts file when other lookup strategies are not working. In case name services are not available, Serviceguard commands will then use the /etc/hosts file on the local system to do name resolution. Of course, the names and IP addresses of all the nodes in the cluster must be in the /etc/hosts file.

Name Resolution Following Primary LAN Failure or Loss of DNS

There are some special configuration steps required to allow cluster configuration commands such as cmrunnode and cmruncl to continue to work properly after LAN failure, even when a standby LAN has been configured for the failed primary. These steps also protect against the loss of DNS services, allowing cluster nodes to continue communicating with one another.

  1. Edit the /etc/hosts file on all nodes in the cluster. Add name resolution for all heartbeat IP addresses, and other IP addresses from all the cluster nodes. Example:

    15.13.172.231   hasupt01
    192.2.1.1       hasupt01
    192.2.8.1       hasupt01
    15.13.172.232   hasupt02
    192.2.1.2       hasupt02
    192.2.8.2       hasupt02
    15.13.172.233   hasupt03
    192.2.1.3       hasupt03
    192.2.8.3       hasupt03

    This ensures that messages coming from non-public networks, as well as public networks, are mapped to the correct host name.

    NOTE: For each cluster node, the public network IP address must be the first address listed. This enables other applications to talk to other nodes on public networks.
  2. Edit or create the /etc/nsswitch.conf file on all nodes and add the following line if it does not already exist:

    hosts:        files [NOTFOUND=continue] dns 

    If a line beginning with the string “hosts:” already exists, then make sure that the text immediately to the right of this string is:

    files [NOTFOUND=continue] dns 

    This step is critical so that the nodes in the cluster can still resolve hostnames to IP addresses while DNS is down or if the primary LAN is down.

  3. If no cluster exists on a node, create an $SGCONF/cmclnodelist file on all nodes and add access for all cluster node primary IP addresses and node names:

    15.13.172.231       hasupt01
    15.13.172.232       hasupt02
    15.13.172.233       hasupt03
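Step 1 implies two checks: each address should map to exactly one host, and each node's public address should be listed first. Both can be sketched as follows. parse_hosts, duplicate_ips, and first_address are hypothetical helpers that assume the plain "address name..." layout used in the example. The sketch cannot tell by itself which address is public, so compare first_address against the address you know to be the node's public one.

```python
def parse_hosts(text):
    """Return (ip, [names]) pairs in file order, ignoring comments and blank lines."""
    rows = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            rows.append((fields[0], fields[1:]))
    return rows


def duplicate_ips(text):
    """Addresses that appear on more than one line (each should name one host)."""
    seen, dups = set(), set()
    for ip, _names in parse_hosts(text):
        if ip in seen:
            dups.add(ip)
        else:
            seen.add(ip)
    return dups


def first_address(text, node):
    """The first address listed for `node`; it should be the public one."""
    for ip, names in parse_hosts(text):
        if node in names:
            return ip
    return None
```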

Creating Mirrors of Root Logical Volumes

It is highly recommended that you use mirrored root volumes on all cluster nodes. The following procedure assumes that you are using separate boot and root volumes; you create a mirror of the boot volume (/dev/vg00/lvol1), primary swap (/dev/vg00/lvol2), and root volume (/dev/vg00/lvol3). In this example and in the following commands, /dev/dsk/c4t5d0 is the primary disk and /dev/dsk/c4t6d0 is the mirror; be sure to use the correct device file names for the root disks on your system.

  1. Create a bootable LVM disk to be used for the mirror.

    pvcreate -B /dev/rdsk/c4t6d0 
  2. Add this disk to the current root volume group.

    vgextend /dev/vg00 /dev/dsk/c4t6d0 
  3. Make the new disk a boot disk.

    mkboot -l /dev/rdsk/c4t6d0  
  4. Mirror the boot, primary swap, and root logical volumes to the new bootable disk. Ensure that all devices in vg00, such as /usr, /swap, etc., are mirrored.

    NOTE: The boot, root, and swap logical volumes must be mirrored in exactly the following order to ensure that the boot volume occupies the first contiguous set of extents on the new disk, followed by the swap and the root.

    The following is an example of mirroring the boot logical volume:

    lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c4t6d0 

    The following is an example of mirroring the primary swap logical volume:

    lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c4t6d0 

    The following is an example of mirroring the root logical volume:

    lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c4t6d0 
  5. Update the boot information contained in the BDRA for the mirror copies of boot, root and primary swap.

    /usr/sbin/lvlnboot -b /dev/vg00/lvol1
    /usr/sbin/lvlnboot -s /dev/vg00/lvol2
    /usr/sbin/lvlnboot -r /dev/vg00/lvol3 
     
  6. Verify that the mirrors were properly created.

    lvlnboot -v

    The output of this command is shown in a display like the following:

    Boot Definitions for Volume Group /dev/vg00:
    Physical Volumes belonging in Root Volume Group:
             /dev/dsk/c4t5d0 (10/0.5.0) -- Boot Disk
             /dev/dsk/c4t6d0 (10/0.6.0) -- Boot Disk
    Boot:  lvol1    on:      /dev/dsk/c4t5d0
                             /dev/dsk/c4t6d0
    Root:  lvol3    on:      /dev/dsk/c4t5d0
                             /dev/dsk/c4t6d0
    Swap:  lvol2    on:      /dev/dsk/c4t5d0
                             /dev/dsk/c4t6d0
    Dump:  lvol2    on:      /dev/dsk/c4t6d0, 0
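Step 6 relies on reading the lvlnboot -v display by eye. The sketch below parses text shaped like the sample output above and reports which of Boot, Root, and Swap list fewer than two disks. The regular expressions are written only against that sample (an assumption), and other releases may format the display differently. mirrored_volumes and unmirrored are hypothetical names.

```python
import re

# "Boot:  lvol1    on:      /dev/dsk/c4t5d0" (assumed layout, from the sample)
HEADER = re.compile(r"\s*(Boot|Root|Swap|Dump):\s+(\S+)\s+on:\s+(\S+)")
# "                         /dev/dsk/c4t6d0" continuation line
CONTINUATION = re.compile(r"\s+(/dev/dsk/\S+)\s*$")


def mirrored_volumes(lvlnboot_output):
    """Map Boot/Root/Swap/Dump to the disks listed for them."""
    disks, current = {}, None
    for line in lvlnboot_output.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            disks[current] = [m.group(3).rstrip(",")]  # Dump lists "disk, 0"
            continue
        m = CONTINUATION.match(line)
        if current and m:
            disks[current].append(m.group(1))
        else:
            current = None
    return disks


def unmirrored(disks, required=("Boot", "Root", "Swap")):
    """Volumes from `required` that appear on fewer than two disks."""
    return [v for v in required if len(disks.get(v, [])) < 2]
```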

Choosing Cluster Lock Disks

The following guidelines apply if you are using a lock disk. The cluster lock disk is configured on a volume group that is physically connected to all cluster nodes. This volume group may also contain data that is used by packages.

When you are using dual cluster lock disks, you must use the default IO timeout values for the cluster lock physical volumes. If you change the IO timeout values for the cluster lock physical volumes, the nodes in the cluster may not detect a failed lock disk within the allotted time period, which can prevent cluster re-formation from succeeding. To view the existing IO timeout value, run the following command:

# pvdisplay <lock device file name>

The IO Timeout value should be displayed as “default.” To set the IO Timeout back to the default value, run the command:

# pvchange -t 0 <lock device file name>
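The pvdisplay check can be automated in the same way. This sketch assumes the timeout appears on a line that begins "IO Timeout" (optionally followed by "(Seconds)"), with the value as the next field. That layout is an assumption based on the description above, so verify it against pvdisplay output on your own system. io_timeout and uses_default_timeout are hypothetical names.

```python
import re

# Assumed line shape: "IO Timeout (Seconds)        default"
IO_TIMEOUT = re.compile(r"\s*IO Timeout(?:\s*\(Seconds\))?\s+(\S+)")


def io_timeout(pvdisplay_output):
    """Return the IO Timeout value from pvdisplay text, or None if absent."""
    for line in pvdisplay_output.splitlines():
        m = IO_TIMEOUT.match(line)
        if m:
            return m.group(1)
    return None


def uses_default_timeout(pvdisplay_output):
    """True when the lock physical volume reports the default timeout."""
    return io_timeout(pvdisplay_output) == "default"
```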

The use of a dual cluster lock is only allowed with certain specific configurations of hardware. Refer to the discussion in Chapter 3 on “Dual Cluster Lock.”

Backing Up Cluster Lock Disk Information

After you configure the cluster and create the cluster lock volume group and physical volume, you should create a backup of the volume group configuration data on each lock volume group. Use the vgcfgbackup command for each lock volume group you have configured, and save the backup file in case the lock configuration must be restored to a new disk with the vgcfgrestore command following a disk failure.

NOTE: You must use the vgcfgbackup and vgcfgrestore commands to back up and restore the lock volume group configuration data regardless of how you create the lock volume group.

Ensuring Consistency of Kernel Configuration

Make sure that the kernel configurations of all cluster nodes are consistent with the expected behavior of the cluster during failover. In particular, if you change any kernel parameters on one cluster node, they may also need to be changed on other cluster nodes that can run the same packages.
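One way to spot inconsistencies is to gather the tunable values from every node (for example, from kmtune output) into dictionaries and compare them. The sketch below performs only the comparison. Collecting the values is left out, and the parameter names and values in the usage note are illustrative. kernel_differences is a hypothetical name.

```python
def kernel_differences(params_by_node):
    """Return {parameter: {node: value}} for parameters whose values differ.

    A parameter missing on some node is reported with the value None there.
    """
    all_params = set()
    for params in params_by_node.values():
        all_params.update(params)
    diffs = {}
    for name in sorted(all_params):
        values = {node: params.get(name) for node, params in params_by_node.items()}
        if len(set(values.values())) > 1:
            diffs[name] = values
    return diffs
```

For example, kernel_differences({"gryf": {"maxuprc": "256"}, "sly": {"maxuprc": "512"}}) returns {"maxuprc": {"gryf": "256", "sly": "512"}}, which identifies a parameter to reconcile on nodes that can run the same packages.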

Enabling the Network Time Protocol

It is strongly recommended that you enable network time protocol (NTP) services on each node in the cluster. The use of NTP, which runs as a daemon process on each system, ensures that the system time on all nodes is consistent, resulting in consistent timestamps in log files and consistent behavior of message services. This ensures that applications running in the cluster are correctly synchronized. The NTP services daemon, xntpd, should be running on all nodes before you begin cluster configuration. The NTP configuration file is /etc/ntp.conf.

For information about configuring NTP services, refer to the chapter “Configuring NTP,” in the HP-UX manual, Installation and Administration of Internet Services.

Tuning Network and Kernel Parameters

Serviceguard and its extension products, such as SGeSAP, SGeRAC, and SGeFF, have been tested with default values of the supported network and kernel parameters in the ndd and kmtune utilities.

Adjust these parameters with care.

If you experience problems, return the parameters to their default values. When contacting HP support for any issues regarding Serviceguard and networking, please be sure to share all information about any parameters that were changed from the defaults.

Third-party applications that are running in a Serviceguard environment may require tuning of network and kernel parameters:

  • ndd is the network tuning utility. For more information, see the man page for ndd(1M).

  • kmtune is the system tuning utility. For more information, see the man page for kmtune(1M).

Serviceguard has also been tested with non-default values for these two network parameters:

  • ip6_nd_dad_solicit_count - This network parameter enables the Duplicate Address Detection feature for IPv6 addresses. For more information, see “IPv6 Relocatable Address and Duplicate Address Detection Feature” in this manual.

  • tcp_keepalive_interval - This network parameter controls the length of time the node will allow an unused network socket to exist before reclaiming its resources so they can be reused. Serviceguard supports the tcp_keepalive_interval being changed in the following configurations:

    • Supported with Serviceguard A.11.14 or later.

    • Supported on nodes running HP-UX 11.11 only.

    The following requirements must also be met:

    • The maximum value for tcp_keepalive_interval is 7200000 (2 hours, the HP-UX default value).

    • The minimum value for tcp_keepalive_interval is 60000 (60 seconds).

    • The tcp_keepalive_interval value must be set on a node before Serviceguard is started on that node. This can be done by configuring the new tcp_keepalive_interval in the /etc/rc.config.d/nddconf file, which will automatically set any ndd parameters at system boot time.

    • The tcp_keepalive_interval value must be the same for all nodes in the cluster.
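The numeric and consistency requirements above can be checked before Serviceguard is started. This sketch covers only the range rule and the same-value rule. It does not check the Serviceguard or HP-UX version, and it assumes you have already read each node's configured value in milliseconds. keepalive_problems is a hypothetical name.

```python
MIN_KEEPALIVE_MS = 60000     # 60 seconds
MAX_KEEPALIVE_MS = 7200000   # 2 hours, the HP-UX default


def keepalive_problems(values_by_node):
    """List violations of the tcp_keepalive_interval rules for a set of nodes."""
    problems = []
    for node, value in sorted(values_by_node.items()):
        if not MIN_KEEPALIVE_MS <= value <= MAX_KEEPALIVE_MS:
            problems.append(f"{node}: {value} outside {MIN_KEEPALIVE_MS}-{MAX_KEEPALIVE_MS}")
    if len(set(values_by_node.values())) > 1:
        problems.append("tcp_keepalive_interval differs between nodes")
    return problems
```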

For more information, see “Tunable Kernel Parameters” and the “Transport Administrator’s Guide,” posted at http://docs.hp.com. Click “Browse by Release,” then choose your operating system.

Preparing for Changes in Cluster Size

If you intend to add nodes to the cluster online, while it is running, ensure that they are connected to the same heartbeat subnets and to the same lock disks as the other cluster nodes. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes. Remember that a cluster of more than four nodes may not use a lock disk, but a two-node cluster must use a cluster lock. Thus, if you will eventually need five nodes, you should build an initial configuration that uses a quorum server.
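The lock rules in this paragraph can be checked for every size the cluster will pass through. The sketch below encodes only the two rules stated here: two nodes require a lock, and more than four nodes may not use a lock disk. The lock labels are invented for the example, and the full cluster lock rules described in this book still apply. lock_errors and plan_errors are hypothetical names.

```python
def lock_errors(node_count, lock):
    """Check one cluster size against the two lock rules stated above.

    lock is None, "lock_disk", or "quorum_server" (labels invented for this sketch).
    """
    errors = []
    if node_count == 2 and lock is None:
        errors.append("a two-node cluster must use a cluster lock")
    if node_count > 4 and lock == "lock_disk":
        errors.append("a cluster of more than four nodes may not use a lock disk")
    return errors


def plan_errors(current_nodes, eventual_nodes, lock):
    """Check every size between the current and eventual node counts."""
    low, high = sorted((current_nodes, eventual_nodes))
    problems = {}
    for n in range(low, high + 1):
        errors = lock_errors(n, lock)
        if errors:
            problems[n] = errors
    return problems
```

plan_errors(2, 5, "lock_disk") flags the five-node size, while plan_errors(2, 5, "quorum_server") reports nothing. This matches the advice to start with a quorum server if five nodes will eventually be needed.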

If you intend to remove a node from the cluster configuration while the cluster is running, ensure that the resulting cluster configuration will still conform to the rules for cluster locks described above.

To facilitate moving nodes in and out of the cluster configuration, you can use SCSI cables with inline terminators, which allow a node to be removed from the bus without breaking SCSI termination. See the section “Online Hardware Maintenance with In-line SCSI Terminator” in the “Troubleshooting” chapter for more information on inline SCSI terminators.

If you are planning to add a node online, and a package will run on the new node, ensure that any existing cluster bound volume groups for the package have been imported to the new node. Also, ensure that the MAX_CONFIGURED_PACKAGES parameter is set high enough to accommodate the total number of packages you will be using.

© Hewlett-Packard Development Company, L.P.