This section describes how to define the basic cluster configuration.
To do this in Serviceguard Manager, the graphical user interface, read
the next section. If you want to use Serviceguard commands, skip
ahead to the section entitled “Using Serviceguard Commands
to Configure the Cluster.”

Using Serviceguard Manager to Configure the Cluster

Create a session on Serviceguard Manager. Select the option
for discovering unused nodes. On the map or tree, from the list
of unused nodes, select the one where you want to start the cluster.
From the Actions menu, choose Configuring.

After you give the node’s root password, the Configuration
screen will open, and you will be guided through the process. Each
tab contains related information. Serviceguard Manager discovers
much of the information, so you can choose from available options,
such as lists of volume groups, networks, and nodes.

There is online Help available at each step to help you make
decisions.

Configure your volume groups before configuring the cluster.
If you are using a quorum server as the cluster lock, have it running
before configuring the cluster.

When you complete your information, click Apply. If there
are errors, they are displayed in a log window. If not, the log
displays a “successful” message, and the binary
configuration is automatically distributed to the nodes.

After a Refresh, the new cluster configuration and status
information appear in the tree, map and Properties.

To modify or delete the configuration, select the cluster
on the tree or map, and choose Configuring from the Actions menu.

Using Serviceguard Commands to Configure the Cluster

Use the cmquerycl command to specify a set of nodes to be included
in the cluster and to generate a template for the cluster configuration
file. Node names must be 31 bytes or less. Here is an example of
the command:

# cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10

The example creates an ASCII template file in the default
cluster configuration directory, /etc/cmcluster. The ASCII file
is partially filled in with the names and characteristics of cluster
components on the two nodes ftsys9 and ftsys10.

Do not include the domain name when specifying the
node name; for example, specify ftsys9 and not ftsys9.cup.hp.com.

Edit the filled-in cluster characteristics as needed to
define the desired cluster. It is strongly recommended that you
edit the file to send heartbeat over all possible networks, as shown
in the following example.

Cluster Configuration Template File

The following is an example of an ASCII configuration file
generated with the cmquerycl command using the -w full option:

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to *******
# ***** set them, consult the Serviceguard manual. *********************
# **********************************************************************

# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.

CLUSTER_NAME            cluster1

# Cluster Lock Parameters
# The cluster lock is used as a tie-breaker for situations
# in which a running cluster fails, and then two equal-sized
# sub-clusters are both trying to form a new cluster. The
# cluster lock may be configured using either a lock disk
# or a quorum server.
#
# You can use either the quorum server or the lock disk as
# a cluster lock but not both in the same cluster.
#
# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For
# a cluster of three or four nodes, a cluster lock is strongly
# recommended. For a cluster of more than four nodes, a
# cluster lock is recommended. If you decide to configure
# a lock for a cluster of more than four nodes, it must be
# a quorum server.

# Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and
# FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.
# The FIRST_CLUSTER_LOCK_VG is the LVM volume group that
# holds the cluster lock. This volume group should not be
# used by any other cluster as a cluster lock device.

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# Serviceguard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server timeout is calculated from the
# Serviceguard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# The value of QS_TIMEOUT_EXTENSION will directly effect the amount
# of time it takes for cluster reformation in the event of failure.
# For example, if QS_TIMEOUT_EXTENSION is set to 10 seconds, the cluster
# reformation will take 10 seconds longer than if the QS_TIMEOUT_EXTENSION
# was set to 0. This delay applies even if there is no delay in
# contacting the Quorum Server. The recommended value for
# QS_TIMEOUT_EXTENSION is 0, which is used as the default
# and the maximum supported value is 30000000 (5 minutes).
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000

QS_HOST                 sysman5
QS_POLLING_INTERVAL     300000000

# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.
# NODE_NAME is the specified nodename in the cluster.
# It must match the hostname and both cannot contain full domain name.
# Each NETWORK_INTERFACE, if configured with IPv4 address,
# must have ONLY one IPv4 address entry with it which could
# be either HEARTBEAT_IP or STATIONARY_IP.
# Each NETWORK_INTERFACE, if configured with IPv6 address(es)
# can have multiple IPv6 address entries(up to a maximum of 2,
# only one IPv6 address entry belonging to site-local scope
# and only one belonging to global scope) which must be all
# STATIONARY_IP. They cannot be HEARTBEAT_IP.

NODE_NAME               fresno
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        15.13.168.91
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.

NODE_NAME               lodi
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        15.13.168.94
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.

# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
# This default setting yields the fastest cluster reformations.
# However, the use of the default value increases the potential
# for spurious reformations due to momentary system hangs or
# network load spikes.
# For a significant portion of installations, a setting of
# 5000000 to 8000000 (5 to 8 seconds) is more appropriate.
# The maximum value recommended for NODE_TIMEOUT is 30000000
# (30 seconds).

HEARTBEAT_INTERVAL      1000000
NODE_TIMEOUT            2000000

# The FAILOVER_OPTIMIZATION parameter enables Failover Optimization,
# which reduces the time Serviceguard takes for failover. (Failover
# Optimization cannot, however, change the time an application
# needs to shut down or restart.)
#
# There are four requirements:
# * The Serviceguard Extension for Faster Failover product
#   (SGeFF) must be installed on all cluster nodes.
# * Only one or two node clusters are supported.
# * A quorum server must be configured as the tie-breaker.
# * The cluster must have more than one heartbeat subnet,
#   and neither can be a serial line (RS232).
#
# Other considerations are listed in the SGeFF Release Notes
# and the Serviceguard manual.
#
# You must halt the cluster to change FAILOVER_OPTIMIZATION
# parameter.
#
# To enable Failover Optimization, set FAILOVER_OPTIMIZATION
# to TWO_NODE.
# The default is NONE.
#
# FAILOVER_OPTIMIZATION <NONE/TWO_NODE>

FAILOVER_OPTIMIZATION   NONE

# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT      600000000
NETWORK_POLLING_INTERVAL 2000000

# Network Monitor Configuration Parameters.
# The NETWORK_FAILURE_DETECTION parameter determines how LAN card failures are detected.
# If set to INONLY_OR_INOUT, a LAN card will be considered down when its inbound
# message count stops increasing or when both inbound and outbound
# message counts stop increasing.
# If set to INOUT, both the inbound and outbound message counts must
# stop increasing before the card is considered down.

NETWORK_FAILURE_DETECTION INOUT

# Package Configuration Parameters.
# Enter the maximum number of packages which will be configured in the cluster.
# You can not add packages beyond this limit.
# This parameter is required.

MAX_CONFIGURED_PACKAGES 150

# Access Control Policy Parameters.
#
# Three entries set the access control policy for the cluster:
# First line must be USER_NAME, second USER_HOST, and third USER_ROLE.
# Enter a value after each.
#
# 1. USER_NAME can either be ANY_USER, or a maximum of
#    8 login names from the /etc/passwd file on user host.
# 2. USER_HOST is where the user can issue Serviceguard commands.
#    If using Serviceguard Manager, it is the COM server.
#    Choose one of these three values: ANY_SERVICEGUARD_NODE, or
#    (any) CLUSTER_MEMBER_NODE, or a specific node. For node,
#    use the official hostname from domain name server, and not
#    an IP addresses or fully qualified name.
# 3. USER_ROLE must be one of these three values:
#    * MONITOR: read-only capabilities for the cluster and packages
#    * PACKAGE_ADMIN: MONITOR, plus administrative commands for packages
#      in the cluster
#    * FULL_ADMIN: MONITOR and PACKAGE_ADMIN plus the administrative
#      commands for the cluster.
#
# Access control policy does not set a role for configuration
# capability. To configure, a user must log on to one of the
# cluster’s nodes as root (UID=0). Access control
# policy cannot limit root users’ access.
#
# MONITOR and FULL_ADMIN can only be set in the cluster configuration file,
# and they apply to the entire cluster. PACKAGE_ADMIN can be set in the
# cluster or a package configuration file. If set in the cluster
# configuration file, PACKAGE_ADMIN applies to all configured packages.
# If set in a package configuration file, PACKAGE_ADMIN applies to that
# package only.
#
# Conflicting or redundant policies will cause an error while applying
# the configuration, and stop the process. The maximum number of access
# policies that can be configured in the cluster is 200.
#
#
# Example: to configure a role for user john from node noir to
# administer a cluster and all its packages, enter:
# USER_NAME john
# USER_HOST noir
# USER_ROLE FULL_ADMIN

USER_NAME               root
USER_HOST               ANY_SERVICEGUARD_NODE
USER_ROLE               full_admin

# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.
# Neither CVM or VxVM Disk Groups should be used here.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02

# List of OPS Volume Groups.
# Formerly known as DLM Volume Groups, these volume groups
# will be used by OPS or RAC cluster applications via
# the vgchange -a s command. (Note: the name DLM_VOLUME_GROUP
# is also still supported for compatibility with earlier versions.)
# For example:
# OPS_VOLUME_GROUP /dev/vgdatabase
# OPS_VOLUME_GROUP /dev/vg02
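
The node-name rules stated in this section (31 bytes or less, no domain name, as the NODE_NAME comments in the template also require) can be checked before the names are passed to cmquerycl. The following is a minimal sketch, not part of Serviceguard: the function names valid_node_name and build_node_args are hypothetical, and treating any dot in a name as a sign of a domain-qualified name is an assumption.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Serviceguard commands.

# Succeed if the name is non-empty, at most 31 bytes long, and contains
# no dot (assumption: a dot means a domain-qualified name such as
# ftsys9.cup.hp.com).
valid_node_name() {
    name=$1
    [ -n "$name" ] || return 1
    # wc -c counts bytes; arithmetic expansion strips any padding.
    len=$(( $(printf '%s' "$name" | wc -c) ))
    [ "$len" -le 31 ] || return 1
    case $name in
        *.*) return 1 ;;
    esac
    return 0
}

# Print "-n node1 -n node2 ..." for the given names, or fail on the
# first name that does not pass valid_node_name.
build_node_args() {
    args=
    for n in "$@"; do
        if ! valid_node_name "$n"; then
            echo "invalid node name: $n" >&2
            return 1
        fi
        args="$args -n $n"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${args# }"
}
```

For example, `build_node_args ftsys9 ftsys10` prints `-n ftsys9 -n ftsys10`, which can be appended to a cmquerycl command line such as the one shown earlier.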

The man page for the cmquerycl command lists the definitions of all the parameters that
appear in this file. Many are also described in the “Planning” chapter.
Modify your /etc/cmcluster/clust1.config file to your requirements, using the data on the cluster
worksheet.

In the file, keywords are separated from definitions by white
space. Comments are permitted, and must be preceded by a pound sign
(#) in the far left column. See the man page for the cmquerycl command for more details.

A cluster lock is required for two-node clusters like the
one in this example. The lock must be accessible to all nodes and
must be powered separately from the nodes. Refer to the section “Cluster
Lock” in Chapter 3 for additional information. Enter the
lock disk information following the cluster name. The lock disk
must be in an LVM volume group that is accessible to all the nodes
in the cluster.

The default FIRST_CLUSTER_LOCK_VG and FIRST_CLUSTER_LOCK_PV supplied in the
ASCII template created with cmquerycl are the volume group and physical volume name
of a disk chosen based on minimum failover time calculations. You
should ensure that this disk meets your power wiring requirements.
If necessary, choose a disk powered by a circuit which powers fewer
than half the nodes in the cluster.

To display the failover times of disks, use the cmquerycl command,
specifying all the nodes in the cluster.
Do not include the node’s entire domain name; for example,
specify ftsys9 not ftsys9.cup.hp.com:

# cmquerycl -v -n ftsys9 -n ftsys10

The output of the command lists the disks connected to each
node together with the re-formation time associated with each.

NOTE: You should not configure a second
lock volume group or physical volume unless your configuration specifically
requires it. See the discussion “Dual Cluster Lock” in
the section “Cluster Lock” in Chapter 3.

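
Because keywords are separated from values by white space and comments begin with # in the far left column, the active settings in an edited file can be listed with a short awk filter. The sketch below also reports which kind of cluster lock is configured, since the template states that a lock disk and a quorum server cannot both be used in the same cluster. The function names list_active_params and lock_type are hypothetical, and the check looks only at FIRST_CLUSTER_LOCK_VG and QS_HOST; it is an illustration, not a substitute for cmcheckconf.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Serviceguard commands.

# list_active_params FILE
# Print "KEYWORD VALUE" for each line that is not a comment (no # in
# the first column) and has at least a keyword and a value.
list_active_params() {
    awk '!/^#/ && NF >= 2 { print $1, $2 }' "$1"
}

# lock_type FILE
# Print "disk", "quorum", "both", or "none", depending on whether
# FIRST_CLUSTER_LOCK_VG and/or QS_HOST are active in FILE.
lock_type() {
    list_active_params "$1" | awk '
        $1 == "FIRST_CLUSTER_LOCK_VG" { disk = 1 }
        $1 == "QS_HOST"               { qs = 1 }
        END {
            if (disk && qs)  print "both"
            else if (disk)   print "disk"
            else if (qs)     print "quorum"
            else             print "none"
        }'
}
```

A result of "both" would indicate a file that mixes the two lock types and needs to be corrected before it is applied.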
If your configuration requires you to configure a second cluster
lock, enter the following parameters in the cluster configuration
file:

SECOND_CLUSTER_LOCK_VG /dev/volume-group
SECOND_CLUSTER_LOCK_PV /dev/dsk/block-special-file

where /dev/volume-group is the name
of the second volume group and block-special-file is
the physical volume name of a lock disk in the chosen volume group.
These lines should be added for each node.

Specifying a Quorum Server

To specify a quorum server instead of a lock disk, use the -q option
of the cmquerycl command, specifying the quorum server host. Example:

# cmquerycl -n ftsys9 -n ftsys10 -q qshost

The cluster ASCII file that is generated in this case contains
parameters for defining the quorum server. This portion of the file
is shown below:

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# Serviceguard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server interval is calculated from the
# Serviceguard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000
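
The quorum server parameters are given in microseconds, so the 120-second polling interval in the comment above becomes 120000000 and the 2-second extension becomes 2000000. The following sketch performs that conversion and prints the three lines for the configuration file. The function names sec_to_usec and qs_lines are hypothetical, and the sketch accepts whole seconds only; it does not check the values against any Serviceguard limits.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Serviceguard commands.

# sec_to_usec SECONDS
# Print SECONDS converted to microseconds. Only whole numbers without
# leading zeros are accepted.
sec_to_usec() {
    case $1 in
        ''|*[!0-9]*|0?*)
            echo "expected a whole number of seconds: $1" >&2
            return 1 ;;
    esac
    echo $(( $1 * 1000000 ))
}

# qs_lines HOST POLL_SECONDS EXTENSION_SECONDS
# Print the quorum server lines with both times in microseconds.
qs_lines() {
    poll=$(sec_to_usec "$2") || return 1
    ext=$(sec_to_usec "$3") || return 1
    printf 'QS_HOST %s\nQS_POLLING_INTERVAL %s\nQS_TIMEOUT_EXTENSION %s\n' \
        "$1" "$poll" "$ext"
}
```

Calling `qs_lines qshost 120 2` reproduces the three commented example lines from the template, without the leading # characters.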

Enter the QS_HOST, QS_POLLING_INTERVAL and, if desired, a QS_TIMEOUT_EXTENSION.

Modifying Cluster Timing Parameters

The cmquerycl command supplies default cluster timing parameters for
HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact
the cluster’s reformation and failover times. It is useful
to modify these parameters if the cluster is reforming occasionally due
to heavy system load or heavy network traffic.

The default value of 2 seconds for NODE_TIMEOUT leads to a best case failover time of 30 seconds.
If NODE_TIMEOUT is changed to 10 seconds, which means that the
cluster manager waits 5 times longer to time out a node, the failover
time is increased by a factor of 5, to approximately 150 seconds.
NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL. A good rule of thumb
is to have at least two or three heartbeats within one NODE_TIMEOUT.

Identifying Serial Heartbeat Connections

If you are using a serial (RS232) line as a heartbeat connection,
use the SERIAL_DEVICE_FILE parameter and enter the device file name that corresponds
to the serial port you are using on each node. Be sure that the
serial cable is securely attached during and after configuration.

Serviceguard Extension for Faster Failover (SGeFF) is a separately purchased
product. If it is installed, the configuration file will display
the parameter to enable it.

SGeFF reduces the time it takes Serviceguard to process a
failover. It cannot, however, change the time it takes for packages
and applications to gracefully shut down and restart.

SGeFF has requirements for cluster configuration, as outlined
in the cluster configuration template file.

For more information, see the Serviceguard Extension for Faster Failover
Release Notes posted on http://www.docs.hp.com/hpux/ha.

New in Serviceguard Version 11.16, Access Control Policies
allow non-root users to use common administrative commands.

Non-root users of Serviceguard Manager, the graphical user
interface, need to have a configured access policy to view and to
administer Serviceguard clusters and packages. In new
configurations, it is a good idea to immediately configure at least
one monitor access policy.

Check spelling when entering text, especially when typing
wildcards, such as ANY_USER and CLUSTER_MEMBER_NODE. If they are misspelled,
Serviceguard will assume they are specific users or nodes, and the
resulting access policy may not be the one you intended to configure.

A root user on the cluster can create or modify access policies
while the cluster is running.

Verifying the Cluster Configuration

In Serviceguard Manager, click the Check button to verify
the configuration.

If you have edited an ASCII cluster configuration file using
the command line, use the following command to verify the content
of the file:

# cmcheckconf -k -v -C /etc/cmcluster/clust1.config

Both methods check the following:

* Network addresses and connections.
* Cluster lock connectivity (if you are configuring
  a lock disk).
* Validity of configuration parameters for the cluster
  and packages.
* Existence and permission of scripts specified in
  the command line.
* Whether all nodes specified are in the same heartbeat
  subnet.
* Whether the configuration filename you specified is correct.
* Whether all nodes can be accessed.
* No more than one CLUSTER_NAME, HEARTBEAT_INTERVAL, and
  AUTO_START_TIMEOUT are specified.
* The value for package run and halt script timeouts
  is less than 4294 seconds.
* The value for NODE_TIMEOUT is at least twice the value of
  HEARTBEAT_INTERVAL.
* The value for AUTO_START_TIMEOUT variables is >=0.
* Heartbeat network minimum requirement. The cluster
  must have one heartbeat LAN configured with a standby, two heartbeat
  LANs, one heartbeat LAN and an RS232 connection, or one heartbeat network
  with no local LAN switch, but with a primary LAN that is configured
  as a link aggregate of at least two interfaces.
* At least one NODE_NAME is specified.
* Each node is connected to each heartbeat network.
* All heartbeat networks are of the same type of LAN.
* The network interface device files specified are
  valid LAN device files.
* If a serial (RS232) heartbeat is configured, there
  are no more than two nodes in the cluster, and no more than one
  serial (RS232) port connection per node.
* VOLUME_GROUP entries are not currently marked as cluster-aware.
* There is only one heartbeat subnet configured if
  you are using CVM disk storage.

If the cluster is online, the check also verifies that all
the conditions for the specific change in configuration have been
met.

NOTE: Using the -k option means that cmcheckconf only checks disk
connectivity to the LVM disks that are
identified in the ASCII file. Omitting the -k option (the default behavior)
means that cmcheckconf tests the connectivity of all LVM disks on all nodes.
Using -k can result in significantly faster operation of the command.

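
One of the checks listed above, that NODE_TIMEOUT is at least twice HEARTBEAT_INTERVAL, is simple enough to test by hand before running cmcheckconf. The sketch below reads both values from the ASCII file and compares them. The function names param_value and check_timing are hypothetical, and the sketch covers only this one rule; cmcheckconf remains the actual verification step.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Serviceguard commands.

# param_value FILE KEYWORD
# Print the value of the last active (non-comment) KEYWORD line.
param_value() {
    awk -v key="$2" '
        !/^#/ && $1 == key { v = $2 }
        END { if (v != "") print v }' "$1"
}

# check_timing FILE
# Return 0 if NODE_TIMEOUT >= 2 * HEARTBEAT_INTERVAL, 1 if not,
# and 2 if either parameter is missing from FILE.
check_timing() {
    hb=$(param_value "$1" HEARTBEAT_INTERVAL)
    nt=$(param_value "$1" NODE_TIMEOUT)
    if [ -z "$hb" ] || [ -z "$nt" ]; then
        echo "HEARTBEAT_INTERVAL or NODE_TIMEOUT not set in $1" >&2
        return 2
    fi
    if [ "$nt" -ge $(( 2 * hb )) ]; then
        echo "ok: NODE_TIMEOUT=$nt HEARTBEAT_INTERVAL=$hb"
    else
        echo "NODE_TIMEOUT=$nt is less than 2*HEARTBEAT_INTERVAL=$(( 2 * hb ))" >&2
        return 1
    fi
}
```

With the template values shown earlier (HEARTBEAT_INTERVAL 1000000 and NODE_TIMEOUT 2000000), `check_timing /etc/cmcluster/clust1.config` would succeed, because 2000000 is exactly twice 1000000.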
Distributing the Binary Configuration File

After specifying all cluster parameters, you apply the configuration.
This action distributes the binary configuration file to all the
nodes in the cluster. We recommend doing this separately before you
configure packages (described in the next chapter).
In this way, you can verify the cluster lock, heartbeat networks,
and other cluster-level operations by using the cmviewcl command on the
running cluster. Before distributing the
configuration, ensure that your security files permit copying among the
cluster nodes. See “Preparing Your Systems” at
the beginning of this chapter.

Distributing the Binary File with Serviceguard Manager

When you have finished entering the information, click Apply.

Distributing the Binary File on the Command Line

Use the following steps to generate the binary configuration
file and distribute the configuration to all nodes in the cluster:

1. Activate the cluster lock volume group
   so that the lock disk can be initialized:

   # vgchange -a y /dev/vglock

2. Generate the binary configuration file and distribute it
   with the cmapplyconf command:

   # cmapplyconf -k -v -C /etc/cmcluster/clust1.config

3. Deactivate the cluster lock volume group:

   # vgchange -a n /dev/vglock

The cmapplyconf command creates a binary version of the cluster configuration
file and distributes it to all nodes in the cluster. This action ensures
that the contents of the file are consistent across all nodes. Note that
the cmapplyconf command does not distribute the ASCII configuration file.

Storing Volume Group and Cluster Lock Configuration Data

After configuring the cluster, create a backup copy of the
LVM volume group configuration by using the vgcfgbackup command for each
volume group you have created. If a
disk in a volume group must be replaced, you can then restore the
disk's metadata by using the vgcfgrestore command. The procedure is described
under “Replacing
Disks” in the “Troubleshooting” chapter. Be sure to use vgcfgbackup for all
volume groups, including the cluster lock volume
group.
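
Since vgcfgbackup has to be run once per volume group, a small loop helps ensure that none is skipped, including the cluster lock volume group. The sketch below is illustrative only: the function name backup_vgs and the DRY_RUN variable are hypothetical, and the volume group names in the usage note are the example names used elsewhere in this section. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: backup_vgs and DRY_RUN are hypothetical names.

# backup_vgs VG...
# Run vgcfgbackup for each volume group given. With DRY_RUN unset or
# set to 1, the commands are printed instead of executed.
backup_vgs() {
    for vg in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "vgcfgbackup $vg"
        else
            vgcfgbackup "$vg" || return 1
        fi
    done
}
```

For example, `backup_vgs /dev/vglock /dev/vgdatabase` prints one vgcfgbackup command per volume group; setting DRY_RUN=0 in the environment would run them as root on the node.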