A cluster should be designed to provide the quickest possible
recovery from failures. The actual time required to recover from
a failure depends on several factors:

- The length of the cluster heartbeat interval and node timeout.
  They should each be set as short as practical, but not shorter than
  1000000 microseconds (one second) and 2000000 microseconds (two
  seconds), respectively. The recommended value for heartbeat interval
  is 1000000 (one second), and the recommended value for node timeout
  is within the 5 to 8 second range (5000000 to 8000000).
- The design of the run and halt instructions in the package control
  script. They should be written for fast execution.
- The availability of raw disk access. Applications that use raw disk
  access should be designed with crash recovery services.
- The application and database recovery time. They should be designed
  for the shortest recovery time.

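The timing guidance in the first factor can be written down as a small check. The following Python sketch is illustrative only: `check_timing` and its messages are hypothetical and are not part of Serviceguard. The thresholds are the ones quoted above, expressed in microseconds.

```python
# Illustrative sketch only; check_timing is a hypothetical helper,
# not a Serviceguard command or API.
MIN_HEARTBEAT_US = 1_000_000        # not shorter than one second
MIN_NODE_TIMEOUT_US = 2_000_000     # not shorter than two seconds
RECOMMENDED_NODE_TIMEOUT_US = (5_000_000, 8_000_000)  # 5 to 8 seconds


def check_timing(heartbeat_us, node_timeout_us):
    """Return advisory messages for a heartbeat interval and node timeout."""
    notes = []
    if heartbeat_us < MIN_HEARTBEAT_US:
        notes.append("heartbeat interval is shorter than 1000000 microseconds")
    if node_timeout_us < MIN_NODE_TIMEOUT_US:
        notes.append("node timeout is shorter than 2000000 microseconds")
    low, high = RECOMMENDED_NODE_TIMEOUT_US
    if not low <= node_timeout_us <= high:
        notes.append("node timeout is outside the recommended 5 to 8 second range")
    return notes
```

For example, `check_timing(1_000_000, 6_000_000)` returns an empty list, while the minimum pair `check_timing(1_000_000, 2_000_000)` returns one note about the recommended node timeout range.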
In addition, you must provide consistency across the cluster
so that:

- User names are the same on all nodes.
- UIDs are the same on all nodes.
- GIDs are the same on all nodes.
- Applications in the system area are the same on all nodes.
- System time is consistent across the cluster.
- Files that could be used by more than one node, such as /usr files,
  must be the same on all nodes.

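As a rough illustration of the user name, UID and GID requirements, the sketch below compares passwd-format text gathered from each node. Everything in it is an assumption made for the example: the helper names are hypothetical, the input is assumed to be standard colon-separated passwd entries, and how the text is collected from each node is deliberately left out. It only looks at the primary GID stored in each passwd entry, not at the group file.

```python
# Illustrative sketch only; helper names are hypothetical.
# Input: {node_name: passwd-format text}, collected by whatever means you use.

def parse_ids(passwd_text):
    """Map user name -> (uid, gid) from colon-separated passwd entries."""
    ids = {}
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 4:
            continue  # blank or malformed line
        try:
            ids[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            continue  # non-numeric UID/GID, for example a "+" entry
    return ids


def find_mismatches(passwd_by_node):
    """Return {user: {node: (uid, gid) or None}} for users that differ."""
    parsed = {node: parse_ids(text) for node, text in passwd_by_node.items()}
    all_users = set()
    for ids in parsed.values():
        all_users.update(ids)
    mismatches = {}
    for user in sorted(all_users):
        seen = {node: ids.get(user) for node, ids in parsed.items()}
        if len(set(seen.values())) > 1:  # missing on a node, or IDs differ
            mismatches[user] = seen
    return mismatches
```

A user that is missing on one node shows up with `None` for that node, so the same report covers both missing user names and differing UIDs or GIDs.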
The Serviceguard Extension for Faster Failover is a purchased
product that can optimize failover time for certain two-node clusters.
The clusters must be configured to meet certain requirements. When installed,
the product is enabled by a parameter in the cluster configuration
file. Release Notes for the product are posted at http://docs.hp.com -> high availability.

Heartbeat Subnet and Re-formation Time

The speed of cluster re-formation is partially dependent on
the type of heartbeat network that is used. Ethernet results in
a slower failover time than the other types. If two or more heartbeat
subnets are used, the one with the fastest failover time is used.

Cluster Lock Information

The purpose of the cluster lock is to ensure that only one
new cluster is formed in the event that exactly half of the previously
clustered nodes try to form a new cluster. It is critical that only
one new cluster is formed and that it alone has access to the disks
specified in its packages. You can specify either a lock disk or
a quorum server as the cluster lock.

A one-node cluster does not require a lock. Two-node clusters
require the use of a cluster lock, and a lock is recommended for
larger clusters as well. Clusters larger than 4 nodes can only use
a quorum server as the cluster lock.

Cluster Lock Disk and Re-formation Time

If you are using a lock disk, the acquisition of the cluster
lock disk takes different amounts of time depending on the disk
I/O interface that is used. After all the disk hardware is configured,
but before configuring the cluster, you can use the cmquerycl command,
specifying all the nodes in the cluster, to display a list of available
disks and the re-formation time associated with each. Example:

    # cmquerycl -v -n ftsys9 -n ftsys10

Alternatively, you can use SAM to display a list of cluster
lock physical volumes, including the re-formation time.

By default, Serviceguard selects the disk with the fastest
re-formation time. However, you may need to choose a different disk
because of power considerations. Remember that the cluster lock disk
should be separately powered, if possible.

Cluster Lock Disks and Planning for Expansion

You can add cluster nodes after the cluster is
up and running, but doing so without bringing down the cluster requires
you to follow some rules. Recall that a cluster with more than 4
nodes may not have a lock disk. Thus, if you plan to add enough
nodes to bring the total to more than 4, you should use a quorum
server.

Cluster Configuration Parameters

For the operation of the cluster manager, you need to define
a set of cluster parameters. These are stored in the binary cluster
configuration file, which is located on all nodes in the cluster.
These parameters can be entered by editing the cluster configuration
template file created by issuing the cmquerycl command, as described in the chapter “Building an
HA Cluster Configuration.” The parameter names given below
are the names that appear in the cluster ASCII configuration file.

The following parameters must be identified:

- CLUSTER_NAME
The name of the cluster as it will appear in the output of cmviewcl
and other commands, and as it appears in the cluster configuration file.

The cluster name must not contain any of the following characters:
space, slash (/), backslash (\), and asterisk (*). All
other characters are legal. The cluster name can contain up to 39
characters.

- QS_HOST
The name or IP address of a host system outside
the current cluster that is providing quorum server functionality.
This parameter is only used when you employ a quorum server for
tie-breaking services in the cluster.

- QS_POLLING_INTERVAL
The time (in microseconds) between attempts to contact the
quorum server to make sure it is running. Default is 300,000,000
microseconds (5 minutes).

- QS_TIMEOUT_EXTENSION
The quorum server timeout is the time during which the quorum
server is not communicating with the cluster. After this time, the
cluster will mark the quorum server DOWN. This time is calculated based on Serviceguard
parameters, but you can increase it by adding an additional number
of microseconds as an extension. The QS_TIMEOUT_EXTENSION is an optional parameter.

- FIRST_CLUSTER_LOCK_VG, SECOND_CLUSTER_LOCK_VG
The volume group containing the physical disk volume on which
a cluster lock is written. Identifying a cluster lock volume group
is essential in a two-node cluster. If you are creating two cluster
locks, enter the volume group name or names for both locks. This
parameter is only used when you employ a lock disk for tie-breaking services
in the cluster.

Use FIRST_CLUSTER_LOCK_VG for the first lock volume group. If there is a
second lock volume group, the parameter SECOND_CLUSTER_LOCK_VG is included
in the file on a separate line.

NOTE: Lock volume groups must also be defined in VOLUME_GROUP parameters
in the cluster ASCII configuration file.

- NODE_NAME
The hostname of each system that will be a node in the cluster.
The node name can be up to 31 bytes long. The node name must not
contain the full domain name. For example, enter ftsys9, not ftsys9.cup.hp.com.

- NETWORK_INTERFACE
The name of each LAN that will be used for heartbeats or for
user data. An example is lan0.

- HEARTBEAT_IP
IP notation indicating the subnet that will carry the cluster
heartbeat. Note that heartbeat IP addresses must be on the same
subnet on each node. A heartbeat IP address can only be an IPv4
address. If you will be using VERITAS CVM disk groups for storage,
you can only use a single heartbeat subnet.
In this case, the heartbeat should be configured with standby LANs
or as a group of aggregated ports.

- STATIONARY_IP
The IP address of each monitored subnet that does not carry
the cluster heartbeat. You can identify any number of subnets to
be monitored. If you want to separate application data from heartbeat
messages, define a monitored non-heartbeat subnet here.

A stationary IP address can be either an IPv4 or an IPv6 address.
For more details of IPv6 address format, see “IPv6
Address Types”.

- FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV
The name of the physical volume within the lock volume group
that will have the cluster lock written on it. Use FIRST_CLUSTER_LOCK_PV
for the first physical lock volume. If there is a second physical lock
volume, the parameter SECOND_CLUSTER_LOCK_PV is included in the file on
a separate line. These parameters are only used when you employ a lock
disk for tie-breaking services in the cluster.

Enter the physical volume name as it appears on both nodes
in the cluster (the same physical volume may have a different name
on each node). If you are creating two cluster locks, enter the
physical volume names for both locks. The physical volume group identifier
can contain up to 39 characters.

- SERIAL_DEVICE_FILE
The name of the device file that corresponds to the serial (RS232)
port that you have chosen on each node. Specify this parameter when
you are using RS232 as a heartbeat line. The device file name can
contain up to 39 characters.

- HEARTBEAT_INTERVAL
The normal interval between the transmission of heartbeat
messages from one node to the other in the cluster. In the ASCII
cluster configuration file, the value is entered in microseconds.

The default value is 1,000,000 microseconds (1 second); setting the
parameter to a value less than the default is not recommended. The
default should be used where possible. The maximum value recommended
is 15 seconds, and the maximum value supported is 30 seconds. This
value should be no more than half the value of NODE_TIMEOUT (below),
because the minimum NODE_TIMEOUT is 2 * HEARTBEAT_INTERVAL.

- NODE_TIMEOUT
The time after which a node may decide that the other node
has become unavailable and initiate cluster reformation. This parameter
is entered in microseconds.

The default value is 2,000,000 microseconds in the ASCII file.
The minimum is 2 * HEARTBEAT_INTERVAL. The maximum value recommended
is 30,000,000 microseconds in the ASCII file (30 seconds in
Serviceguard Manager), and the maximum value supported is 60 seconds.

The default setting yields the fastest cluster reformations. However,
the use of the default value increases the potential for spurious
reformations due to momentary system hangs or network load spikes.
For a significant portion of installations, a setting of 5,000,000
to 8,000,000 (5 to 8 seconds) is more appropriate.

- AUTO_START_TIMEOUT
The amount of time a node waits before it stops trying to
join a cluster during automatic cluster startup. All nodes wait this
amount of time for other nodes to begin startup before the cluster
completes the operation. The time should be selected based on the
slowest boot time in the cluster. Enter a value equal to the boot time
of the slowest booting node minus the boot time of the fastest booting
node plus 600 seconds (ten minutes).

The default is 600,000,000 microseconds in the ASCII file (600
seconds in Serviceguard Manager).

- NETWORK_POLLING_INTERVAL
The frequency at which the networks configured for Serviceguard
are checked. The default is 2,000,000 microseconds in the ASCII file
(2 seconds in Serviceguard Manager). Thus every 2 seconds, the network
manager polls each network interface to make sure it can still send and
receive information.

Changing this value can affect how quickly a network failure is
detected. The minimum value is 1,000,000 microseconds (1 second). The
maximum value recommended is 15 seconds, and the maximum value
supported is 30 seconds.

- MAX_CONFIGURED_PACKAGES
This parameter sets the maximum number of packages that can
be configured in the cluster. The default is 0, which means that you
must set this parameter if you want to use packages. The minimum value
is 0, and the maximum value is 150.

Set this parameter to a value that is high enough to accommodate
a reasonable number of future package additions without the need to
bring down the cluster to reset the parameter. However, be sure not to
set the parameter so high that memory is wasted. The use of packages
requires 6 MB plus about 100 KB of lockable memory on all cluster
nodes. Be sure to add one for the VxVM-CVM-pkg if you are using CVM
disk storage.

NOTE: Remember to tune HP-UX kernel parameters on each node
to ensure that they are set high enough for the largest number of
packages that would ever run concurrently on that node.

- VOLUME_GROUP
The name of an LVM volume group whose disks are attached to
at least two nodes in the cluster. Such disks are considered cluster
aware. The volume group name can have up to 39 characters.

- Access Control Policies
Specify three things for each policy: USER_NAME, USER_HOST,
and USER_ROLE. For Serviceguard Manager, USER_HOST must be the name
of the Session node. Policies set in the configuration file of a cluster
and its packages must not be conflicting or redundant. For more
information, see “Editing Security Files”.

- FAILOVER_OPTIMIZATION
You will only see this parameter if you have installed Serviceguard
Extension for Faster Failover, a separately purchased product. You
enable the product by setting this parameter to TWO_NODE. The default
is NONE (disabled). For more information about the product
and its cluster configuration requirements, go to http://www.docs.hp.com/hpux/ha
and click Serviceguard Extension for Faster Failover.

- NETWORK_FAILURE_DETECTION
When there is a primary and a standby network card, Serviceguard
needs to determine when a card has failed, so it knows whether to
fail traffic over to the other card. To detect failures, Serviceguard’s
Network Manager monitors both inbound and outbound traffic. The
Network Manager will mark the card DOWN and begin to attempt a failover
when no network traffic is detected for a time. (Serviceguard calculates
the time depending on the type of LAN card.)

The configuration file specifies one of two ways to decide
when the network interface card has failed:

INOUT - The default method will count inbound and outbound
failures separately, and declare a card down only when both have
reached a critical level.

INONLY_OR_INOUT - This option combines the inbound and outbound
failure counts, and will declare a card down when the total failures
reach a critical amount, regardless of their source. With this method,
Serviceguard tries to validate inbound failure reports by doing
additional remote polling.

The default is INOUT.

The suitability of an option depends mainly on your network
configuration. To find out whether the new INONLY_OR_INOUT
option is the best for your network configuration, see “Inbound
Failure Detection Enhancement” at http://docs.hp.com/hpux/ha -> Serviceguard White Papers.

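Several of the limits described above are simple enough to check before editing the configuration file. The Python sketch below is illustrative only: the helper names are hypothetical and not part of Serviceguard, and treating any dot in a node name as a sign of a full domain name is an assumption made for the example. The numeric limits and the AUTO_START_TIMEOUT formula are taken from the parameter descriptions above.

```python
# Illustrative sketch only; these helpers are hypothetical,
# not Serviceguard commands or APIs.
FORBIDDEN_CLUSTER_NAME_CHARS = set(" /\\*")  # space, slash, backslash, asterisk


def cluster_name_problems(name):
    """Check CLUSTER_NAME against the length and character rules."""
    problems = []
    if len(name) > 39:
        problems.append("longer than 39 characters")
    bad = sorted(FORBIDDEN_CLUSTER_NAME_CHARS.intersection(name))
    if bad:
        problems.append("contains forbidden characters: " + repr("".join(bad)))
    return problems


def node_name_problems(name):
    """Check NODE_NAME: at most 31 bytes, no full domain name."""
    problems = []
    if len(name.encode("utf-8")) > 31:
        problems.append("longer than 31 bytes")
    if "." in name:  # assumption: a dot indicates a domain-qualified name
        problems.append("looks like a full domain name")
    return problems


def auto_start_timeout_us(slowest_boot_s, fastest_boot_s):
    """Slowest boot time minus fastest boot time plus 600 seconds, in microseconds."""
    return (slowest_boot_s - fastest_boot_s + 600) * 1_000_000


def node_timeout_is_valid(heartbeat_us, node_timeout_us):
    """NODE_TIMEOUT must be at least 2 * HEARTBEAT_INTERVAL."""
    return node_timeout_us >= 2 * heartbeat_us
```

For instance, if the slowest node boots in 420 seconds and the fastest in 300 seconds, `auto_start_timeout_us(420, 300)` gives 720000000 microseconds.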
Cluster Configuration Worksheet

The following worksheet will help you to organize and record
your cluster configuration.

Name and Nodes:
===============================================================================
Cluster Name: ___ourcluster_______________

Node Names: ____node1_________________
            ____node2_________________

Maximum Configured Packages: ______12________
===============================================================================
Quorum Server Data:
===============================================================================
Quorum Server Host Name or IP Address: __lp_qs __________________

Quorum Server Polling Interval: _300000000_ microseconds

Quorum Server Timeout Extension: _______________ microseconds
===============================================================================
Subnets:
===============================================================================
Heartbeat Subnet: ___15.13.168.0______

Monitored Non-heartbeat Subnet: _____15.12.172.0___

Monitored Non-heartbeat Subnet: ___________________
===============================================================================
Cluster Lock Volume Groups and Volumes:
===============================================================================
First Lock Volume Group:  | Physical Volume:
                          |
________________          | Name on Node 1: ___________________
                          |
                          | Name on Node 2: ___________________
                          |
                          | Disk Unit No: ________
                          |
                          | Power Supply No: ________
===============================================================================
Timing Parameters:
===============================================================================
Heartbeat Interval: _1 sec_
===============================================================================
Node Timeout: _2 sec_
===============================================================================
Network Polling Interval: _2 sec_
Network Monitor: _INOUT_
===============================================================================
Autostart Delay: _10 min___
===============================================================================
Cluster Aware LVM Volume Groups: __________________________________________
===============================================================================
Access Policies
User: __ ANY_USER          Host: __ ftsys9__   Role: __ full_admin__
User: __ sara itgrp lee __ Host: __ ftsys10__  Role: __ package_admin__
===============================================================================
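The worksheet records timing values in seconds and minutes, while the ASCII configuration file takes microseconds. The Python sketch below shows one way to do the conversion. It is illustrative only: the helper name, the "sec" and "min" entry format, and the mapping of the worksheet's "Autostart Delay" row to AUTO_START_TIMEOUT are assumptions made for the example.

```python
# Illustrative sketch only; the entry format and helper name are assumptions.
UNIT_TO_SECONDS = {"sec": 1, "min": 60}


def to_microseconds(entry):
    """Convert a worksheet entry such as '1 sec' or '10 min' to microseconds."""
    amount, unit = entry.split()
    return int(amount) * UNIT_TO_SECONDS[unit] * 1_000_000


# Timing values as filled in on the sample worksheet.
worksheet = {
    "HEARTBEAT_INTERVAL": "1 sec",
    "NODE_TIMEOUT": "2 sec",
    "NETWORK_POLLING_INTERVAL": "2 sec",
    "AUTO_START_TIMEOUT": "10 min",  # assumed to be the "Autostart Delay" row
}

ascii_values = {name: to_microseconds(entry) for name, entry in worksheet.items()}

if __name__ == "__main__":
    for name, value in ascii_values.items():
        print(name, value)
```

The resulting values match the defaults listed in the parameter descriptions, for example 600000000 microseconds for a ten-minute autostart delay.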