Planning for packages involves assembling information about
each group of highly available services. Some of this information
is used in creating the package configuration file, and some is
used for editing the package control script.

NOTE: Volume groups that are to be activated by packages must
also be defined as cluster aware in the cluster configuration file.
See the previous section on "Cluster Configuration Planning."

Logical Volume and Filesystem Planning

You may need to use logical volumes in volume groups as part
of the infrastructure for package operations on a cluster. When
the package moves from one node to another, it must be able to access
data residing on the same disk as on the previous node. This is
accomplished by activating the volume group and mounting the file
system that resides on it. In MC/LockManager, high availability applications, services,
and data are located in volume groups that are on a shared bus.
When a node fails, the volume groups containing the applications,
services, and data of the failed node are deactivated on the failed
node and activated on the adoptive node. In order to do this, you
have to configure the volume groups so that they can be transferred
from the failed node to the adoptive node.

As part of planning, you need to decide the following:

- What volume groups are needed?
- How much disk space is required, and how should this be allocated
  in logical volumes?
- What file systems need to be mounted for each package?
- Which nodes need to import which logical volume configurations?
- If a package moves to an adoptive node, what effect will its
  presence have on performance?
Create a list by package of volume groups, logical volumes,
and file systems. Indicate which nodes need to have access to common
file systems at different times.

It is recommended that you use customized logical volume names
that are different from the default logical volume names (lvol1,
lvol2, etc.). Choosing logical volume names that represent the high
availability applications that they are associated with (for example,
lvoldatabase) will simplify cluster administration.

To further document your package-related volume groups, logical
volumes, and file systems on each node, you can add commented
lines to the /etc/fstab file. The following is an example for a
database application:

```
# /dev/vg01/lvoldb1 /applic1 vxfs defaults 0 1    # These six entries are
# /dev/vg01/lvoldb2 /applic2 vxfs defaults 0 1    # for information purposes
# /dev/vg01/lvoldb3 raw_tables ignore ignore 0 0  # only. They record the
# /dev/vg01/lvoldb4 /general vxfs defaults 0 2    # logical volumes that
# /dev/vg01/lvoldb5 raw_free ignore ignore 0 0    # exist for MC/LockManager's
# /dev/vg01/lvoldb6 raw_free ignore ignore 0 0    # HA package. Do not uncomment.
```
Create an entry for each logical volume, indicating its use
for a file system or for a raw device.

CAUTION: Do not use /etc/fstab to mount file systems that are used
by MC/LockManager packages.
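Because package file systems must not be mounted from /etc/fstab, it may be worth confirming that the documentation entries described above really are commented out. The following is a minimal sketch that assumes the first field of each active fstab line is the device path; `check_fstab` is a hypothetical helper, not an MC/LockManager command.

```bash
#!/usr/bin/env bash
# Report package logical volumes that have an active (uncommented)
# entry in an fstab-style file. Commented documentation lines are fine.
# Usage: check_fstab FSTAB_FILE LV_PATH...
check_fstab() {
  local fstab=$1
  shift
  local status=0 dev rest lv
  # "|| [[ -n $dev ]]" also handles a last line with no trailing newline.
  while read -r dev rest || [[ -n $dev ]]; do
    # Skip blank lines and comment lines.
    if [[ -z $dev || $dev == \#* ]]; then
      continue
    fi
    for lv in "$@"; do
      if [[ $dev == "$lv" ]]; then
        echo "active fstab entry for package volume: $dev $rest" >&2
        status=1
      fi
    done
  done < "$fstab"
  return "$status"
}
```

For example, `check_fstab /etc/fstab /dev/vg01/lvoldb1 /dev/vg01/lvoldb2` returns a non-zero status and prints a message for each listed package volume that has an active entry.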
Details about creating, exporting, and importing volume groups
in MC/LockManager are given in the chapter on "Building
an HA Cluster Configuration."

Monitoring Registered Package Resources

MC/LockManager has access to a registry of resources that
can be monitored as package dependencies. The registry is the core
of the Event Monitoring Service (EMS). Once an EMS registered resource
is configured as a package dependency, MC/LockManager can fail a
package to another node based on messages the resource's monitor
returns. Monitors for individual resources may be provided by hardware
or software vendors from time to time. A specific group of HA EMS
monitors for disk, LAN, and system status information is available
from HP as a separate product. Refer to the manual Using
EMS HA Monitors (B5735-90001) for additional information.

You can specify a registered resource for a package by selecting
it from the list of available resources displayed in the SAM package
configuration area. The size of the list displayed by SAM depends
on which resource monitors have been registered on your system.
Alternatively, you can obtain information about registered resources
on your system by using the command /opt/resmon/bin/resls.
For additional information, refer to the man page for resls(1m).

Choosing Switching and Failover Behavior

Switching IP addresses from a failed LAN card to a standby
LAN card on the same physical subnet may take place if Local
Switching is set to Enabled in SAM (NET_SWITCHING_ENABLED
set to YES in the ASCII package configuration file). Local Switching
Enabled is the default.

To determine failover behavior, you can define a package failover
policy that governs which nodes will automatically start up a package
that is not running. In addition, you can define a failback policy
that determines whether a package will be automatically returned
to its primary node when that is possible.

The following table describes different types of failover
behavior and the settings in SAM or in the ASCII package configuration
file that determine each behavior.

Table 4-2 Package Failover Behavior

| Switching Behavior | Options in SAM | Parameters in ASCII File |
|---|---|---|
| Package IP address switches to standby LAN card transparently on LAN card failure | Local Switching set to Enabled for the package (Default) | NET_SWITCHING_ENABLED set to YES for the package (Default) |
| Package switches normally after detection of failure or report of an EMS monitor event showing that a resource on which the package depends is down. Halt script runs before switch takes place (default behavior) | Package Failfast set to Disabled (Default); Service Failfast set to Disabled for all services (Default); Automatic Switching set to Enabled for the package (Default) | NODE_FAIL_FAST_ENABLED set to NO (Default); SERVICE_FAIL_FAST_ENABLED set to NO for all services (Default); PKG_SWITCHING_ENABLED set to YES for the package (Default) |
| Package fails over to the node with the fewest active packages | Failover policy set to Minimum Package Node | FAILOVER_POLICY set to MIN_PACKAGE_NODE |
| Package fails over to the node that is next on the list of nodes (default behavior) | Failover policy set to Configured Node | FAILOVER_POLICY set to CONFIGURED_NODE |
| Package is automatically halted and restarted on its primary node if the primary node is available and the package is running on a non-primary node | Failback policy set to Automatic | FAILBACK_POLICY set to AUTOMATIC |
| Package running on a non-primary node must be returned to its primary node manually, if desired (default behavior) | Failback policy set to Manual | FAILBACK_POLICY set to MANUAL |
| All packages switch following a TOC (Transfer of Control, an immediate halt without a graceful shutdown) on the node when a specific service fails. Halt scripts are not run. | Package Failfast set to Disabled; Service Failfast set to Enabled for a specific service; Automatic Switching set to Enabled for all packages | NODE_FAIL_FAST_ENABLED set to NO; SERVICE_FAIL_FAST_ENABLED set to YES for a specific service; PKG_SWITCHING_ENABLED set to YES for all packages |
| All packages switch following a TOC on the node when any service fails. | Package Failfast set to Disabled; Service Failfast set to Enabled for all services; Automatic Switching set to Enabled for all packages | NODE_FAIL_FAST_ENABLED set to NO; SERVICE_FAIL_FAST_ENABLED set to YES for all services; PKG_SWITCHING_ENABLED set to YES for all packages |
| All packages switch following a TOC on the node when the run or halt script exits with an error other than 0 or 1. This may be caused by an EMS monitor event showing that a resource is down. | Package Failfast set to Enabled; Automatic Switching set to Enabled for all packages | NODE_FAIL_FAST_ENABLED set to YES; PKG_SWITCHING_ENABLED set to YES for all packages |
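The failover policies and the AUTOMATIC failback rule in the table above can be modeled in a few lines of shell, which may help when reasoning about where a package will land. This is a toy model under stated assumptions, not MC/LockManager's own logic: the helpers `pick_node` and `should_failback` are hypothetical, node state is passed as `NAME:UP:PKGCOUNT` strings in NODE_NAME order, and ties under MIN_PACKAGE_NODE are assumed to go to the node listed first.

```bash
#!/usr/bin/env bash
# Toy model of FAILOVER_POLICY and FAILBACK_POLICY; not product code.
# Node entries are NAME:UP:PKGCOUNT, listed in NODE_NAME order.
# UP is 1 if the node can currently run the package.

# Print the node the package would start on; return 1 if none is up.
pick_node() {
  local policy=$1
  shift
  local entry name up count best="" best_count=0
  for entry in "$@"; do
    IFS=: read -r name up count <<< "$entry"
    (( up == 1 )) || continue
    case $policy in
      CONFIGURED_NODE)
        # The first available node in list order wins.
        echo "$name"
        return 0 ;;
      MIN_PACKAGE_NODE)
        # The node running the fewest packages wins
        # (assumed tie-break: the earlier node in the list).
        if [[ -z $best ]] || (( count < best_count )); then
          best=$name
          best_count=$count
        fi ;;
      *)
        echo "unknown FAILOVER_POLICY: $policy" >&2
        return 2 ;;
    esac
  done
  [[ -n $best ]] && echo "$best"
}

# Succeed if a package on its current node should move back to its primary.
# Args: FAILBACK_POLICY FAILOVER_POLICY PRIMARY_UP PRIMARY_PKGS CURRENT_PKGS
should_failback() {
  local failback=$1 failover=$2 primary_up=$3 primary_pkgs=$4 current_pkgs=$5
  # MANUAL never moves the package automatically.
  [[ $failback == AUTOMATIC ]] || return 1
  (( primary_up == 1 )) || return 1
  if [[ $failover == MIN_PACKAGE_NODE ]]; then
    # With MIN_PACKAGE_NODE the primary must also run fewer packages.
    (( primary_pkgs < current_pkgs )) || return 1
  fi
  return 0
}
```

For example, with ftsys9 down and a hypothetical third node ftsys11 running no packages, `pick_node MIN_PACKAGE_NODE ftsys9:0:1 ftsys10:1:3 ftsys11:1:0` prints `ftsys11`, whereas the same list under CONFIGURED_NODE prints `ftsys10`, the first node that is up.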

Package Configuration File Parameters

Prior to generation of the package configuration file, assemble
the following package configuration data. The parameter names given
below are the names that appear in SAM. The names coded in the ASCII
package configuration file appear at the end of each entry. The
following parameters must be identified and entered on the worksheet
for each package:

- Package Name
The name of the package. The package name must be unique in
the cluster. It is used to start, stop, modify, and view the package.

The package name must not contain any of the following illegal
characters: '/', '\', and '*'. All other characters are legal.

In the ASCII package configuration file, this parameter is
known as PACKAGE_NAME.

- Package Failover Policy
The policy used by the package manager to select the node on which
to run the package whenever the package is automatically started.
The default is CONFIGURED_NODE, which means the next available node
in the list of node names for the package. The order of node name
entries dictates the order of preference when selecting the node.
The alternate policy is MIN_PACKAGE_NODE, which means the node from
the list that is running the fewest other packages at the time this
package is to be started.

In the ASCII package configuration file, this parameter is
known as FAILOVER_POLICY.

- Failback Policy
The policy used to determine what action the package manager should
take if the package is not running on its primary node and its
primary node is capable of running the package. The default is
MANUAL, which means no attempt will be made to move the package back
to its primary node when it is running on an alternate node. The
alternate policy is AUTOMATIC, which means that the package will be
halted and restarted on its primary node as soon as the primary node
is capable of running the package and, if MIN_PACKAGE_NODE is the
Package Failover Policy, is running fewer packages than the current
node.

In the ASCII package configuration file, this parameter is
known as FAILBACK_POLICY.

- Node Name
The names of primary and alternate nodes for the package,
e.g., ftsys9 and ftsys10. The order in which you specify the node
names is important. First list the primary node name, then the first
adoptive node name, then the second adoptive node name, followed,
in order, by additional node names. Ownership of a package may be
transferred to the next adoptive node name listed in the package
configuration file.

In the ASCII package configuration file, this parameter is
known as NODE_NAME.

- Control Script Pathname
Enter the full pathname of the package control script. (The script
must reside in a directory that contains the string "cmcluster".)
It is recommended that you use the same script as both the run and
halt script. This script will contain both your package run
instructions and your package halt instructions. When the package
starts, its run script is executed and passed the parameter 'start';
similarly, at package halt time, the halt script is executed and
passed the parameter 'stop'.

In the ASCII package configuration file, this parameter maps to the
two separate parameters named RUN_SCRIPT and HALT_SCRIPT. Use the
name of the single control script as the value of both RUN_SCRIPT
and HALT_SCRIPT in the ASCII file.

If you wish to separate the package run instructions and package
halt instructions into separate scripts, the package configuration
file allows you to do this by naming two separate scripts. However,
under most conditions, it is simpler to combine your run and halt
instructions into a single package control script and repeat its
name for both RUN_SCRIPT and HALT_SCRIPT.

Ensure that the script is executable.

NOTE: If you choose to write separate package run and halt
scripts, be sure to include identical configuration information
(such as node names, IP addresses, etc.) in both scripts.

- Run Script Timeout and Halt Script Timeout
Enter a number of seconds. If the script has not completed by the
specified timeout value, MC/LockManager will terminate the script.
The default is 0, or no timeout. If the timeout is exceeded:

  - Control of the package will not be transferred.
  - The run or halt instructions will not be run.
  - Global switching will be disabled.
  - The current node will be disabled from running the package.
  - The control script will exit with status 1.

In the ASCII package configuration file, these parameters are called
RUN_SCRIPT_TIMEOUT and HALT_SCRIPT_TIMEOUT. The default for both is
0 or NO_TIMEOUT. In the ASCII file, these parameters are entered in
microseconds.

If the halt script timeout occurs, you may need to perform manual
cleanup. See "Package Control Script Hangs or Failures" in Chapter 8.

- Service Name
Enter a unique name for each service. You can configure a maximum of
30 services per package.

The service name must not contain any of the following illegal
characters: '/', '\', and '*'. All other characters are legal.

In the ASCII package configuration file, this parameter is called
SERVICE_NAME. Define one SERVICE_NAME entry for each service.

- Service Fail Fast
Enter Enabled or Disabled for each service. This parameter indicates
whether or not the failure of a service results in the failure of a
node. If the parameter is set to Enabled, in the event of a service
failure, MC/LockManager will halt the node on which the service is
running with a TOC. The default is Disabled.

In the ASCII package configuration file, this parameter is
SERVICE_FAIL_FAST_ENABLED, and possible values are YES and NO. The
default is NO. Define one SERVICE_FAIL_FAST_ENABLED entry for each
service.

- Service Halt Timeout
In the event of a service halt, MC/LockManager will first send out a
SIGTERM signal to terminate the service. If the process is not
terminated, MC/LockManager will wait for the specified timeout before
sending out the SIGKILL signal to force process termination. The
default is 300 seconds (5 minutes).

In the ASCII package configuration file, this parameter is
SERVICE_HALT_TIMEOUT. Define one SERVICE_HALT_TIMEOUT entry for each
service.

- Subnet
Enter the IP subnets that are to be monitored for the package.

In the ASCII package configuration file, this parameter is called
SUBNET.

- Resource Name
The name of a resource that is to be monitored by MC/LockManager as a
package dependency. A resource name is the name of an important
attribute of a particular system resource. The resource name includes
the entire hierarchy of resource class and subclass within which the
resource exists on a system.

In the ASCII package configuration file, this parameter is called
RESOURCE_NAME. Obtain the resource name from the list provided in
SAM, or obtain it from the documentation supplied with the resource
monitor.

A maximum of 60 resources may be defined per cluster. Note also the
limit on Resource Up Values described below.

- Resource Polling Interval
How often an additional package resource is monitored. The default
is 60 seconds.

In the ASCII package configuration file, this parameter is called
RESOURCE_POLLING_INTERVAL. The Resource Polling Interval appears on
the list provided in SAM, or you can obtain it from the documentation
supplied with the resource monitor.

- Resource Up Value
The criterion used to judge whether an additional package resource
has failed.

In the ASCII package configuration file, this parameter is called
RESOURCE_UP_VALUE. The Resource Up Value appears on the list provided
in SAM, or you can obtain it from the documentation supplied with the
resource monitor.

You can configure a total of 15 Resource Up Values per package. For
example, if there is only one resource in the package, then a maximum
of 15 Resource Up Values can be defined. If there are two Resource
Names defined and one of them has 10 Resource Up Values, then the
other Resource Name can have only 5 Resource Up Values.

- Automatic Switching
Enter Enabled or Disabled. The default is Enabled, which allows a
package to start up normally on a cluster node. In the event of a
failure, a value of Enabled permits MC/LockManager to transfer the
package to an adoptive node. If this parameter is set to Disabled,
the package will not start up automatically when the cluster starts
running.

In the ASCII package configuration file, this parameter is called
PKG_SWITCHING_ENABLED, and possible values are YES and NO. The
default is YES. If this parameter is set to NO, the package will not
start up automatically when the cluster starts running.

- Local Switching
Enter Enabled or Disabled. In the event of a failure, this permits
MC/LockManager to switch LANs locally, that is, to transfer the
package IP address to a standby LAN card. The default is Enabled.

In the ASCII package configuration file, this parameter is called
NET_SWITCHING_ENABLED, and possible values are YES and NO. The
default is YES.

- Package Fail Fast Enabled
If this parameter is set to Enabled, MC/LockManager will issue a TOC
on the node where the control script fails when the control script
itself fails, when a subnet fails, or when an EMS monitor event
reports that a resource is down. The default is Disabled.

In the ASCII package configuration file, this parameter is called
NODE_FAIL_FAST_ENABLED, and possible values are YES and NO. The
default is NO.
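Several limits stated in the parameter descriptions above (illegal characters in package and service names, 30 services per package, 15 Resource Up Values per package, and the allowed policy and YES/NO keywords) are easy to check before the file is applied. The following is a minimal sketch of such pre-checks; every function name is hypothetical, and the sketch does not replace any checking MC/LockManager itself performs.

```bash
#!/usr/bin/env bash
# Hypothetical pre-checks for package configuration values, based only
# on the limits described in this section.

# Package and service names must be non-empty and must not contain
# '/', '\' or '*'.
valid_name() {
  case $1 in
    '' | */* | *\\* | *\** ) return 1 ;;
    *) return 0 ;;
  esac
}

# FAILOVER_POLICY and FAILBACK_POLICY accept only these keywords.
valid_failover_policy() { [[ $1 == CONFIGURED_NODE || $1 == MIN_PACKAGE_NODE ]]; }
valid_failback_policy() { [[ $1 == MANUAL || $1 == AUTOMATIC ]]; }

# The *_ENABLED parameters take YES or NO.
valid_yes_no() { [[ $1 == YES || $1 == NO ]]; }

# At most 30 services per package.
valid_service_count() { (( $1 >= 0 && $1 <= 30 )); }

# At most 15 Resource Up Values in total per package; pass the number
# of up values configured for each resource.
valid_up_value_total() {
  local total=0 n
  for n in "$@"; do
    total=$(( total + n ))
  done
  (( total <= 15 ))
}
```

Matching the example in the Resource Up Value description, `valid_up_value_total 10 5` succeeds, while `valid_up_value_total 10 6` fails.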
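The Service Halt Timeout describes a common escalation pattern: send SIGTERM, wait for up to the timeout, then send SIGKILL. A generic shell sketch of that pattern follows for illustration only; `halt_with_timeout` is a hypothetical helper that polls once per second, and this is not how MC/LockManager itself halts services.

```bash
#!/usr/bin/env bash
# Generic illustration of a SIGTERM-then-SIGKILL halt with a timeout.
# halt_with_timeout is a hypothetical helper, not an MC/LockManager
# command. Usage: halt_with_timeout PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed.
halt_with_timeout() {
  local pid=$1 timeout=${2:-300} waited=0
  # Ask the process to exit; if it is already gone there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0
  # Poll once per second until the process exits or the timeout expires.
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= timeout )); then
      kill -KILL "$pid" 2>/dev/null
      echo "sent SIGKILL to $pid after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}
```

Calling `halt_with_timeout "$pid"` without a second argument uses 300 seconds, mirroring the documented default.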

Package Control Script Variables

The control script that accompanies each package must also be edited
to assign values to a set of variables. The following variables must
be set:

- Volume Groups, Logical Volumes, File Systems and Mount Options
Determine the file systems and corresponding logical volumes within
the volume groups required. Example: pkg1 requires /dev/vg01/lvol1
mounted on /vg01.

Indicate the names of volume groups that are to be activated and
deactivated, together with the logical volumes and file systems that
are to be mounted. You can also specify options that are to be used
with the HP-UX mount command. On starting the package, the script
activates a volume group, and it may mount logical volumes onto file
systems. At halt time, the script unmounts the file systems and
deactivates each volume group. All volume groups must be accessible
on each target node.

In the ASCII package control script, these variables are arrays, as
follows: VG, LV, FS and FS_MOUNT_OPT. For each file system (FS), you
must identify a logical volume (LV). Include as many volume groups
(VGs) as needed. If you are using raw files, the LV, FS, and
FS_MOUNT_OPT entries are not needed.

Only cluster aware volume groups should be specified in package
control scripts. To make a volume group cluster aware, enter it as
part of the cluster configuration. See above, "Cluster Configuration
Planning."

- IP Addresses and SUBNETs
These are the IP addresses by which a package is mapped to a LAN
card. Indicate the IP addresses and subnets for each IP address you
want to add to an interface card. The subnet is the IP address
logically ANDed with the subnet mask.

In the ASCII package control script, these variables are entered in
pairs. Example: IP[0]=192.10.25.12 and SUBNET[0]=192.10.25.0. (In
this case the subnet mask is 255.255.255.0.)

- Service Name
Enter a unique name for each specific service within the package. All
services are monitored by MC/LockManager. The service name, service
command, and service restart parameters are entered in the package
control script in groups of three. You may specify as many service
names as you need. Each name must be unique within the cluster. The
service name is the name used by cmrunserv and cmhaltserv inside the
package control script.

In the ASCII package control script, enter values into an array known
as SERVICE_NAME. Enter one service name for each service.

- Service Command
For each named service, enter a service command. This command will be
executed through the control script by means of the cmrunserv
command.

In the ASCII package control script, enter values into an array known
as SERVICE_CMD. Enter one service command string for each service.

- Service Restart Parameter
Enter a number of restarts. One valid form of the parameter is -r n,
where n is a number of retries. A value of "-r 0" indicates no
retries. A value of "-R" indicates an infinite number of retries. The
default is 0, or no restarts.

In the ASCII package control script, enter values into an array known
as SERVICE_RESTART. Enter one restart value for each service.
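The control script arrays described above fit together as in the following hypothetical excerpt. It uses the example values from the text (pkg1 with /dev/vg01/lvol1 on /vg01, and IP[0]=192.10.25.12 with SUBNET[0]=192.10.25.0) plus the service entries from the sample worksheet. It also sketches three sanity checks: every FS entry has a matching LV entry, a SUBNET value equals its IP address ANDed with the subnet mask, and each restart value has one of the forms described. The helper functions are hypothetical, and the mask is passed in explicitly because the control script does not record it.

```bash
#!/usr/bin/env bash
# Hypothetical control script excerpt using the example values above.
VG[0]=/dev/vg01
LV[0]=/dev/vg01/lvol1
FS[0]=/vg01
FS_MOUNT_OPT[0]=""

IP[0]=192.10.25.12
SUBNET[0]=192.10.25.0

SERVICE_NAME[0]=Svc1
SERVICE_CMD[0]="/usr/bin/MySvc -f"
SERVICE_RESTART[0]="-r 2"

# Every file system entry needs a logical volume at the same index.
check_fs_have_lv() {
  local i
  for i in "${!FS[@]}"; do
    if [[ -z ${LV[i]:-} ]]; then
      echo "FS[$i]=${FS[i]} has no LV[$i]" >&2
      return 1
    fi
  done
  return 0
}

# Subnet = IP address ANDed with the subnet mask, octet by octet.
subnet_of() {
  local -a ip mask
  local i out=""
  IFS=. read -r -a ip <<< "$1"
  IFS=. read -r -a mask <<< "$2"
  for i in 0 1 2 3; do
    out+=$(( ip[i] & mask[i] ))
    if (( i < 3 )); then out+=.; fi
  done
  echo "$out"
}

# SERVICE_RESTART accepts "-r n" (n retries) or "-R" (unlimited).
valid_restart() {
  local re='^-r [0-9]+$'
  [[ $1 == -R || $1 =~ $re ]]
}
```

For the example pair in the text, `subnet_of 192.10.25.12 255.255.255.0` prints `192.10.25.0`, which matches SUBNET[0].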

For information on using a DTC with MC/LockManager, see the chapter
entitled "Configuring DTC Manager for Operation with MC/ServiceGuard"
in the manual Using the HP DTC Manager/UX.

The package control script will clean up the environment and undo
the operations in the event of an error.

Package Configuration Worksheet

Assemble your package configuration and control script data in a
separate worksheet for each package.
```
===============================================================================
Package Configuration File Data:
===============================================================================
Package Name: ______pkg11_______________
Failover Policy: _________________
Failback Policy: __AUTOMATIC____
Primary Node: ______ftsys9_______________
First Failover Node:____ftsys10_______________
Second Failover Node:_________________________________
Package Run Script: __/etc/cmcluster/pkg1/control.sh__Timeout: _NO_TIMEOUT_
Package Halt Script: __/etc/cmcluster/pkg1/control.sh_Timeout: _NO_TIMEOUT_
Package Switching Enabled? __YES___   Local Switching Enabled? ___YES__
Node Failfast Enabled? ____NO____
Additional Package Resource:
Resource Name:________ Polling Interval_______ Resource UP Value___________
===============================================================================
Package Control Script Data:
===============================================================================
VG[0]___/dev/vg01 __LV[0]__/dev/vg01/lvol1__FS[0]____/mnt1___FS_MOUNT_OPT[0]____
VG[1]_______________LV[1]___________________FS[1]____________FS_MOUNT_OPT[1]____
VG[2]_______________LV[2]___________________FS[2]____________FS_MOUNT_OPT[2]____
IP[0] ___15.13.171.14 ______________ SUBNET[0]_______15.13.168________
IP[1] ______________________________ SUBNET[1]________________________
X.25 Resource Name _________________
Service Name: __Svc1____ Run Command: __/usr/bin/MySvc -f_____Retries: _-r 2__
Service Fail Fast Enabled? ___NO___Service Halt Timeout __NO_TIMEOUT_______
Service Name: __________ Run Command: _______________________ Retries: ________
Service Fail Fast Enabled? _________Service Halt Timeout __________
```
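As an illustration, the package configuration data in the worksheet above might translate into ASCII package configuration file entries like the following. This is a hedged sketch: the one-keyword-per-line layout with `#` comments is assumed, FAILOVER_POLICY is set to its documented default because the worksheet leaves it blank, and SUBNET is omitted because the worksheet value is incomplete. The single control script is named for both RUN_SCRIPT and HALT_SCRIPT, as recommended earlier.

```
# Sketch of an ASCII package configuration file for the sample worksheet
PACKAGE_NAME                pkg11
FAILOVER_POLICY             CONFIGURED_NODE   # blank on worksheet; default
FAILBACK_POLICY             AUTOMATIC
NODE_NAME                   ftsys9            # primary node first
NODE_NAME                   ftsys10           # first adoptive node
RUN_SCRIPT                  /etc/cmcluster/pkg1/control.sh
RUN_SCRIPT_TIMEOUT          NO_TIMEOUT
HALT_SCRIPT                 /etc/cmcluster/pkg1/control.sh
HALT_SCRIPT_TIMEOUT         NO_TIMEOUT
SERVICE_NAME                Svc1
SERVICE_FAIL_FAST_ENABLED   NO
SERVICE_HALT_TIMEOUT        NO_TIMEOUT        # value as written on the worksheet
PKG_SWITCHING_ENABLED       YES
NET_SWITCHING_ENABLED       YES
NODE_FAIL_FAST_ENABLED      NO
```

The volume group, IP address, and service command entries from the lower half of the worksheet belong in the package control script rather than in this file.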