Managing Serviceguard Twelfth Edition > Chapter 3 Understanding Serviceguard Software Components

How the Package Manager Works


Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available.

Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.

The package coordinator does the following:

  • Decides when and where to run, halt, or move packages.

The package manager on all nodes does the following:

  • Executes the control scripts that run and halt packages and their services.

  • Reacts to changes in the status of monitored resources.

Package Types

Three different types of packages can run in the cluster. The most common is the failover package. There are also special-purpose packages that run on more than one node at a time, and so do not fail over. They are typically used to manage resources of certain failover packages.

Non-failover Packages

There are two types of special-purpose packages that do not fail over and that can run on more than one node at the same time: the system multi-node package, which runs on all nodes in the cluster, and the multi-node package, which can be configured to run on all or some of the nodes in the cluster.

These packages are not for general use, and are only supported by Hewlett-Packard for specific applications.

One common system multi-node package is shipped with the Serviceguard product. It is used on systems that employ VERITAS Cluster Volume Manager (CVM) as a storage manager. This package is named VxVM-CVM-pkg for VERITAS CVM Version 3.5 and SG-CFS-pkg for VERITAS CVM Version 4.1. It runs on all nodes that are active in the cluster and provides cluster membership information to the volume manager software. This type of package is configured and used only when you employ CVM for storage management.

The process of creating the system multi-node package for CVM without CFS is described in “Preparing the Cluster for Use with CVM”. The process of creating the system multi-node package for CVM with CFS is described in “Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)”.

Multi-node packages are used in clusters that use the VERITAS Cluster File System (CFS) and other HP-specified applications. They can run on several nodes at a time, but need not run on all. These packages are used when creating cluster file system dependencies.

The rest of this section describes the standard failover packages.

Failover Packages

A failover package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.

Failover is shown in the following figure:

Figure 3-4 Package Moving During Failover

Configuring Failover Packages

Each package is configured separately. You create a failover package by using Serviceguard Manager or by editing a package ASCII configuration file template. (Detailed instructions are given in Chapter 6 “Configuring Packages and Their Services”.)

Then you use the cmapplyconf command to check and apply the package to the cluster configuration database.

You also create the package control script, which manages the execution of the package’s services. See “Creating the Package Control Script” for detailed information.

Then the package is ready to run.

Deciding When and Where to Run and Halt Failover Packages

The package configuration file assigns a name to the package and includes a list of the nodes on which the package can run.

Failover packages list the nodes in order of priority (i.e., the first node in the list is the highest priority node). In addition, failover packages’ files contain three parameters that determine failover behavior. These are the AUTO_RUN parameter, the FAILOVER_POLICY parameter, and the FAILBACK_POLICY parameter.

Failover Packages’ Switching Behavior

The AUTO_RUN parameter (known in earlier versions of Serviceguard as the PKG_SWITCHING_ENABLED parameter) defines the default global switching attribute for a failover package at cluster startup: that is, whether Serviceguard can automatically start the package when the cluster is started, and whether Serviceguard should automatically restart the package on a new node in response to a failure. Once the cluster is running, the package switching attribute of each package can be temporarily set with the cmmodpkg command; at reboot, the configured value will be restored.

The parameter is coded in the package ASCII configuration file:

# The default for AUTO_RUN is YES. In the event of a
# failure, this permits the cluster software to transfer the package
# to an adoptive node. Adjust as necessary.

AUTO_RUN   YES

A package switch involves moving failover packages and their associated IP addresses to a new system. The new system must already have the same subnet configured and working properly; otherwise the packages will not be started.

With package failovers, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically. Note that if the package is dependent on multiple subnets, all of them must be available on the target node before the package will be started.

If the package has a dependency, the dependency must be met on the target node before the package can start.
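The start conditions described above can be sketched as a simple check. This is a minimal illustration in Python, not Serviceguard code: the `Node` class, its fields, and the `can_start_on` function are hypothetical names, the subnet addresses are made up, and a dependency is modeled here (as an assumption) as another package that must already be running on the target node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a cluster node for this sketch."""
    name: str
    working_subnets: set = field(default_factory=set)
    running_packages: set = field(default_factory=set)


def can_start_on(node, required_subnets, required_packages):
    """Return True only if every subnet and every dependency is met on node."""
    subnets_ok = set(required_subnets) <= node.working_subnets
    dependencies_ok = set(required_packages) <= node.running_packages
    return subnets_ok and dependencies_ok


node2 = Node("node2", working_subnets={"192.10.25.0"}, running_packages={"pkg2"})

# The package needs one subnet, and node2 has it.
print(can_start_on(node2, ["192.10.25.0"], []))
# The package needs two subnets, and node2 lacks the second one.
print(can_start_on(node2, ["192.10.25.0", "10.1.1.0"], []))
```

The point of the sketch is that the check is all-or-nothing: one missing subnet or one unmet dependency is enough to keep the package from starting on that node.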

The switching of relocatable IP addresses is shown in Figure 3-5 “Before Package Switching” and Figure 3-6 “After Package Switching”. Figure 3-5 “Before Package Switching” shows a two-node cluster in its original state with Package 1 running on Node 1 and Package 2 running on Node 2. Users connect to a node by using the IP address of the package they wish to use. Each node has a stationary IP address associated with it, and each package has an IP address associated with it.

Figure 3-5 Before Package Switching


Figure 3-6 “After Package Switching” shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1’s IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package 1’s disk and Package 2’s disk.

Figure 3-6 After Package Switching


Failover Policy

The package manager selects a node for a failover package to run on based on the priority list in the package configuration file together with the FAILOVER_POLICY parameter, which is also in the configuration file. The failover policy governs how the package manager chooses a node when the package needs to be started and no specific node has been identified. This applies not only to failovers but also to every package startup, including the initial startup. The two failover policies are CONFIGURED_NODE (the default) and MIN_PACKAGE_NODE. The parameter is coded in the package ASCII configuration file:

# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.

# The alternative policy is MIN_PACKAGE_NODE. This policy will select
# the node, from the list of NODE_NAME entries below, which is
# running the least number of packages at the time of failover.

#FAILOVER_POLICY        CONFIGURED_NODE

If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node available in the node list. When a failover occurs, the package will move to the next highest priority node in the list that is available.

If you use MIN_PACKAGE_NODE as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.)
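The difference between the two policies can be sketched as a small selection function. This is an illustrative model in Python, not the package manager's actual implementation: the function name `select_node` and its arguments are hypothetical, and breaking ties between equally loaded nodes by node-list order is an assumption of the sketch.

```python
def select_node(policy, node_list, available, package_counts):
    """Pick a node for a package, or None if no listed node is available.

    policy         -- "CONFIGURED_NODE" or "MIN_PACKAGE_NODE"
    node_list      -- NODE_NAME entries in priority order
    available      -- set of nodes currently able to run the package
    package_counts -- number of packages currently running on each node
    """
    candidates = [n for n in node_list if n in available]
    if not candidates:
        return None
    if policy == "CONFIGURED_NODE":
        # Highest-priority node that is available.
        return candidates[0]
    if policy == "MIN_PACKAGE_NODE":
        # Only the package count is compared, not the actual load.
        # min() returns the first minimum, so ties fall back to list order
        # (an assumption of this sketch).
        return min(candidates, key=lambda n: package_counts.get(n, 0))
    raise ValueError(f"unknown FAILOVER_POLICY: {policy}")


nodes = ["node1", "node2", "node3"]
counts = {"node1": 2, "node2": 1, "node3": 0}
up = {"node2", "node3"}  # node1 is down

print(select_node("CONFIGURED_NODE", nodes, up, counts))   # node2
print(select_node("MIN_PACKAGE_NODE", nodes, up, counts))  # node3
```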

Automatic Rotating Standby

Using the MIN_PACKAGE_NODE failover policy, it is possible to configure a cluster that lets you use one node as an automatic rotating standby node for the cluster. Consider the following package configuration for a four-node cluster. Note that all packages can run on all nodes and have the same NODE_NAME lists. Although the example shows the node names in a different order for each package, this is not required.

Table 3-1 Package Configuration Data

Package Name    NODE_NAME List                FAILOVER_POLICY
pkgA            node1, node2, node3, node4    MIN_PACKAGE_NODE
pkgB            node2, node3, node4, node1    MIN_PACKAGE_NODE
pkgC            node3, node4, node1, node2    MIN_PACKAGE_NODE

When the cluster starts, each package starts as shown in Figure 3-7 “Rotating Standby Configuration before Failover”.

Figure 3-7 Rotating Standby Configuration before Failover


If a failure occurs, any package would fail over to the node running the fewest packages, as in Figure 3-8 “Rotating Standby Configuration after Failover”, which shows a failure on node 2:

Figure 3-8 Rotating Standby Configuration after Failover

NOTE: Using the MIN_PACKAGE_NODE policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.

If these packages had been set up using the CONFIGURED_NODE failover policy, they would start initially as in Figure 3-7 “Rotating Standby Configuration before Failover”, but the failure of node 2 would cause pkgB to start on node 3, the next node in its NODE_NAME list, as in Figure 3-9 “CONFIGURED_NODE Policy Packages after Failover”:

Figure 3-9 CONFIGURED_NODE Policy Packages after Failover


If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
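The two outcomes described for the Table 3-1 configuration can be reproduced with a small simulation. This is a rough sketch in Python rather than a model of Serviceguard internals: the function names `place` and `simulate` are hypothetical, and the sketch assumes that packages start in the order pkgA, pkgB, pkgC and that ties under MIN_PACKAGE_NODE are broken by node-list order.

```python
NODE_LISTS = {
    "pkgA": ["node1", "node2", "node3", "node4"],
    "pkgB": ["node2", "node3", "node4", "node1"],
    "pkgC": ["node3", "node4", "node1", "node2"],
}


def place(pkg, up_nodes, placement, policy):
    """Choose a node for pkg among the nodes that are up."""
    candidates = [n for n in NODE_LISTS[pkg] if n in up_nodes]
    if policy == "MIN_PACKAGE_NODE":
        def load(node):
            # Count the packages currently placed on this node.
            return sum(1 for where in placement.values() if where == node)
        return min(candidates, key=load)  # first minimum wins on a tie
    return candidates[0]  # CONFIGURED_NODE: highest-priority available node


def simulate(policy):
    """Start the cluster, fail node2, and return the final placement."""
    up = {"node1", "node2", "node3", "node4"}
    placement = {}
    for pkg in NODE_LISTS:  # cluster startup
        placement[pkg] = place(pkg, up, placement, policy)
    up.discard("node2")  # node 2 fails
    for pkg, node in list(placement.items()):
        if node == "node2":
            del placement[pkg]
            placement[pkg] = place(pkg, up, placement, policy)
    return placement


print(simulate("MIN_PACKAGE_NODE"))  # pkgB moves to the idle standby, node4
print(simulate("CONFIGURED_NODE"))   # pkgB moves to node3, next in its list
```

Under MIN_PACKAGE_NODE the idle node absorbs the failed package, so every node still runs at most one package. Under CONFIGURED_NODE node3 ends up running two packages while node4 stays idle.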

Failback Policy

The use of the FAILBACK_POLICY parameter allows you to decide whether a package will return to its primary node if the primary node becomes available and the package is not currently running on the primary node. The configured primary node is the first node listed in the package’s node list.

The two possible values for this policy are AUTOMATIC and MANUAL. The parameter is coded in the package ASCII configuration file:

# Enter the failback policy for this package. This policy will be used
# to determine what action to take during failover when a package
# is not running on its primary node and its primary node is capable
# of running the package. Default is MANUAL which means no attempt
# will be made to move the package back to its primary node when it is
# running on an alternate node. The alternate policy is AUTOMATIC which
# means the package will be moved back to its primary node whenever the
# primary node is capable of running the package.

#FAILBACK_POLICY           MANUAL

As an example, consider the following four-node configuration, in which FAILOVER_POLICY is set to CONFIGURED_NODE and FAILBACK_POLICY is AUTOMATIC:

Figure 3-10 Automatic Failback Configuration before Failover


Table 3-2 Node Lists in Sample Cluster

Package Name    NODE_NAME List    FAILOVER POLICY    FAILBACK POLICY
pkgA            node1, node4      CONFIGURED_NODE    AUTOMATIC
pkgB            node2, node4      CONFIGURED_NODE    AUTOMATIC
pkgC            node3, node4      CONFIGURED_NODE    AUTOMATIC

node1 panics, and after the cluster reforms, pkgA starts running on node4:

Figure 3-11 Automatic Failback Configuration After Failover


After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1.

Figure 3-12 Automatic Failback Configuration After Restart of Node 1

NOTE: Setting the FAILBACK_POLICY to AUTOMATIC can result in a package failback and application outage during a critical production period. If you are using automatic failback, consider keeping the package’s primary node out of the cluster until a time when the package can be taken out of service briefly while it switches back to the primary node.
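The failback decision in this example can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Serviceguard's implementation: the function name `failback_target` and its arguments are hypothetical, and the sketch evaluates the decision only at the moment a node rejoins the cluster.

```python
def failback_target(failback_policy, node_list, current_node, up_nodes):
    """Return the node to move the package back to, or None to leave it."""
    primary = node_list[0]  # the primary node is the first node in the list
    if failback_policy != "AUTOMATIC":
        return None  # MANUAL: an operator must move the package
    if current_node == primary or primary not in up_nodes:
        return None  # already at home, or the primary node is still down
    return primary


pkgA_nodes = ["node1", "node4"]

# node1 is still down: pkgA stays on node4.
print(failback_target("AUTOMATIC", pkgA_nodes, "node4", {"node2", "node3", "node4"}))
# node1 has rejoined: pkgA is halted on node4 and restarted on node1.
print(failback_target("AUTOMATIC", pkgA_nodes, "node4", {"node1", "node2", "node3", "node4"}))
# With MANUAL, the package stays where it is even after node1 rejoins.
print(failback_target("MANUAL", pkgA_nodes, "node4", {"node1", "node4"}))
```

Because the move back involves halting and restarting the package, the second case is the one that causes the brief outage described in the note above.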

On Combining Failover and Failback Policies

Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package running on a node where you did not expect it to run, since the node running the fewest packages will probably not be the same host every time a failover occurs.

Using Older Package Configuration Files

If you are using package configuration files that were generated using a previous version of Serviceguard, we recommend you use the cmmakepkg command to open a new template, and then copy the parameter values into it. In the new template, read the descriptions and defaults of the choices that did not exist when the original configuration was made. For example, the default for FAILOVER_POLICY is now CONFIGURED_NODE and the default for FAILBACK_POLICY is now MANUAL.

In Serviceguard A.11.17 and later, you specify a package type parameter; the PACKAGE_TYPE for a traditional package is the default value, FAILOVER.

Starting with the A.11.12 version of Serviceguard, the PKG_SWITCHING_ENABLED parameter was renamed AUTO_RUN. The NET_SWITCHING_ENABLED parameter was renamed to LOCAL_LAN_FAILOVER_ALLOWED.
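When copying values from an older file into a new template, the two renamed parameters are easy to overlook. The following Python sketch lists the uncommented parameters in an old file under their current names. It is only an aid for the manual copy step recommended above, not a replacement for generating a new template with cmmakepkg. The function name `read_parameters` and the sample file contents are hypothetical, and the sketch assumes the simple "NAME value" line layout shown in the examples in this section.

```python
RENAMED = {
    "PKG_SWITCHING_ENABLED": "AUTO_RUN",
    "NET_SWITCHING_ENABLED": "LOCAL_LAN_FAILOVER_ALLOWED",
}


def read_parameters(text):
    """Collect uncommented 'NAME value' lines, translating renamed parameters."""
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)  # split on any run of spaces or tabs
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        values.append((RENAMED.get(name, name), value))
    return values


old_file = """\
PACKAGE_NAME            pkg1
# Old-style switching parameters
PKG_SWITCHING_ENABLED   YES
NET_SWITCHING_ENABLED\tYES
"""

for name, value in read_parameters(old_file):
    print(name, value)
```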

Using the Event Monitoring Service

Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly. In addition, you can use the Event Monitoring Service (EMS) registry, through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for Serviceguard. Monitors currently supplied with other software products include EMS High Availability Monitors and an ATM monitor.

If a registered resource is configured in a package, the package manager calls the resource registrar to launch an external monitor for the resource. Resources can be configured to start up either at the time the node enters the cluster or at the end of package startup. The monitor then sends messages back to Serviceguard, which checks to see whether the resource is available before starting the package. In addition, the package manager can fail the package to another node or take other action if the resource becomes unavailable after the package starts.

You can specify a registered resource for a package by selecting it from the list of available resources displayed in the Serviceguard Manager Configuring Packages. The size of the list displayed by Serviceguard Manager depends on which resource monitors have been registered on your system. Alternatively, you can obtain information about registered resources on your system by using the command /opt/resmon/bin/resls. For additional information, refer to the man page for resls(1m).

Using the EMS HA Monitors

The EMS (Event Monitoring Service) HA Monitors, available as a separate product (B5736DA), can be used to set up monitoring of disks and other resources as package dependencies. Examples of resource attributes that can be monitored using EMS include the following:

  • Logical volume status

  • Physical volume status

  • System load

  • Number of users

  • File system utilization

  • LAN health

Once a monitor is configured as a package dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.

The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification. Refer to the manual Using High Availability Monitors (B5736-90022) for additional information.

Choosing Package Failover Behavior

To determine failover behavior, you can define a package failover policy that governs which nodes will automatically start up a package that is not running. In addition, you can define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible.

The following table describes different types of failover behavior and the settings in Serviceguard Manager or in the package configuration file that determine each behavior.

Table 3-3 Package Failover Behavior

Switching Behavior: Package switches normally after detection of service, network, or EMS failure, or when a configured dependency is not met. Halt script runs before switch takes place. (Default)
Options in Serviceguard Manager:
  • Package Failfast set to Disabled. (Default)
  • Service Failfast set to Disabled for all services. (Default)
  • Automatic Switching set to Enabled for the package. (Default)
Parameters in ASCII File:
  • NODE_FAIL_FAST_ENABLED set to NO. (Default)
  • SERVICE_FAIL_FAST_ENABLED set to NO for all services. (Default)
  • AUTO_RUN set to YES for the package. (Default)

Switching Behavior: Package fails over to the node with the fewest active packages.
Options in Serviceguard Manager:
  • Failover policy set to Minimum Package Node.
Parameters in ASCII File:
  • FAILOVER_POLICY set to MIN_PACKAGE_NODE.

Switching Behavior: Package fails over to the node that is next on the list of nodes. (Default)
Options in Serviceguard Manager:
  • Failover policy set to Configured Node. (Default)
Parameters in ASCII File:
  • FAILOVER_POLICY set to CONFIGURED_NODE. (Default)

Switching Behavior: Package is automatically halted and restarted on its primary node if the primary node is available and the package is running on a non-primary node.
Options in Serviceguard Manager:
  • Failback policy set to Automatic.
Parameters in ASCII File:
  • FAILBACK_POLICY set to AUTOMATIC.

Switching Behavior: If desired, package must be manually returned to its primary node if it is running on a non-primary node.
Options in Serviceguard Manager:
  • Failback policy set to Manual. (Default)
  • Failover policy set to Configured Node. (Default)
Parameters in ASCII File:
  • FAILBACK_POLICY set to MANUAL. (Default)
  • FAILOVER_POLICY set to CONFIGURED_NODE. (Default)

Switching Behavior: All packages switch following a TOC (Transfer of Control, an immediate halt without a graceful shutdown) on the node when a specific service fails. An attempt is first made to reboot the system prior to the TOC. Halt scripts are not run.
Options in Serviceguard Manager:
  • Service Failfast set to Enabled for a specific service.
  • Automatic Switching set to Enabled for all packages.
Parameters in ASCII File:
  • SERVICE_FAIL_FAST_ENABLED set to YES for a specific service.
  • AUTO_RUN set to YES for all packages.

Switching Behavior: All packages switch following a TOC on the node when any service fails. An attempt is first made to reboot the system prior to the TOC.
Options in Serviceguard Manager:
  • Service Failfast set to Enabled for all services.
  • Automatic Switching set to Enabled for all packages.
Parameters in ASCII File:
  • SERVICE_FAIL_FAST_ENABLED set to YES for all services.
  • AUTO_RUN set to YES for all packages.

 

© Hewlett-Packard Development Company, L.P.