Configuring OPS Clusters with ServiceGuard OPS Edition > Chapter 3 Understanding the Software Components of ServiceGuard OPS Edition

How the Package Manager Works


Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator node is known as the package coordinator.

The package coordinator does the following:

  • Decides when and where to run, halt or move packages.

The package manager on all nodes does the following:

  • Initiates the execution of the user-defined control script to run and halt packages and package services.

  • Reacts to changes in the status of monitored resources.

Package Types

Two types of packages can run in the cluster: the failover package and the system multi-node package. The system multi-node package, known as VxVM-CVM-pkg, is configured and used only on systems that employ the VERITAS Cluster Volume Manager (CVM) as a storage manager. It runs on all nodes that are active in the cluster and provides cluster membership information to the volume manager software. The process of creating the system multi-node package for CVM is described in Chapter 5. The rest of this section describes the standard failover packages.

Failover Packages

A failover package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.

Failover is shown in the following figure:

Figure 3-4 Package Moving During Failover


Configuring Packages

Each package is separately configured. You create a package by using SAM or by editing a package ASCII configuration file (detailed instructions are given in Chapter 6). Then you use the cmapplyconf command to check and apply the package to the cluster configuration database. You also create the package control script, which manages the execution of the package's services. Then the package is ready to run.

Deciding When and Where to Run and Halt Packages

The package configuration file assigns a name to the package and identifies the nodes on which the package can run, in order of priority. It also indicates whether or not switching is enabled for the package, that is, whether the package should switch to another node or not in the case of a failure. In addition, the package failover and failback policies allow the package manager to decide dynamically where to start up a package.

OPS instances are configured as packages with a single node in their node list. For conventional non-OPS instance packages, there may be one or many applications in a package. Package configuration is described in detail in the chapter "Configuring Packages and Their Services."

Package Switching

The AUTO_RUN parameter (known in earlier versions of ServiceGuard as the PKG_SWITCHING_ENABLED parameter) defines the default global switching attribute for the package at cluster startup, that is, whether the package should be restarted automatically on a new node in response to a failure, and whether it should be started automatically when the cluster is started. Once the cluster is running, the package switching attribute of each package can be set with the cmmodpkg command.

The parameter is coded in the package ASCII configuration file:

# The default for AUTO_RUN is YES. In the event of a
# failure, this permits the cluster software to transfer the package
# to an adoptive node. Adjust as necessary.

AUTO_RUN YES
NOTE: Packages that start and halt OPS instances (called OPS packages below) do not fail over from one node to another; they are single-node packages. You should include only one node name in the package ASCII configuration file. The AUTO_RUN setting will determine whether the OPS instance will start up as the node joins the cluster. Your cluster may include OPS and non-OPS packages in the same configuration.

A package switch involves moving non-OPS packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly, otherwise the packages will not be started. With package failovers, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically. Note that if the package is dependent on multiple subnetworks, all subnetworks must be available on the target node before the package will be started.
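The subnet requirement above can be pictured with a small model. This is only an illustrative sketch, not ServiceGuard code: the function names, the data layout, and the subnet addresses are all hypothetical, and the real check is performed internally by the cluster software.

```python
from typing import Dict, Iterable, List, Set


def can_host_package(package_subnets: Iterable[str], node_subnets_up: Set[str]) -> bool:
    """Return True only if every subnet the package depends on is up on the node."""
    return set(package_subnets) <= node_subnets_up


def eligible_nodes(package_subnets: Iterable[str],
                   cluster_state: Dict[str, Set[str]]) -> List[str]:
    """List the nodes (in the order given) on which the package could be started."""
    wanted = list(package_subnets)
    return [node for node, up in cluster_state.items()
            if can_host_package(wanted, up)]


# Hypothetical cluster: node2 has only one of the two subnets working.
cluster_state = {
    "node1": {"192.10.25.0", "192.10.26.0"},
    "node2": {"192.10.25.0"},
}

# A package that depends on both subnets can start only on node1.
print(eligible_nodes(["192.10.25.0", "192.10.26.0"], cluster_state))
```

In this model a package that depends on several subnetworks is simply not offered a node on which any one of them is missing, which mirrors the rule that all subnetworks must be available on the target node.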

The switching of relocatable IP addresses is shown in Figure 3-5 “Before Package Switching” and Figure 3-6 “After Package Switching”. Figure 3-5 “Before Package Switching” shows a two-node cluster in its original state, with Package 1 running on Node 1 and Package 2 running on Node 2. OPS instances are running separately on both nodes. Users connect to the node with the IP address of the package they wish to use. Each node has a stationary IP address associated with it, and each package has a relocatable IP address associated with it.

Figure 3-5 Before Package Switching


Figure 3-6 “After Package Switching” shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. OPS instance 1 is no longer operating, but it does not fail over to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package 1's disk and Package 2's disk. OPS instance 2 now handles all database access, since instance 1 has gone down.

Figure 3-6 After Package Switching


Failover Policy

The Package Manager selects a node for a package to run on based on the priority list included in the package configuration file together with the FAILOVER_POLICY parameter, also coded in the file or set with SAM. Failover policy governs not only failover behavior but also startup behavior for the package, including the initial startup. The two failover policies are CONFIGURED_NODE and MIN_PACKAGE_NODE. The parameter is coded in the package ASCII configuration file:

# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.

# The alternative policy is MIN_PACKAGE_NODE. This policy will select
# the node, from the list of NODE_NAME entries below, which is
# running the least number of packages at the time of failover.

#FAILOVER_POLICY CONFIGURED_NODE

If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the node list that is available.

If you use MIN_PACKAGE_NODE as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.)
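The two policies can be summarized as a small selection function. This is a minimal sketch of the behavior described above, not the package manager's actual implementation; the function name and arguments are hypothetical, and the tie-breaking rule for MIN_PACKAGE_NODE (fall back to node list order) is an assumption.

```python
from typing import Dict, List, Optional


def select_node(policy: str, node_list: List[str],
                running_counts: Dict[str, int]) -> Optional[str]:
    """Pick a node for a package.

    node_list      -- NODE_NAME entries in priority order
    running_counts -- packages currently running on each node that is a
                      cluster member; nodes that are down are absent
    """
    available = [n for n in node_list if n in running_counts]
    if not available:
        return None
    if policy == "CONFIGURED_NODE":
        # Highest-priority node that is currently in the cluster.
        return available[0]
    if policy == "MIN_PACKAGE_NODE":
        # Only the package count is compared, not the actual load.
        # min() keeps the first minimum, so ties go to list order (assumption).
        return min(available, key=lambda n: running_counts[n])
    raise ValueError(f"unknown FAILOVER_POLICY: {policy}")


nodes = ["node1", "node2", "node3", "node4"]
counts = {"node2": 2, "node3": 0, "node4": 1}  # node1 is down

print(select_node("CONFIGURED_NODE", nodes, counts))   # node2
print(select_node("MIN_PACKAGE_NODE", nodes, counts))  # node3
```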

Automatic Rotating Standby

Using the MIN_PACKAGE_NODE failover policy, it is possible to configure a cluster that lets you use one node as an automatic rotating standby node for the cluster. Consider the following package configuration for a four-node cluster. Note that all packages can run on all nodes and have the same NODE_NAME lists, though the node names appear in a different order in each package configuration file.

Table 3-1 Package Configuration Data

Package Name    NODE_NAME List                FAILOVER_POLICY
pkgA            node1, node2, node3, node4    MIN_PACKAGE_NODE
pkgB            node2, node3, node4, node1    MIN_PACKAGE_NODE
pkgC            node3, node4, node1, node2    MIN_PACKAGE_NODE

 

When the cluster starts, each package starts as shown in Figure 3-7 “Rotating Standby Configuration before Failover”.

Figure 3-7 Rotating Standby Configuration before Failover


If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8 “Rotating Standby Configuration after Failover”, which shows a failure on node 2:

Figure 3-8 Rotating Standby Configuration after Failover


If these packages had been set up using the CONFIGURED_NODE failover policy, they would start initially as in Figure 3-7 “Rotating Standby Configuration before Failover”, but the failure of node 2 would cause pkgB to start on node 3, the next node in its NODE_NAME list, as in Figure 3-9 “CONFIGURED_NODE Policy Packages after Failover”:

Figure 3-9 CONFIGURED_NODE Policy Packages after Failover
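The difference between the two outcomes can be traced with a short simulation of the Table 3-1 configuration. This is a sketch under stated assumptions: the helper names are hypothetical, packages are started in the order pkgA, pkgB, pkgC, and ties under MIN_PACKAGE_NODE are broken by node list order.

```python
from collections import Counter

# NODE_NAME lists from Table 3-1, in priority order.
PACKAGES = {
    "pkgA": ["node1", "node2", "node3", "node4"],
    "pkgB": ["node2", "node3", "node4", "node1"],
    "pkgC": ["node3", "node4", "node1", "node2"],
}


def place(policy, node_list, up_nodes, other_placements):
    """Choose a node, given where the other packages are running."""
    counts = Counter(other_placements.values())
    candidates = [n for n in node_list if n in up_nodes]
    if policy == "MIN_PACKAGE_NODE":
        # Fewest running packages; ties go to list order (assumption).
        return min(candidates, key=lambda n: counts[n])
    return candidates[0]  # CONFIGURED_NODE: first available node in the list


def simulate(policy, failed_node):
    """Start all packages, fail one node, and return the final placement."""
    up = {"node1", "node2", "node3", "node4"}
    placement = {}
    for pkg, nodes in PACKAGES.items():
        placement[pkg] = place(policy, nodes, up, placement)
    up.discard(failed_node)
    for pkg, node in list(placement.items()):
        if node == failed_node:
            others = {p: n for p, n in placement.items() if p != pkg}
            placement[pkg] = place(policy, PACKAGES[pkg], up, others)
    return placement


print(simulate("MIN_PACKAGE_NODE", "node2"))
# {'pkgA': 'node1', 'pkgB': 'node4', 'pkgC': 'node3'}
print(simulate("CONFIGURED_NODE", "node2"))
# {'pkgA': 'node1', 'pkgB': 'node3', 'pkgC': 'node3'}
```

With MIN_PACKAGE_NODE, pkgB lands on the idle node 4, so each node still runs at most one package; with CONFIGURED_NODE, node 3 ends up running two packages while node 4 stays idle.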



NOTE: Packages that are used to start up and shut down OPS instances (OPS 8.05 and later) should employ a CONFIGURED_NODE policy, and there should be only one node in the node list—the node on which the instance is to run. OPS instances do not fail over from node to node.

Failback Policy

The use of the FAILBACK_POLICY parameter allows you to decide whether a package will return to its primary node if the primary node becomes available and the package is not currently running on the primary node. The configured primary node is the first node listed in the package's node list.

The two possible values for this policy are AUTOMATIC and MANUAL. The parameter is coded in the package ASCII configuration file:

# Enter the failback policy for this package. This policy will be used
# to determine what action to take during failover when a package
# is not running on its primary node and its primary node is capable
# of running the package. Default is MANUAL which means no attempt
# will be made to move the package back to its primary node when it is
# running on an alternate node. The alternate policy is AUTOMATIC which
# means the package will be moved back to its primary node whenever the
# primary node is capable of running the package.

#FAILBACK_POLICY MANUAL

As an example, consider the following four-node configuration, in which FAILOVER_POLICY is set to CONFIGURED_NODE and FAILBACK_POLICY is AUTOMATIC:

Figure 3-10 Automatic Failback Configuration before Failover


Table 3-2 Package Configuration Data

Package Name    NODE_NAME List    FAILOVER_POLICY    FAILBACK_POLICY
pkgA            node1, node4      CONFIGURED_NODE    AUTOMATIC
pkgB            node2, node4      CONFIGURED_NODE    AUTOMATIC
pkgC            node3, node4      CONFIGURED_NODE    AUTOMATIC

 

Node 1 panics, and after the cluster reforms, pkgA starts running on node 4:

Figure 3-11 Automatic Failback Configuration After Failover


After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1.

Figure 3-12 Automatic Failback Configuration After Restart of Node 1

NOTE: Setting the FAILBACK_POLICY to AUTOMATIC can result in a package failback and application outage during a critical production period. If you are using automatic failback, you may wish to delay adding the package's primary node back into the cluster until a time when the package can be taken out of service temporarily while it switches back to the primary node.
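The failback sequence in this example can be modeled in a few lines. This is a simplified sketch, not ServiceGuard code: the function name and the dictionary layout are hypothetical, and the model ignores the halt and start scripts that actually run during the move.

```python
def on_node_rejoin(rejoined_node, packages):
    """Return the (package, from_node, to_node) moves triggered by a rejoin.

    packages maps a package name to a dict with the keys
    'node_list', 'running_on' and 'failback_policy'.
    """
    moves = []
    for name, pkg in packages.items():
        primary = pkg["node_list"][0]  # the primary node is first in the list
        if (pkg["failback_policy"] == "AUTOMATIC"
                and primary == rejoined_node
                and pkg["running_on"] != primary):
            moves.append((name, pkg["running_on"], primary))
            # Halted on the adoptive node, then restarted on the primary.
            pkg["running_on"] = primary
    return moves


# State after node 1 has panicked and pkgA has failed over to node 4.
packages = {
    "pkgA": {"node_list": ["node1", "node4"], "running_on": "node4",
             "failback_policy": "AUTOMATIC"},
    "pkgB": {"node_list": ["node2", "node4"], "running_on": "node2",
             "failback_policy": "AUTOMATIC"},
    "pkgC": {"node_list": ["node3", "node4"], "running_on": "node3",
             "failback_policy": "AUTOMATIC"},
}

print(on_node_rejoin("node1", packages))  # [('pkgA', 'node4', 'node1')]
```

If pkgA used the MANUAL policy instead, the same call would return an empty list and the package would stay on node 4 until an operator moved it.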

On Combining Failover and Failback Policies

Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package's running on a node where you did not expect it to run, since the node running the fewest packages will probably not be the same host every time a failover occurs.

Using Older Package Configuration Files

If you are using package configuration files that were generated using a previous version of MC/ServiceGuard, then the FAILOVER_POLICY will be the default package behavior of CONFIGURED_NODE and the FAILBACK_POLICY will be the default package behavior of MANUAL. If you wish to change these policies, edit the package configuration file to add the parameters, or use cmmakepkg to create a new package configuration file.

Starting with the A.11.12 version of ServiceGuard OPS Edition, the PKG_SWITCHING_ENABLED parameter was renamed to AUTO_RUN, and the NET_SWITCHING_ENABLED parameter was renamed to LOCAL_LAN_FAILOVER_ALLOWED. The older names still work in your configuration files, but you should update the files to use the new keywords.
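When reviewing older files, it can help to see what they amount to under the current keywords and defaults. The following is a rough sketch of such a reader, assuming the simple "KEYWORD value" line format shown in the excerpts above; the function name is hypothetical, the sample file is invented for illustration, and the real parsing is done by the ServiceGuard commands themselves.

```python
# Old keyword -> new keyword, as renamed in A.11.12.
LEGACY_KEYWORDS = {
    "PKG_SWITCHING_ENABLED": "AUTO_RUN",
    "NET_SWITCHING_ENABLED": "LOCAL_LAN_FAILOVER_ALLOWED",
}

# Behavior that applies when an older file omits the policy parameters.
DEFAULTS = {
    "FAILOVER_POLICY": "CONFIGURED_NODE",
    "FAILBACK_POLICY": "MANUAL",
}


def parse_package_config(text):
    """Return the parameters of a package file using the current keyword names."""
    params = dict(DEFAULTS)
    node_names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip anything that is not "KEYWORD value"
        keyword, value = parts
        keyword = LEGACY_KEYWORDS.get(keyword, keyword)
        if keyword == "NODE_NAME":
            node_names.append(value.strip())  # order gives the priority
        else:
            params[keyword] = value.strip()
    params["NODE_NAME"] = node_names
    return params


# Invented example of an older-style file.
old_style = """
PACKAGE_NAME pkgA
NODE_NAME node1
NODE_NAME node4
PKG_SWITCHING_ENABLED YES
NET_SWITCHING_ENABLED YES
"""

config = parse_package_config(old_style)
print(config["AUTO_RUN"], config["FAILOVER_POLICY"], config["FAILBACK_POLICY"])
# YES CONFIGURED_NODE MANUAL
```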

Using the Event Monitoring Service

Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by ServiceGuard directly. In addition, you can use the Event Monitoring Service registry through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for ServiceGuard. Monitors currently supplied with other software products include EMS High Availability Monitors, an OTS/9000 monitor, and an ATM monitor.

If a registered resource is configured in a package, the package manager calls the resource registrar to launch an external monitor for the resource. Resources can be configured to start up either at the time the node enters the cluster or at the end of package startup. The monitor then sends messages back to ServiceGuard, which checks to see whether the resource is available before starting the package. In addition, the package manager can fail the package to another node or take other action if the resource becomes unavailable after the package starts.
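The interaction between a resource monitor and the package manager can be pictured as a simple decision function. This is a conceptual sketch only: the function names, the "UP"/"DOWN" status values, the returned action strings, and the resource names are all hypothetical and do not correspond to the EMS interface.

```python
from typing import Dict, List


def resources_available(required: List[str], status: Dict[str, str]) -> bool:
    """True if every resource the package depends on is reported as UP."""
    return all(status.get(name) == "UP" for name in required)


def handle_event(pkg_running: bool, required: List[str],
                 status: Dict[str, str], resource: str, new_status: str) -> str:
    """Record a monitor message and decide what to do with the package."""
    status[resource] = new_status
    ok = resources_available(required, status)
    if not pkg_running:
        # The package is started only once its resources are available.
        return "start" if ok else "wait"
    # After startup, losing a resource can trigger a failover or other action.
    return "continue" if ok else "fail_over"


required = ["/example/disk_a", "/example/lan_a"]  # hypothetical resource names
status = {}

print(handle_event(False, required, status, "/example/disk_a", "UP"))   # wait
print(handle_event(False, required, status, "/example/lan_a", "UP"))    # start
print(handle_event(True, required, status, "/example/disk_a", "DOWN"))  # fail_over
```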

You can specify a registered resource for a package by selecting it from the list of available resources displayed in the SAM package configuration area. The size of the list displayed by SAM depends on which resource monitors have been registered on your system. Alternatively, you can obtain information about registered resources on your system by using the command /opt/resmon/bin/resls. For additional information, refer to the man page for resls(1m).

Using the EMS HA Monitors

The EMS HA Monitors, available as a separate product (B5736DA), can be used to set up monitoring of disks and other resources as package dependencies. Examples of resource attributes that can be monitored using EMS include the following:

  • Logical volume status

  • Physical volume status

  • System load

  • Number of users

  • File system utilization

  • LAN health

Once a monitor is configured as a package dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.

The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification. Refer to the manual Using High Availability Monitors (B5736-90022) for additional information.

Choosing Package Failover Behavior

To determine failover behavior, you can define a package failover policy that governs which nodes will automatically start up a package that is not running. In addition, you can define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible.

The following table describes different types of failover behavior and the settings in SAM or in the ASCII package configuration file that determine each behavior.

Table 3-3 Package Failover Behavior

Switching Behavior: Package switches normally after detection of service, network, or EMS failure. Halt script runs before switch takes place. (Default)

Options in SAM:

  • Package Failfast set to Disabled. (Default)

  • Service Failfast set to Disabled for all services. (Default)

  • Automatic Switching set to Enabled for the package. (Default)

Parameters in ASCII File:

  • NODE_FAIL_FAST_ENABLED set to NO. (Default)

  • SERVICE_FAIL_FAST_ENABLED set to NO for all services. (Default)

  • AUTO_RUN set to YES for the package. (Default)

Switching Behavior: Package fails over to the node with the fewest active packages.

Options in SAM:

  • Failover policy set to Minimum Package Node.

Parameters in ASCII File:

  • FAILOVER_POLICY set to MIN_PACKAGE_NODE.

Switching Behavior: Package fails over to the node that is next on the list of nodes. (Default)

Options in SAM:

  • Failover policy set to Configured Node. (Default)

Parameters in ASCII File:

  • FAILOVER_POLICY set to CONFIGURED_NODE. (Default)

Switching Behavior: Package is automatically halted and restarted on its primary node if the primary node is available and the package is running on a non-primary node.

Options in SAM:

  • Failback policy set to Automatic.

Parameters in ASCII File:

  • FAILBACK_POLICY set to AUTOMATIC.

Switching Behavior: If desired, package must be manually returned to its primary node if it is running on a non-primary node.

Options in SAM:

  • Failback policy set to Manual. (Default)

  • Failover policy set to Configured Node. (Default)

Parameters in ASCII File:

  • FAILBACK_POLICY set to MANUAL. (Default)

  • FAILOVER_POLICY set to CONFIGURED_NODE. (Default)

Switching Behavior: All packages switch following a TOC (Transfer of Control, an immediate halt without a graceful shutdown) on the node when a specific service fails. An attempt is first made to reboot the system prior to the TOC. Halt scripts are not run.

Options in SAM:

  • Service Failfast set to Enabled for a specific service.

  • Automatic Switching set to Enabled for all packages.

Parameters in ASCII File:

  • SERVICE_FAIL_FAST_ENABLED set to YES for a specific service.

  • AUTO_RUN set to YES for all packages.

Switching Behavior: All packages switch following a TOC on the node when any service fails. An attempt is first made to reboot the system prior to the TOC.

Options in SAM:

  • Service Failfast set to Enabled for all services.

  • Automatic Switching set to Enabled for all packages.

Parameters in ASCII File:

  • SERVICE_FAIL_FAST_ENABLED set to YES for all services.

  • AUTO_RUN set to YES for all packages.

 

© Hewlett-Packard Development Company, L.P.