Configuring OPS Clusters with MC/LockManager: Chapter 3 Understanding MC/LockManager Software Components

How the Package Manager Works


Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator node is known as the package coordinator.

The package coordinator does the following:

  • Decides when and where to run, halt or move packages.

The package manager on all nodes does the following:

  • Executes the user-defined control script to run and halt packages and package services.

  • Reacts to changes in the status of monitored resources.

A package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator starts a package on a new node, as in the following figure:

Figure 3-2 Package Moving During Failover


Deciding When and Where to Run and Halt Packages

Each package is configured separately by means of a package configuration file, which can be edited manually or through SAM. This file assigns a name to the package and identifies the nodes on which the package can run, in order of priority. It also indicates whether switching is enabled for the package, that is, whether the package should switch to another node when a failure occurs. In addition, the package failover and failback policies allow the package manager to decide dynamically where to start up a package.
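The items a package configuration file carries can be pictured as a small record. The following is only a sketch in Python, not the actual file format: the class name, field names, and the MANUAL default for the failback policy are assumptions made for illustration, while the two failover policy values and the AUTOMATIC failback value come from this chapter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackageConfig:
    """Hypothetical in-memory model of one package configuration file."""
    name: str
    node_names: List[str]                 # priority order; first entry is the primary node
    switching_enabled: bool = True        # may the package switch nodes after a failure?
    failover_policy: str = "CONFIGURED_NODE"
    failback_policy: str = "MANUAL"       # assumed non-automatic default

    def __post_init__(self) -> None:
        if not self.node_names:
            raise ValueError("a package needs at least one node in its node list")
        if self.failover_policy not in ("CONFIGURED_NODE", "MIN_PACKAGE_NODE"):
            raise ValueError(f"unknown failover policy: {self.failover_policy}")

    def primary_node(self) -> str:
        """The configured primary node is the first node in the list."""
        return self.node_names[0]


pkg_a = PackageConfig(
    name="pkgA",
    node_names=["node1", "node2", "node3", "node4"],
    failover_policy="MIN_PACKAGE_NODE",
)
```

Modelling the node list as an ordered list keeps the priority order explicit, which is what both failover policies rely on.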

OPS instances are configured as packages with a single node in their node list. For conventional non-OPS instance packages, there may be one or many applications in a package. Package configuration is described in detail in Chapter 6, “Configuring Packages and Their Services.”

Startup Policies

The Package Manager selects a node for a package to run on based on the priority list included in the package configuration file together with the FAILOVER_POLICY parameter, also coded in the file or set with SAM. Failover policy governs not only failover behavior but also startup behavior for the package, including the initial startup. The two failover policies are CONFIGURED_NODE and MIN_PACKAGE_NODE.

If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
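The CONFIGURED_NODE rule amounts to walking the node list in priority order and taking the first node that is a running cluster member. A minimal sketch, assuming a hypothetical function name and a plain set of available node names (neither is part of MC/LockManager):

```python
from typing import Iterable, List, Optional


def pick_configured_node(node_list: List[str], available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority node in node_list that is currently available.

    node_list is in priority order, as in the package configuration file.
    Returns None when no node in the list is available.
    """
    up = set(available)
    for node in node_list:
        if node in up:
            return node
    return None


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(pick_configured_node(nodes, {"node1", "node2", "node3"}))  # initial startup
    print(pick_configured_node(nodes, {"node2", "node3"}))           # node1 is down
```

The same function covers both initial startup and failover, which mirrors the statement that the failover policy also governs startup behavior.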

If you use MIN_PACKAGE_NODE as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.)
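The MIN_PACKAGE_NODE rule can be sketched as a minimum over package counts. The function name and the dictionary of per-node counts below are hypothetical, and breaking ties by node-list order is an assumption; the text states only that the number of running packages is what gets compared.

```python
from typing import Dict, List, Optional


def pick_min_package_node(node_list: List[str], running_counts: Dict[str, int]) -> Optional[str]:
    """Return the available node running the fewest packages.

    running_counts maps each available node to the number of packages it
    currently runs; nodes missing from the mapping are treated as down.
    Ties go to the node that appears first in node_list (an assumption).
    """
    candidates = [n for n in node_list if n in running_counts]
    if not candidates:
        return None
    # min() returns the first minimal element, so list order breaks ties.
    return min(candidates, key=lambda n: running_counts[n])


if __name__ == "__main__":
    counts = {"node1": 1, "node3": 1, "node4": 0}   # node2 is down
    print(pick_min_package_node(["node2", "node3", "node4", "node1"], counts))
```

Note that nothing in the comparison looks at CPU or memory load, only at the package count.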

Automatic Rotating Standby

Using the MIN_PACKAGE_NODE failover policy, it is possible to configure a cluster that lets you use one node as an automatic rotating standby node for the package. Consider the following package configuration for a four-node cluster. Note that all packages can run on all nodes and have the same NODE_NAME lists, though the node names appear in a different order in the package configuration files.

Figure 3-3 Rotating Standby Configuration before Failover


Table 3-1 Rotating Standby Node Naming

Package Name   NODE_NAME List                 FAILOVER_POLICY
pkgA           node1, node2, node3, node4     MIN_PACKAGE_NODE
pkgB           node2, node3, node4, node1     MIN_PACKAGE_NODE
pkgC           node3, node4, node1, node2     MIN_PACKAGE_NODE

 

When the cluster starts, each package starts as shown in Figure 3-3 “Rotating Standby Configuration before Failover”. If a failure occurs, the affected package fails over to the node running the fewest packages, as in the following figure, which shows a failure on node 2:

Figure 3-4 Rotating Standby Configuration after Failover


If these packages had been set up using the CONFIGURED_NODE failover policy, they would start initially as in Figure 3-3 “Rotating Standby Configuration before Failover”, but the failure of node 2 would cause the package to start on node 3:

Figure 3-5 Rotating Standby Configuration before Failover
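The difference between the two policies for the Table 3-1 configuration can be reproduced with a small simulation. This is a sketch only: the function names are hypothetical, packages are assumed to start in table order, and ties under MIN_PACKAGE_NODE are assumed to be broken by node-list order.

```python
from typing import Dict, List, Optional, Set

# Node lists from Table 3-1, in priority order.
PACKAGES: Dict[str, List[str]] = {
    "pkgA": ["node1", "node2", "node3", "node4"],
    "pkgB": ["node2", "node3", "node4", "node1"],
    "pkgC": ["node3", "node4", "node1", "node2"],
}


def choose(policy: str, pkg: str, up: Set[str], placement: Dict[str, str]) -> Optional[str]:
    """Pick a node for pkg under the given failover policy."""
    candidates = [n for n in PACKAGES[pkg] if n in up]
    if not candidates:
        return None
    if policy == "CONFIGURED_NODE":
        return candidates[0]

    def other_packages(node: str) -> int:
        # Count packages on the node, excluding the one being placed.
        return sum(1 for p, where in placement.items() if where == node and p != pkg)

    return min(candidates, key=other_packages)  # MIN_PACKAGE_NODE


def simulate(policy: str, failed_node: str) -> Dict[str, Optional[str]]:
    """Start all packages, fail one node, and return the final placement."""
    up = {n for nodes in PACKAGES.values() for n in nodes}
    placement: Dict[str, Optional[str]] = {}
    for pkg in PACKAGES:
        placement[pkg] = choose(policy, pkg, up, placement)
    up.discard(failed_node)
    for pkg in PACKAGES:
        if placement[pkg] == failed_node:
            placement[pkg] = choose(policy, pkg, up, placement)
    return placement


if __name__ == "__main__":
    print(simulate("MIN_PACKAGE_NODE", "node2"))
    print(simulate("CONFIGURED_NODE", "node2"))
```

Under these assumptions, a failure of node 2 moves pkgB to node 4 with MIN_PACKAGE_NODE (the idle standby) and to node 3 with CONFIGURED_NODE (the next node in its list), which matches the behavior described in the text.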



NOTE: Packages that are used to start up and shut down OPS instances (OPS 8.05 and later) should be configured using a CONFIGURED_NODE policy, and there should be only one node in the node list--the node on which the instance is to run. OPS instances do not fail over from node to node.

Automatic Failback

The use of the FAILBACK_POLICY parameter allows packages to return to the primary node if the primary node becomes available and the package is not currently running on the primary node. The configured primary node is the first node listed in the package's node list. Here are some examples of what can happen when the FAILBACK_POLICY is set to AUTOMATIC:

  • FAILOVER_POLICY is set to CONFIGURED_NODE. Node 1 panics and reboots. After the cluster reforms, pkgA starts running on node 2. After the reboot, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 2 and restarted on node 1.

  • FAILOVER_POLICY is set to CONFIGURED_NODE. A service in pkgA fails. The Package Manager halts pkgA on node 1 and disables package switching for node 1. The package is then restarted on node 2. Sometime in the future, an administrator re-enables package switching for node 1. At that point, pkgA will be stopped on node 2 and restarted on node 1.

  • FAILOVER_POLICY is set to MIN_PACKAGE_NODE. A service in pkgA fails. The Package Manager halts pkgA on node 1 and disables package switching for node 1. The package is then restarted on node 2. At the time that pkgA failed, node 1 was running two other packages and node 2 was running one package. An administrator re-enables package switching for pkgA on node 1. At this point, pkgA continues to run on node 2 because both nodes are now running two packages and even though the primary node can run pkgA, it is not the node with the minimum number of packages. Next, an administrator halts one of the other packages on node 1. At this point, pkgA will be halted on node 2 and restarted on node 1.
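The three examples above can be condensed into one decision function. This is a reading of the examples rather than documented product logic: the function name and parameters are hypothetical, and the strict "fewer packages than the current node" comparison for MIN_PACKAGE_NODE is inferred from the third example, where pkgA stays put while both nodes run two packages.

```python
from typing import Dict


def should_fail_back(
    failover_policy: str,
    failback_policy: str,
    primary: str,
    current: str,
    primary_eligible: bool,
    package_counts: Dict[str, int],
) -> bool:
    """Decide whether a package should move back to its primary node.

    primary_eligible: the primary node is in the cluster and package
    switching is enabled for it.
    package_counts: packages currently running on each node, including
    the package under consideration on its current node.
    """
    if failback_policy != "AUTOMATIC":
        return False
    if current == primary or not primary_eligible:
        return False
    if failover_policy == "CONFIGURED_NODE":
        return True
    # MIN_PACKAGE_NODE: the primary must also be the less loaded node.
    return package_counts[primary] < package_counts[current]


if __name__ == "__main__":
    # Third example: both nodes run two packages, so pkgA stays on node2.
    print(should_fail_back("MIN_PACKAGE_NODE", "AUTOMATIC", "node1", "node2",
                           True, {"node1": 2, "node2": 2}))
    # After one package is halted on node1, pkgA moves back.
    print(should_fail_back("MIN_PACKAGE_NODE", "AUTOMATIC", "node1", "node2",
                           True, {"node1": 1, "node2": 2}))
```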

NOTE: Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package running on a node where you did not expect it to run. It is recommended that you plan the use of these parameters carefully.

You should always use a FAILOVER_POLICY of CONFIGURED_NODE for packages that start up OPS instances; these packages are configured with a node list containing a single node name.

Running Application Services

After a cluster has formed, the package manager on each node starts up packages on that node. Starting a package means running individual application services on the node where the package is running.

To start a package, the package manager runs the package control script with the 'start' parameter. This script performs the following tasks:

  • uses Logical Volume Manager (LVM) commands to activate volume groups needed by the package.

  • mounts filesystems from the activated volume groups to the local node.

  • uses cmmodnet to add the package's IP address to the current network interface running on a configured subnet. This allows clients to connect to the same address regardless of the node the service is running on.

  • uses the cmrunserv command to start up each application service configured in the package. This command also initiates monitoring of the service.

  • executes a set of customer-defined run commands to do additional processing, as required.
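The ordering of the start tasks listed above can be sketched as a plan builder. The real control script is a user-defined script, not Python; the function below is a hypothetical illustration that only produces step descriptions in the documented order and does not invoke LVM, cmmodnet, or cmrunserv. All resource names in the usage example are made up.

```python
from typing import List, Sequence


def start_plan(
    volume_groups: Sequence[str],
    filesystems: Sequence[str],
    ip_addresses: Sequence[str],
    services: Sequence[str],
    run_commands: Sequence[str],
) -> List[str]:
    """Return the package start steps in the order the text lists them."""
    steps: List[str] = []
    steps += [f"LVM: activate volume group {vg}" for vg in volume_groups]
    steps += [f"mount filesystem {fs}" for fs in filesystems]
    steps += [f"cmmodnet: add package IP address {ip}" for ip in ip_addresses]
    steps += [f"cmrunserv: start and monitor service {svc}" for svc in services]
    steps += [f"customer-defined run command: {cmd}" for cmd in run_commands]
    return steps


if __name__ == "__main__":
    for step in start_plan(["vg01"], ["/mnt/app"], ["192.0.2.10"],
                           ["app_service"], ["notify_operator"]):
        print(step)
```

Storage is brought up before the IP address and the services, so that a service never starts without its data or its client-visible address.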

While the package is running, services are continuously monitored. If any part of a package fails, the package halt instructions are executed as part of a recovery process. Failure may result in simple loss of the service, a restart of the service, transfer of the package to an adoptive node, or transfer of all packages to adoptive nodes, depending on the package configuration. In package transfers, MC/LockManager sends a TCP/IP packet across the heartbeat subnet to the package's adoptive node telling it to start up the package.

NOTE: When applications run as services in an MC/LockManager package, you do not start them directly; instead, the package manager runs packages on your behalf either when the cluster starts or when a package is enabled on a specific node. Similarly, you do not halt an individual application or service directly once it becomes part of a package. Instead, you halt the package or the node. Refer to Chapter 7, “Cluster and Package Maintenance,” for a description of how to start and stop packages.

The package configuration file and control script are described in detail in Chapter 6, “Configuring Packages and Their Services.”

Service Monitor

The Service Monitor checks the PIDs of services started by the package control script. If it detects a PID failure, the package is halted. Depending upon the parameters set in the package configuration file, MC/LockManager will attempt to restart the service on the primary node, or the package will fail over to a specified adoptive node.
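A PID check of this kind is commonly done on POSIX systems by sending signal 0, which tests for the existence of a process without affecting it. The sketch below shows that general technique and a simple restart-or-fail-over decision; it is not the Service Monitor's actual implementation, and the `max_restarts` parameter is a hypothetical stand-in for whatever restart setting the package configuration file provides.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # process exists but belongs to another user
    return True


def decide(alive: bool, restarts_used: int, max_restarts: int) -> str:
    """Choose an action for a monitored service.

    Returns "running", "restart" (retry on the same node), or
    "failover" (move the package to an adoptive node).
    """
    if alive:
        return "running"
    if restarts_used < max_restarts:
        return "restart"
    return "failover"


if __name__ == "__main__":
    print(decide(pid_alive(os.getpid()), restarts_used=0, max_restarts=2))
```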

Using the Event Monitoring Service

Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by MC/LockManager directly. In addition, you can use the Event Monitoring Service registry through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for MC/LockManager. Monitors currently supplied with other software products include an OTS/9000 monitor and an ATM monitor.

If a registered resource is configured in a package, the package manager calls the resource registrar to launch an external monitor for the resource. The monitor then sends messages back to MC/LockManager, which checks to see whether the resource is available before starting the package. In addition, the package manager can fail the package to another node or take other action if the resource becomes unavailable after the package starts.
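The two roles a monitored resource plays, gating package startup and triggering action afterwards, can be sketched as follows. The function names, the "UP" state string, and the resource names are assumptions for illustration; they are not the EMS registry's interface.

```python
from typing import Dict, Iterable


def can_start_package(required: Iterable[str], status: Dict[str, str]) -> bool:
    """A package may start only if every required resource reports UP.

    A resource with no reported status is treated as unavailable.
    """
    return all(status.get(res) == "UP" for res in required)


def on_resource_event(required: Iterable[str], resource: str,
                      new_state: str, package_running: bool) -> str:
    """Return the action to take when a monitor reports a state change."""
    if package_running and resource in set(required) and new_state != "UP":
        return "fail_over"
    return "no_action"


if __name__ == "__main__":
    needs = ["disk_monitor", "lan_monitor"]          # hypothetical resource names
    print(can_start_package(needs, {"disk_monitor": "UP", "lan_monitor": "UP"}))
    print(on_resource_event(needs, "disk_monitor", "DOWN", package_running=True))
```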

Using the EMS HA Monitors

The EMS HA Monitors, available as a separate product (A5735AA), can be used to set up monitoring of disks and other resources as package dependencies. Examples of resource attributes that can be monitored using EMS include the following:

  • Logical volume status

  • Physical volume status

  • System load

  • LAN health

Once a monitor is configured as a package dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.

The EMS HA Monitors can also be used to report monitored events to a target application such as ClusterView for graphical display or for operator notification. Refer to the manual Using EMS HA Monitors (HP part number B5735-90001) for additional information.

Stopping the Package

The package manager is notified when a command is issued to shut down a package. In this case, the package control script is run with the 'stop' parameter. For example, if the system administrator chooses "Halt Package" from the "Package Administration" menu in SAM, the package manager will stop the package. Similarly, when a command is issued to halt a cluster node, the package manager will shut down all the packages running on the node, executing each package control script with the 'stop' parameter. When run with the 'stop' parameter, the control script:

  • uses cmhaltserv to halt each service.

  • unmounts filesystems that had been mounted by the package.

  • uses Logical Volume Manager (LVM) commands to deactivate volume groups used by the package.

  • uses cmmodnet to delete the package's IP address from the current network interface.
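The stop tasks listed above can be laid out in order in the same way. As before, this is a hypothetical Python illustration that only builds step descriptions and runs no commands; the resource names in the usage example are made up.

```python
from typing import List, Sequence


def stop_plan(
    services: Sequence[str],
    filesystems: Sequence[str],
    volume_groups: Sequence[str],
    ip_addresses: Sequence[str],
) -> List[str]:
    """Return the package stop steps in the order the text lists them."""
    steps: List[str] = []
    steps += [f"cmhaltserv: halt service {svc}" for svc in services]
    steps += [f"unmount filesystem {fs}" for fs in filesystems]
    steps += [f"LVM: deactivate volume group {vg}" for vg in volume_groups]
    steps += [f"cmmodnet: delete package IP address {ip}" for ip in ip_addresses]
    return steps


if __name__ == "__main__":
    for step in stop_plan(["app_service"], ["/mnt/app"], ["vg01"], ["192.0.2.10"]):
        print(step)
```

Services are halted first so that nothing is still using the filesystems when they are unmounted and the volume groups are deactivated.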

NOTE: The package is automatically stopped on the failure of a package component. For more details, refer to "Responses to Package and Service Failures," below.
© 1999 Hewlett-Packard Development Company, L.P.