Managing Serviceguard Twelfth Edition > Chapter 3 Understanding Serviceguard Software Components

How Package Control Scripts Work


Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.

What Makes a Package Run?

There are three types of packages:

  • The failover package is the most common type of package. It runs on one node at a time. If a failure occurs, it can switch to another node listed in its configuration file. If switching is enabled for several nodes, the package manager will use the failover policy to determine where to start the package.

  • A system multi-node package runs on all the active cluster nodes at the same time. It can be started or halted on all nodes, but not on individual nodes.

  • A multi-node package can run on several nodes at the same time. If AUTO_RUN is set to YES, Serviceguard starts the multi-node package on all the nodes listed in its configuration file. It can be started or halted on all nodes, or on individual nodes, either by user command (cmhaltpkg) or automatically by Serviceguard in response to a failure of a package component, such as service, EMS resource, or subnet.

Multi-node and system multi-node packages are only supported for use by applications specified by Hewlett-Packard. Do not edit control script files for the CFS system multi-node or multi-node packages; they are created and modified by the cfs* commands.

The CVM system multi-node package is initiated in a cluster by running cmapplyconf on /etc/cmcluster/VxVM-CVM-pkg.conf (for CVM 3.5) or on /etc/cmcluster/SG-CFS-pkg.conf (for CVM 4.1 without CFS).

The CFS packages, however, are not created by performing cmapplyconf on package configuration files, but by a series of CFS-specific commands. Serviceguard determines most of their options; all user-determined options can be entered as parameters to the commands. (See the cfs admin commands in Appendix A.)

A failover package can be configured to have a dependency on a multi-node or system multi-node package. The package manager cannot start a package on a node unless the package it depends on is already up and running on that node.

The package manager will always try to keep a failover package running unless there is something preventing it from running on any node. The most common reasons for a failover package not being able to run are that AUTO_RUN is disabled so Serviceguard is not allowed to start the package, that NODE_SWITCHING is disabled for the package on particular nodes, or that the package has a dependency that is not being met. When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching.

A failover package starts on the first available node in its configuration file; by default, it fails over to the next available one in the list. Note that you do not necessarily have to use a cmrunpkg command to restart a failed failover package; in many cases, the best way is to enable package and/or node switching with the cmmodpkg command.
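The placement rule above can be sketched in a few lines of Python. This is a minimal model for illustration, not Serviceguard code: the function name `pick_node` and its arguments are hypothetical, and it assumes the default behavior of taking the first eligible node in the configured list.

```python
def pick_node(node_list, up_nodes, switching_enabled):
    """Return the first node in node_list that is up and has switching enabled.

    node_list: nodes in the order given in the package configuration file
    up_nodes: set of nodes currently active in the cluster
    switching_enabled: dict mapping node name -> bool (hypothetical flag store)
    Returns None when no node is eligible.
    """
    for node in node_list:
        if node in up_nodes and switching_enabled.get(node, False):
            return node
    return None
```

For example, with a node list of n1, n2, n3 and n1 down, the model picks n2; if switching is disabled on every node, it returns None, which corresponds to the package staying down until switching is re-enabled.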

When you create the package, you indicate the list of nodes on which it is allowed to run. System multi-node packages must list all nodes in the cluster. Multi-node packages and failover packages can name some subset of the cluster’s nodes or all of them.

If the AUTO_RUN parameter is set to YES in a package’s configuration file, Serviceguard automatically starts the package when the cluster starts. System multi-node packages are required to have AUTO_RUN set to YES. If a failover package has AUTO_RUN set to NO, Serviceguard cannot start it automatically at cluster startup time; you must explicitly enable this kind of package using the cmmodpkg command.

NOTE: If you configure the package while the cluster is running, the package does not start up immediately after the cmapplyconf command completes. To start the package without halting and restarting the cluster, issue the cmrunpkg or cmmodpkg command.

How does a failover package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13 “Package Time Line Showing Important Events”.

Figure 3-13 Package Time Line Showing Important Events


The following are the most important moments in a package’s life:

  1. Before the control script starts

  2. During run script execution

  3. While services are running

  4. When a service, subnet, or monitored resource fails, or a dependency is not met

  5. During halt script execution

  6. When the package or the node is halted with a command

  7. When the node fails

Before the Control Script Starts

First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node. Another type of resource is a dependency on a monitored external resource or on a special-purpose package. If monitoring shows a value for a configured resource that is outside the permitted range, the package cannot start.

Once a node is selected, Serviceguard checks that the node allows the package to start on it. The control script on the selected node then starts the package’s services. Strictly speaking, it is the run script on the selected node that starts the package.
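The pre-start checks described in this section can be modeled as a list of conditions. The Python sketch below is illustrative only: `start_blockers` and its boolean arguments are hypothetical names, and the failover policy check is deliberately left out because its details depend on the configured policy.

```python
def start_blockers(node, node_list, subnet_up, resources_in_range, dependencies_up):
    """Return the reasons a package cannot start on `node` (empty list = eligible).

    All arguments are assumptions made for this sketch; Serviceguard gathers
    the equivalent information itself.
    """
    blockers = []
    if node not in node_list:
        blockers.append("node is not in the package's node list")
    if not subnet_up:
        blockers.append("monitored subnet is not available")
    if not resources_in_range:
        blockers.append("a monitored resource is outside its permitted range")
    if not dependencies_up:
        blockers.append("a package it depends on is not up on this node")
    return blockers
```

An empty result means the run script may be launched on that node; any entry in the list corresponds to one of the conditions under which the package cannot start there.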

During Run Script Execution

This section applies only to failover packages.

Once the package manager has determined that the package can start on a particular node, it launches the run script (that is, a failover package’s control script that is executed with the ‘start’ parameter). This failover package script carries out the following steps (also shown in Figure 3-14 “Package Time Line for Run Script Execution”):

  1. Activates volume groups or disk groups.

  2. Mounts file systems.

  3. Assigns package IP addresses to the LAN card on the node.

  4. Executes any customer-defined run commands.

  5. Starts each package service.

  6. Starts up any EMS (Event Monitoring Service) resources needed by the package that were specially marked for deferred startup.

  7. Exits with an exit code of zero (0).

Figure 3-14 Package Time Line for Run Script Execution


At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service cannot be started, the control script will exit with an error.

Also, if the run script execution is not complete before the time specified in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script. During run script execution, messages are written to a log file in the same directory as the run script. This log has the same name as the run script and the extension .log. Normal starts are recorded in the log, together with error messages or warnings related to starting the package.
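The "stop at the first error" behavior of the run script can be sketched as follows. This is a simplified Python model, not the actual control script: the step names paraphrase the numbered list above, `run_script` and `step_ok` are hypothetical, and the RUN_SCRIPT_TIMEOUT handling and logging are not modeled.

```python
# Step names paraphrase the run script steps listed in this section.
RUN_STEPS = [
    "activate volume groups or disk groups",
    "mount file systems",
    "assign package IP addresses",
    "execute customer-defined run commands",
    "start package services",
    "start deferred EMS resources",
]


def run_script(step_ok):
    """Walk RUN_STEPS in order; step_ok(name) returns True when a step succeeds.

    Returns (exit_code, completed_steps): exit code 0 after all steps succeed,
    or 1 as soon as one step fails.
    """
    completed = []
    for step in RUN_STEPS:
        if not step_ok(step):
            return 1, completed
        completed.append(step)
    return 0, completed
```

If the service start step fails, the model returns exit code 1 with the first four steps already completed, which is why resources such as volume groups and mount points may need attention after a failed start.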

NOTE: After the package run script has finished its work, it exits, which means that the script is no longer executing once the package is running normally. After the script exits, the package manager directly monitors the PIDs of the services the script started. If a service dies, the package manager then runs the package halt script or, if SERVICE_FAILFAST_ENABLED is set to YES, halts the node on which the package is running. If a number of restarts is specified for a service in the package control script, the service may be restarted if the restart count allows it, without re-running the package run script.

Normal and Abnormal Exits from the Run Script

Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully.

  • 0—normal exit. The package started normally, so all services are up on this node.

  • 1—abnormal exit, also known as NO_RESTART exit. The package did not complete all startup steps normally. Services are killed, and the package is disabled from failing over to other nodes.

  • 2—alternative exit, also known as RESTART exit. There was an error, but the package is allowed to start up on another node. You might use this kind of exit from a customer defined procedure if there was an error, but starting the package on another node might succeed. A package with a RESTART exit is disabled from running on the local node, but can still run on other nodes.

  • Timeout—Another type of exit occurs when the RUN_SCRIPT_TIMEOUT is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however. The package script may not have been able to clean up some of its resources such as LVM volume groups, VxVM disk groups or package mount points, so before attempting to start up the package on any node, be sure to check whether any resources for the package need to be cleaned up.
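The four outcomes above can be summarized in a small lookup. The Python sketch below is only a reading aid: the names `RUN_SCRIPT_EXITS` and `may_start_elsewhere` are hypothetical, the descriptions paraphrase the list, and the model assumes node failfast is not enabled.

```python
# Keys are the run script outcomes described above; values paraphrase the text.
RUN_SCRIPT_EXITS = {
    0: "normal exit: all services are up on this node",
    1: "NO_RESTART: services killed, package disabled from failing over",
    2: "RESTART: package disabled on this node, may start on another node",
    "timeout": "script killed, package disabled globally; check resources for cleanup",
}


def may_start_elsewhere(outcome):
    """Return True only for the RESTART exit (2).

    Of the failure outcomes, exit 2 is the only one after which the package
    can still be started on another node (assuming node failfast is off).
    """
    if outcome not in RUN_SCRIPT_EXITS:
        raise ValueError(f"unknown run script outcome: {outcome!r}")
    return outcome == 2
```

A customer-defined procedure that wants the package tried on another node would therefore arrange for the run script to exit with 2 rather than 1.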

Service Startup with cmrunserv

Within the package control script, the cmrunserv command starts up the individual services. This command is executed once for each service that is coded in the file. Each service has a number of restarts associated with it. The cmrunserv command passes this number to the package manager, which will restart the service the appropriate number of times if the service should fail. The following are some typical settings:

SERVICE_RESTART[0]=" "        ; do not restart
SERVICE_RESTART[0]="-r <n>"   ; restart as many as <n> times
SERVICE_RESTART[0]="-R"       ; restart indefinitely

NOTE: If you set <n> restarts and also set SERVICE_FAILFAST_ENABLED to YES, the failfast will take place after <n> restart attempts have failed. It does not make sense to set SERVICE_RESTART to "-R" for a service and also set SERVICE_FAILFAST_ENABLED to YES.
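The three SERVICE_RESTART settings can be read as a restart limit. The following Python sketch shows one way to interpret them; it is an illustration of the settings shown above, not how cmrunserv or the package manager parses them, and the function names are hypothetical.

```python
import math


def max_restarts(setting):
    """Translate a SERVICE_RESTART value into a restart limit.

    " "      -> 0 (do not restart)
    "-r <n>" -> n
    "-R"     -> math.inf (restart indefinitely)
    """
    parts = setting.split()
    if not parts:
        return 0
    if parts == ["-R"]:
        return math.inf
    if len(parts) == 2 and parts[0] == "-r":
        return int(parts[1])
    raise ValueError(f"unrecognized SERVICE_RESTART value: {setting!r}")


def should_restart(setting, restarts_so_far):
    """Return True while the service has restarts left under `setting`."""
    return restarts_so_far < max_restarts(setting)
```

With "-R" the limit is never reached, which is why combining it with SERVICE_FAILFAST_ENABLED set to YES does not make sense: the failfast would never be triggered by exhausted restarts.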

While Services are Running

During the normal operation of cluster services, the package manager continuously monitors the following:

  • Process IDs of the services

  • Subnets configured for monitoring in the package configuration file

  • Configured resources on which the package depends

Some failures can result in a local switch. For example, if there is a failure on a specific LAN card and there is a standby LAN configured for that subnet, then the Network Manager will switch to the healthy LAN card. If a service fails but the RESTART parameter for that service is set to a value greater than 0, the service will restart, up to the configured number of restarts, without halting the package.

If there is a configured EMS resource dependency and there is a trigger that causes an event, the package will be halted.

During normal operation, while all services are running, you can see the status of the services in the “Script Parameters” section of the output of the cmviewcl command.

When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met

What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, if a configured resource fails, or if a configured dependency on a special-purpose package is not met, then a failover package will halt on its current node and, depending on the setting of the package switching flags, may be restarted on another node. If a multi-node or system multi-node package fails, all of the packages that have configured a dependency on it will also fail.

Package halting normally means that the package halt script executes (see the next section). However, if a failover package’s configuration has the SERVICE_FAILFAST_ENABLED flag set to YES for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script.

If AUTO_RUN is set to YES, the package will start up on another eligible node, if it meets all the requirements for startup. If AUTO_RUN is set to NO, then the package simply halts without starting up anywhere else.
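The decision sequence for a failed service can be sketched as below. This Python model is an assumption-laden summary of the paragraphs above: `on_service_failure` and its action strings are hypothetical, and it ignores the further requirement that the target node must meet all startup conditions.

```python
def on_service_failure(restarts_left, service_failfast, auto_run):
    """Return the ordered actions taken when a package service fails.

    restarts_left: remaining restarts for the failed service
    service_failfast: True when SERVICE_FAILFAST_ENABLED is YES for the service
    auto_run: True when AUTO_RUN is YES for the package
    """
    if restarts_left > 0:
        # The service is restarted in place; the package keeps running.
        return ["restart service"]
    # Failfast halts the whole node; otherwise the halt script runs gracefully.
    actions = ["halt node"] if service_failfast else ["run halt script"]
    if auto_run:
        actions.append("start package on another eligible node")
    return actions
```

For a service with no restarts left, failfast off, and AUTO_RUN set to NO, the model returns only the halt script step, matching the statement that the package simply halts without starting anywhere else.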

NOTE: If a package is dependent on a subnet, and the subnet on the primary node fails, then the package will start to shut down. If the subnet recovers immediately (before the package is restarted on an adoptive node), then the package could be restarted on the primary node. Therefore the package does not switch to another node in the cluster in this case.

When a Package is Halted with a Command

The Serviceguard cmhaltpkg command has the effect of executing the package halt script, which halts the services that are running for a specific package. This provides a graceful shutdown of the package that is followed by disabling automatic package startup (AUTO_RUN).

You cannot halt a multi-node or system multi-node package unless all packages that have a configured dependency on it are down. Use cmviewcl to check the status of dependents. For example, if pkg1 and pkg2 depend on PKGa, both pkg1 and pkg2 must be halted before you can halt PKGa.
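The dependency rule in the example can be expressed as a simple check. The Python sketch below is hypothetical (Serviceguard performs this check itself; in practice you would read the status from cmviewcl): `blocking_dependents` and the two data structures are invented for illustration.

```python
def blocking_dependents(package, depends_on, running):
    """Return the running packages that depend on `package`, sorted by name.

    depends_on: dict mapping package name -> set of packages it depends on
    running: set of package names that are currently up
    An empty list means `package` may be halted.
    """
    return sorted(
        name
        for name, deps in depends_on.items()
        if package in deps and name in running
    )
```

Using the names from the example, with pkg1 and pkg2 both depending on PKGa and both running, the function returns both names; once they are halted it returns an empty list and PKGa can be halted.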

NOTE: If the cmhaltpkg command is issued with the -n <nodename> option, then the package is halted only if it is running on that node.

The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.

During Halt Script Execution

This section applies only to failover packages.

Once the package manager has detected the failure of a service or package that a failover package depends on, or when the cmhaltpkg command has been issued for a particular failover package, the package manager launches the halt script (that is, the failover package’s control script executed with the ‘halt’ parameter). This script carries out the following steps (also shown in Figure 3-15 “Package Time Line for Halt Script Execution”):

  1. Halts any deferred resources that had been started earlier.

  2. Halts all package services.

  3. Executes any customer-defined halt commands.

  4. Removes package IP addresses from the LAN card on the node.

  5. Unmounts file systems.

  6. Deactivates volume groups.

  7. Exits with an exit code of zero (0).

Figure 3-15 Package Time Line for Halt Script Execution


At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file in the same directory as the halt script. This log has the same name as the halt script and the extension .log. Normal halts are recorded in the log, together with error messages or warnings related to halting the package.
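Comparing the halt steps with the run steps listed earlier in this chapter suggests that the halt script releases resources in the reverse of the order in which the run script acquired them. The Python sketch below illustrates that observation; the resource names are shorthand chosen for this example, and `halt_order` is a hypothetical helper.

```python
# Order in which the run script brings resources up (shorthand labels).
RUN_ORDER = [
    "volume groups",
    "file systems",
    "package IP addresses",
    "customer-defined commands",
    "package services",
    "deferred resources",
]


def halt_order(run_order):
    """Return the teardown order: the startup order reversed."""
    return list(reversed(run_order))
```

Reversing the list puts deferred resources first and volume groups last, so nothing is torn down while something that depends on it (for example, a file system on a volume group) is still in use.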

Normal and Abnormal Exits from the Halt Script

The package’s ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:

  • 0—normal exit. The package halted normally, so all services are down on this node.

  • 1—abnormal exit, also known as NO_RESTART exit. The package did not halt normally. Services are killed, and the package is disabled globally. It is not disabled on the current node, however.

  • Timeout—Another type of exit occurs when the HALT_SCRIPT_TIMEOUT is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however. The package script may not have been able to clean up some of its resources such as LVM volume groups, VxVM disk groups or package mount points, so before attempting to start up the package on any node, be sure to check whether any resources for the package need to be cleaned up

Package Control Script Error and Exit Conditions

Table 3-4 “Error Conditions and Package Movement for Failover Packages” shows the possible combinations of error condition, failfast setting and package movement for failover packages.

Table 3-4 Error Conditions and Package Movement for Failover Packages

The first three columns describe the package error condition; the remaining four describe the results.

| Error or Exit Code | Node Failfast Enabled | Service Failfast Enabled | HP-UX Status on Primary after Error | Halt Script Runs after Error or Exit | Package Allowed to Run on Primary Node after Error | Package Allowed to Run on Alternate Node |
|---|---|---|---|---|---|---|
| Service Failure | YES | YES | TOC | No | N/A (TOC) | Yes |
| Service Failure | NO | YES | TOC | No | N/A (TOC) | Yes |
| Service Failure | YES | NO | Running | Yes | No | Yes |
| Service Failure | NO | NO | Running | Yes | No | Yes |
| Run Script Exit 1 | Either Setting | Either Setting | Running | No | Not changed | No |
| Run Script Exit 2 | YES | Either Setting | TOC | No | N/A (TOC) | Yes |
| Run Script Exit 2 | NO | Either Setting | Running | No | No | Yes |
| Run Script Timeout | YES | Either Setting | TOC | No | N/A (TOC) | Yes |
| Run Script Timeout | NO | Either Setting | Running | No | Not changed | No |
| Halt Script Exit 1 | YES | Either Setting | Running | N/A | Yes | No |
| Halt Script Exit 1 | NO | Either Setting | Running | N/A | Yes | No |
| Halt Script Timeout | YES | Either Setting | TOC | N/A | N/A (TOC) | Yes, unless the timeout happened after the cmhaltpkg command was executed. |
| Halt Script Timeout | NO | Either Setting | Running | N/A | Yes | No |
| Service Failure | Either Setting | YES | TOC | No | N/A (TOC) | Yes |
| Service Failure | Either Setting | NO | Running | Yes | No | Yes |
| Loss of Network | YES | Either Setting | TOC | No | N/A (TOC) | Yes |
| Loss of Network | NO | Either Setting | Running | Yes | Yes | Yes |
| Loss of Monitored Resource | YES | Either Setting | TOC | No | N/A (TOC) | Yes |
| Loss of Monitored Resource | NO | Either Setting | Running | Yes | Yes, if the resource is not a deferred resource. No, if the resource is deferred. | Yes |
| Dependency Package Failed | Either Setting | Either Setting | Running | Yes | Yes, when dependency is again met | Yes, if dependency is met |
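One way to read Table 3-4 is as a lookup in which "Either Setting" acts as a wildcard. The Python sketch below encodes only a subset of the rows to show the idea; the names `EITHER`, `ROWS`, and `lookup` are hypothetical, and the primary-node column is omitted for brevity.

```python
EITHER = None  # stands for "Either Setting": matches both YES and NO

# (condition, node failfast, service failfast,
#  HP-UX status on primary, halt script runs, allowed on alternate node)
# Subset of Table 3-4, transcribed for illustration.
ROWS = [
    ("Service Failure", EITHER, True, "TOC", "No", "Yes"),
    ("Service Failure", EITHER, False, "Running", "Yes", "Yes"),
    ("Run Script Exit 1", EITHER, EITHER, "Running", "No", "No"),
    ("Run Script Exit 2", True, EITHER, "TOC", "No", "Yes"),
    ("Run Script Exit 2", False, EITHER, "Running", "No", "Yes"),
    ("Run Script Timeout", True, EITHER, "TOC", "No", "Yes"),
    ("Run Script Timeout", False, EITHER, "Running", "No", "No"),
    ("Loss of Network", True, EITHER, "TOC", "No", "Yes"),
    ("Loss of Network", False, EITHER, "Running", "Yes", "Yes"),
]


def lookup(condition, node_failfast, service_failfast):
    """Return (status, halt_script_runs, allowed_on_alternate) for the first matching row."""
    for cond, nff, sff, status, halt_runs, alternate in ROWS:
        if cond != condition:
            continue
        if nff is not EITHER and nff != node_failfast:
            continue
        if sff is not EITHER and sff != service_failfast:
            continue
        return status, halt_runs, alternate
    raise KeyError((condition, node_failfast, service_failfast))
```

For instance, a run script timeout with node failfast disabled leaves HP-UX running on the primary node, does not run the halt script, and does not allow the package to start on an alternate node.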

 

© Hewlett-Packard Development Company, L.P.