Managing Serviceguard Twelfth Edition > Chapter 3 Understanding Serviceguard Software Components
How Package Control Scripts Work
Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster. There are three types of packages: failover, multi-node, and system multi-node.
Multi-node and system multi-node packages are only supported for use by applications specified by Hewlett-Packard. Do not edit control script files for the CFS system multi-node or multi-node packages; they are created and modified by the cfs* commands.
The CVM system multi-node package is initiated in a cluster by running cmapplyconf on /etc/cmcluster/VxVM-CVM-pkg.conf (for CVM 3.5) or on /etc/cmcluster/SG-CFS-pkg.conf (for CVM 4.1 without CFS). The CFS packages, however, are not created by performing cmapplyconf on package configuration files, but by a series of CFS-specific commands. Serviceguard determines most of their options; all user-determined options can be entered as parameters to the commands. (See the cfs admin commands in Appendix A.)
A failover package can be configured to have a dependency on a multi-node or system multi-node package. The package manager cannot start a package on a node unless the package it depends on is already up and running on that node.
The package manager will always try to keep a failover package running unless there is something preventing it from running on any node. The most common reasons for a failover package not being able to run are that AUTO_RUN is disabled so Serviceguard is not allowed to start the package, that NODE_SWITCHING is disabled for the package on particular nodes, or that the package has a dependency that is not being met.
When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching. A failover package starts on the first available node in its configuration file; by default, it fails over to the next available one in the list. Note that you do not necessarily have to use a cmrunpkg command to restart a failed failover package; in many cases, the best way is to enable package and/or node switching with the cmmodpkg command.
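The node selection described above can be sketched as a small model. This is an illustrative Python sketch, not Serviceguard code: the `Node` class, the `pick_node` function, and the node names are hypothetical, and the model covers only the default behavior of trying nodes in configuration-file order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    running: set = field(default_factory=set)  # packages running on this node


def pick_node(node_list, dependencies, auto_run=True, switching_disabled=frozenset()):
    """Return the name of the first eligible node, or None if none qualifies.

    Hypothetical model: node_list is in package configuration file order.
    """
    if not auto_run:
        return None  # Serviceguard is not allowed to start the package
    for node in node_list:
        if not node.up:
            continue
        if node.name in switching_disabled:
            continue  # node switching is disabled for the package on this node
        if not set(dependencies) <= node.running:
            continue  # a package it depends on is not up on this node
        return node.name
    return None


nodes = [
    Node("node1", up=False),
    Node("node2"),
    Node("node3", running={"SG-CFS-pkg"}),
]
chosen = pick_node(nodes, dependencies=["SG-CFS-pkg"])
print(chosen)  # node3
```

Here node1 is down and node2 does not have the dependency running, so the package lands on node3.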
When you create the package, you indicate the list of nodes on which it is allowed to run. System multi-node packages must list all nodes in the cluster. Multi-node packages and failover packages can name some subset of the cluster’s nodes or all of them. If the AUTO_RUN parameter is set to YES in a package’s configuration file, Serviceguard automatically starts the package when the cluster starts. System multi-node packages are required to have AUTO_RUN set to YES. If a failover package has AUTO_RUN set to NO, Serviceguard cannot start it automatically at cluster startup time; you must explicitly enable this kind of package using the cmmodpkg command.
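The node-list and AUTO_RUN rules in this paragraph can be expressed as a simple check. The following Python sketch is only an illustration of those rules; the function name `check_package` and the package-type labels are assumptions, not part of any Serviceguard tool.

```python
def check_package(pkg_type, node_list, auto_run, cluster_nodes):
    """Return a list of rule violations for one package definition.

    pkg_type uses hypothetical labels: "failover", "multi_node",
    or "system_multi_node".
    """
    problems = []
    unknown = set(node_list) - set(cluster_nodes)
    if unknown:
        problems.append(f"nodes not in cluster: {sorted(unknown)}")
    if pkg_type == "system_multi_node":
        if set(node_list) != set(cluster_nodes):
            problems.append("system multi-node package must list all cluster nodes")
        if auto_run != "YES":
            problems.append("system multi-node package requires AUTO_RUN YES")
    return problems


cluster = ["node1", "node2", "node3"]
good = check_package("failover", ["node1", "node2"], "NO", cluster)
bad = check_package("system_multi_node", ["node1", "node2"], "NO", cluster)
print(good)      # []
print(len(bad))  # 2
```

A failover package naming a subset of nodes with AUTO_RUN set to NO passes the check; it simply will not start at cluster startup until it is enabled.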
How does a failover package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13 “Package Time Line Showing Important Events”. The following are the most important moments in a package’s life:
First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node. Another type of resource is a dependency on a monitored external resource or on a special-purpose package. If monitoring shows a value for a configured resource that is outside the permitted range, the package cannot start.
Once a node is selected, a check is then done to make sure the node allows the package to start on it. Then services are started up for a package by the control script on the selected node. Strictly speaking, the run script on the selected node is used to start the package.
This section applies only to failover packages. Once the package manager has determined that the package can start on a particular node, it launches the run script (that is, a failover package’s control script that is executed with the ‘start’ parameter). This failover package script carries out the following steps (also shown in Figure 3-14 “Package Time Line for Run Script Execution”):
At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service cannot be started, the control script will exit with an error. Also, if run script execution is not complete before the time specified in RUN_SCRIPT_TIMEOUT, the package manager will kill the script. During run script execution, messages are written to a log file in the same directory as the run script. This log has the same name as the run script and the extension .log. Normal starts are recorded in the log, together with error messages or warnings related to starting the package.
Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully.
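The timeout and exit-code behavior described above can be illustrated with a short Python sketch. It is not how the package manager is implemented; the function `run_control_script` and its return labels are hypothetical, and a Python one-liner stands in for a real control script invoked with the ‘start’ parameter.

```python
import subprocess
import sys


def run_control_script(argv, timeout_seconds):
    """Run a script and classify the result (hypothetical labels).

    timeout_seconds plays the role of RUN_SCRIPT_TIMEOUT.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires
        return "timed_out"
    # A normal exit (0) means success; every other exit means failure
    return "started" if result.returncode == 0 else "failed"


# Stand-in scripts: a real call would run the control script with 'start'
ok = run_control_script([sys.executable, "-c", "raise SystemExit(0)"], 30)
bad = run_control_script([sys.executable, "-c", "raise SystemExit(1)"], 30)
slow = run_control_script([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print(ok, bad, slow)
```

The three calls show a normal exit, an abnormal exit with code 1, and a script that is killed because it outlives its timeout.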
Within the package control script, the cmrunserv command starts up the individual services. This command is executed once for each service that is coded in the file. Each service has a number of restarts associated with it. The cmrunserv command passes this number to the package manager, which will restart the service the appropriate number of times if the service should fail. The following are some typical settings:
During the normal operation of cluster services, the package manager continuously monitors the following:
Some failures can result in a local switch. For example, if there is a failure on a specific LAN card and there is a standby LAN configured for that subnet, then the Network Manager will switch to the healthy LAN card. If a service fails but the RESTART parameter for that service is set to a value greater than 0, the service will restart, up to the configured number of restarts, without halting the package. If there is a configured EMS resource dependency and there is a trigger that causes an event, the package will be halted.
During normal operation, while all services are running, you can see the status of the services in the “Script Parameters” section of the output of the cmviewcl command.
What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, if a configured resource fails, or if a configured dependency on a special-purpose package is not met, then a failover package will halt on its current node and, depending on the setting of the package switching flags, may be restarted on another node. If a multi-node or system multi-node package fails, all of the packages that have configured a dependency on it will also fail.
Package halting normally means that the package halt script executes (see the next section). However, if a failover package’s configuration has the SERVICE_FAILFAST_ENABLED flag set to yes for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script. If AUTO_RUN is set to YES, the package will start up on another eligible node, if it meets all the requirements for startup. If AUTO_RUN is set to NO, then the package simply halts without starting up anywhere else.
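The service-failure handling above amounts to a short decision chain, sketched here in Python. The function name and outcome strings are hypothetical, and the sketch assumes that remaining restarts are used up before the failfast flag is considered, which the text does not state explicitly.

```python
def on_service_failure(restarts_used, restart_limit, failfast_enabled, auto_run):
    """Return a hypothetical outcome label for one service failure."""
    # Assumption: remaining restarts are tried before anything else
    if restarts_used < restart_limit:
        return "restart service"
    if failfast_enabled:
        return "halt node"  # SERVICE_FAILFAST_ENABLED is yes for this service
    if auto_run:
        return "halt package, start on another eligible node"
    return "halt package"  # AUTO_RUN is NO: nothing restarts it elsewhere


first = on_service_failure(0, 2, failfast_enabled=False, auto_run=True)
last = on_service_failure(2, 2, failfast_enabled=False, auto_run=True)
fast = on_service_failure(2, 2, failfast_enabled=True, auto_run=True)
print(first)  # restart service
print(last)   # halt package, start on another eligible node
print(fast)   # halt node
```

Starting on another node still depends on that node meeting all the startup requirements, which this sketch does not model.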
The Serviceguard cmhaltpkg command has the effect of executing the package halt script, which halts the services that are running for a specific package. This provides a graceful shutdown of the package that is followed by disabling automatic package startup (AUTO_RUN). You cannot halt a multi-node or system multi-node package unless all packages that have a configured dependency on it are down. Use cmviewcl to check the status of dependents. For example, if pkg1 and pkg2 depend on PKGa, both pkg1 and pkg2 must be halted before you can halt PKGa.
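The dependency rule in the pkg1, pkg2, and PKGa example can be modeled as a check that is made before halting. This Python sketch is illustrative only; `blocking_dependents` and the data structures are hypothetical and do not query a real cluster.

```python
def blocking_dependents(target, depends_on, running):
    """Return running packages that have a configured dependency on target."""
    return sorted(
        pkg for pkg, deps in depends_on.items()
        if target in deps and pkg in running
    )


depends_on = {"pkg1": {"PKGa"}, "pkg2": {"PKGa"}}
running = {"pkg1", "pkg2", "PKGa"}

before = blocking_dependents("PKGa", depends_on, running)
print(before)  # ['pkg1', 'pkg2']

running -= {"pkg1", "pkg2"}  # halt both dependents first
after = blocking_dependents("PKGa", depends_on, running)
print(after)   # []
```

PKGa can be halted only once the returned list is empty.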
The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.
This section applies only to failover packages. Once the package manager has detected the failure of a service or package that a failover package depends on, or when the cmhaltpkg command has been issued for a particular failover package, then the package manager launches the halt script. That is, the failover package’s control script is executed with the ‘halt’ parameter. This script carries out the following steps (also shown in Figure 3-15 “Package Time Line for Halt Script Execution”):
At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if halt script execution is not complete before the time specified in HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file in the same directory as the halt script. This log has the same name as the halt script and the extension .log. Normal halts are recorded in the log, together with error messages or warnings related to halting the package. The package’s ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:
Table 3-4 “Error Conditions and Package Movement for Failover Packages” shows the possible combinations of error condition, failfast setting and package movement for failover packages.
Table 3-4 Error Conditions and Package Movement for Failover Packages