How Package Control Scripts Work
Packages are the means by which ServiceGuard starts and halts configured applications. Packages are also units of failover behavior in ServiceGuard. A package is a collection of services, disk volumes and IP addresses that are managed by ServiceGuard to ensure they are available. There can be a maximum of 60 packages per cluster and a total of 900 services per cluster.
A package starts up when it is not currently running, and the package manager senses that it has been enabled on an eligible node in the cluster. If there are several nodes on which the package is enabled, the package manager will use the failover policy to determine where to start the package. Note that you do not necessarily have to use a cmrunpkg command. In many cases, a cmmodpkg command that enables the package on one or more nodes is the best way to start the package. The package manager will always try to keep the package running unless there is something preventing it from running on any node. The most common reasons for a package not being able to run are that AUTO_RUN is disabled, or NODE_SWITCHING is disabled for the package on particular nodes.
When a package has failed on one node and is enabled on another node, it will start up automatically in the new location. This process is known as package switching, also known as remote switching.
When you create the package, you indicate the list of nodes on which it is allowed to run. A standard package can run on only one node at a time, and it runs on the next available node in the node list. A package can start up automatically at cluster startup time if the AUTO_RUN parameter is set to YES. Conversely, a package with AUTO_RUN set to NO will not start automatically at cluster startup time; you must explicitly enable this kind of package using a cmmodpkg command.
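The startup rules above can be summarized in a small model. The following Python sketch is illustrative only: the `Package` class and the `pick_start_node` function are hypothetical names, not part of ServiceGuard. It assumes the behavior described above, in which a package runs on the next available node in its node list, and it ignores subnet and resource checks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    """Hypothetical model of a package; not a ServiceGuard data structure."""
    name: str
    node_list: list[str]                 # nodes the package may run on, in order
    auto_run: bool = True                # models the AUTO_RUN parameter
    # Per-node switching flag; a node that is not listed counts as enabled.
    switching_enabled: dict[str, bool] = field(default_factory=dict)
    running_on: Optional[str] = None     # node currently running the package


def pick_start_node(pkg: Package, nodes_up: set[str]) -> Optional[str]:
    """Return the node the package would start on, or None if it cannot start."""
    if pkg.running_on is not None:
        return None  # a standard package runs on only one node at a time
    if not pkg.auto_run:
        return None  # the package must first be enabled explicitly
    for node in pkg.node_list:
        # Take the first node in the list that is up and has switching enabled.
        if node in nodes_up and pkg.switching_enabled.get(node, True):
            return node
    return None
```

For a package with the node list `["node1", "node2"]` and switching disabled on `node1`, the sketch selects `node2` when both nodes are up. With `auto_run=False` it selects no node at all, which corresponds to a package that waits to be enabled.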
How does the package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13 “Package Time Line Showing Important Events”. The following are the most important moments in a package's life:
First, a node is selected. This node must be in the package's node list, it must conform to the package's failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node. Another type of resource is a dependency on a monitored external resource. If monitoring shows a value for a configured resource that is outside the permitted range, the package cannot start.
Once a node is selected, a check is then done to make sure the node allows the package to start on it. Then services are started up for a package by the control script on the selected node. Strictly speaking, the run script on the selected node is used to start the package.
Once the package manager has determined that the package can start on a particular node, it launches the run script (that is, the control script executed with the 'start' parameter). This script carries out the following steps (also shown in Figure 3-14 “Package Time Line for Run Script Execution”):
At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. Also, if the run script execution is not complete before the time specified in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script. During run script execution, messages are written to a log file in the same directory as the run script. This log has the same name as the run script and the extension .log. Normal starts are recorded in the log, together with error messages or warnings related to starting the package.
Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully.
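The log naming rule, the timeout, and the exit status handling described above can be sketched in Python. This is a minimal illustration, not how the package manager is implemented: `log_path_for` and `run_with_timeout` are hypothetical helpers, the script path is made up, and the sketch assumes the usual convention that a normal exit is exit code 0.

```python
import subprocess
import sys
from pathlib import Path


def log_path_for(script: str) -> Path:
    """Log file: same directory and same name as the script, plus '.log'."""
    return Path(script + ".log")


def run_with_timeout(cmd: list[str], timeout_s: float) -> str:
    """Run cmd and classify the result as 'ok', 'failed' or 'timed_out'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, which loosely
        # mirrors the package manager killing a script that overruns
        # RUN_SCRIPT_TIMEOUT.
        return "timed_out"
    # Assumption: exit code 0 is the normal exit; anything else is a failure.
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Hypothetical script path, used only to show the naming rule.
    print(log_path_for("pkg1/control.sh"))
    # A stand-in "script" that exits abnormally with code 1.
    print(run_with_timeout([sys.executable, "-c", "raise SystemExit(1)"], 5.0))
```

The same shape applies to the halt script, with HALT_SCRIPT_TIMEOUT in place of RUN_SCRIPT_TIMEOUT.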
Within the package control script, the cmrunserv command starts up the individual services. This command is executed once for each service that is coded in the file. Each service has a number of restarts associated with it. The cmrunserv command passes this number to the package manager, which will restart the service the appropriate number of times if the service should fail. The following are some typical settings:
During the normal operation of cluster services, the package manager continuously monitors the following:
Some failures can result in a local switch. For example, if there is a failure on a specific LAN card and there is a standby LAN configured for that subnet, then the Network Manager will switch to the healthy LAN card. If a service fails but the RESTART parameter for that service is set to a value greater than 0, the service will restart, up to the configured number of restarts, without halting the package. If there is a configured EMS resource dependency and there is a trigger that causes an event, the package will be halted.
During normal operation, while all services are running, you can see the status of the services in the "Script Parameters" section of the output of the cmviewcl command.
What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, or if a configured resource fails, then the package will halt on its current node and, depending on the setting of the package switching flags, may be restarted on another node.
Package halting normally means that the package halt script executes (see the next section). However, if SERVICE_FAILFAST_ENABLED is set to YES for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script.
If AUTO_RUN is set to YES, the package will start up on another eligible node, if it meets all the requirements for startup. If AUTO_RUN is set to NO, then the package simply halts without starting up anywhere else.
The ServiceGuard cmhaltpkg command has the effect of executing the package halt script, which halts the services that are running for a specific package. This provides a graceful shutdown of the package that is followed by disabling switching for the package on all nodes.
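The decisions that follow a service failure can be laid out as a short Python sketch. The `Service` class, the `on_service_failure` function and the returned labels are hypothetical and exist only for illustration. The sketch also assumes that configured restarts are attempted before the failfast setting is considered, which the text above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical model of a package service; not a ServiceGuard structure."""
    name: str
    max_restarts: int = 0      # models the configured number of restarts
    failfast: bool = False     # models SERVICE_FAILFAST_ENABLED
    restarts_used: int = 0     # restarts consumed so far


def on_service_failure(svc: Service, auto_run: bool) -> str:
    """Return a label describing what happens when svc fails."""
    if svc.restarts_used < svc.max_restarts:
        # Restarts remain: restart the service without halting the package.
        svc.restarts_used += 1
        return "restart_service"
    if svc.failfast:
        # Assumption: failfast applies once no restarts are left.
        return "halt_node"
    # Graceful path: the halt script runs. With AUTO_RUN enabled the package
    # may then start on another eligible node.
    return "halt_package_then_try_other_node" if auto_run else "halt_package"
```

A service with `max_restarts=2` yields `restart_service` on its first two failures. The third failure halts the package, and whether the package then moves depends on `auto_run`.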
The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.
Once the package manager has detected a service failure, or when the cmhaltpkg command has been issued for a particular package, it launches the halt script (that is, the control script executed with the 'halt' parameter). This script carries out the following steps (also shown in Figure 3-15 “Package Time Line for Halt Script Execution”):
At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file in the same directory as the halt script. This log has the same name as the halt script and the extension .log. Normal halts are recorded in the log, together with error messages or warnings related to halting the package.
The package's ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:
Table 3-4 “Error Conditions and Package Movement” shows the possible combinations of error condition, failfast setting and package movement for failover packages.
Table 3-4 Error Conditions and Package Movement