Configuring OPS Clusters with ServiceGuard OPS Edition > Chapter 3 Understanding the Software Components of ServiceGuard OPS Edition
How the Package Manager Works
Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator node is known as the package coordinator. The package coordinator does the following:
The package manager on all nodes does the following:
Two different types of packages can run in the cluster: the failover package and the system multi-node package. The system multi-node package is used only on systems that employ the VERITAS Cluster Volume Manager (CVM) as a storage manager. This package, known as VxVM-CVM-pkg, runs on all nodes that are active in the cluster and provides cluster membership information to the volume manager software. It is configured and used only when you employ CVM for storage management. The process of creating the system multi-node package for CVM is described in Chapter 5. The rest of this section describes the standard failover packages.
A failover package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure) and starting the new instance of the package. Failover is shown in the following figure:
Each package is separately configured. You create a package by using SAM or by editing a package ASCII configuration file (detailed instructions are given in Chapter 6). Then you use the cmapplyconf command to check and apply the package to the cluster configuration database. You also create the package control script, which manages the execution of the package's services. The package is then ready to run.
The package configuration file assigns a name to the package and identifies the nodes on which the package can run, in order of priority. It also indicates whether switching is enabled for the package, that is, whether the package should switch to another node in the case of a failure. In addition, the package failover and failback policies allow the package manager to decide dynamically where to start up a package. OPS instances are configured as packages with a single node in their node list.
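To make the contents of the package configuration file concrete, here is a minimal Python sketch that reads such a file into a dictionary and keeps the NODE_NAME entries in priority order. It assumes a simple layout of one keyword and value per line with `#` starting a comment; PACKAGE_NAME is assumed as the keyword that names the package, and the package and node names in the sample are made up. Chapter 6 remains the authority on the real file syntax.

```python
# Hypothetical sketch: load a package ASCII configuration file into a dict.
# Assumed format: one "KEYWORD value" pair per line, "#" starts a comment,
# and NODE_NAME may repeat, with file order giving node priority.

def parse_package_config(text):
    """Return a dict of keyword -> value; NODE_NAME maps to an ordered list."""
    config = {"NODE_NAME": []}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "NODE_NAME":
            config["NODE_NAME"].append(value)  # first listed = highest priority
        else:
            config[keyword] = value
    return config


# Illustrative file contents; the package and node names are hypothetical.
SAMPLE = """
PACKAGE_NAME      pkg1
# primary node first, then the adoptive node
NODE_NAME         node1
NODE_NAME         node2
AUTO_RUN          YES
FAILOVER_POLICY   CONFIGURED_NODE
FAILBACK_POLICY   MANUAL
"""

if __name__ == "__main__":
    cfg = parse_package_config(SAMPLE)
    print(cfg["PACKAGE_NAME"], cfg["NODE_NAME"], cfg["FAILOVER_POLICY"])
    # prints: pkg1 ['node1', 'node2'] CONFIGURED_NODE
```

Keeping NODE_NAME as a list rather than a single value is the important design choice: the order of the entries is what the failover and failback policies consult.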
For conventional non-OPS instance packages, there may be one or many applications in a package. Package configuration is described in detail in the chapter "Configuring Packages and Their Services."
The AUTO_RUN parameter, which is coded in the package ASCII configuration file (and was known in earlier versions of ServiceGuard as the PKG_SWITCHING_ENABLED parameter), defines the default global switching attribute for the package at cluster startup: whether the package should be restarted automatically on a new node in response to a failure, and whether it should be started automatically when the cluster is started. Once the cluster is running, the package switching attribute of each package can be set with the cmmodpkg command.
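A minimal Python model of the switching attribute described above may help separate its two roles: at cluster startup it takes its value from AUTO_RUN, and on a running cluster it can be changed afterward (the job cmmodpkg does). The class and method names are hypothetical and do not correspond to any ServiceGuard interface.

```python
# Hypothetical model of a package's switching attribute.
# Names are illustrative only; they are not ServiceGuard APIs.

class PackageSwitching:
    def __init__(self, auto_run):
        self.auto_run = auto_run  # AUTO_RUN value from the package ASCII file
        self.enabled = auto_run   # at cluster startup the attribute takes this default

    def set_switching(self, enabled):
        """Stands in for changing the attribute with cmmodpkg on a running cluster."""
        self.enabled = enabled

    def starts_with_cluster(self):
        """Is the package started automatically when the cluster starts?"""
        return self.auto_run

    def restarts_on_failure(self):
        """Is the package restarted on a new node after a failure?"""
        return self.enabled


if __name__ == "__main__":
    pkg = PackageSwitching(auto_run=True)
    print(pkg.restarts_on_failure())  # prints: True
    pkg.set_switching(False)          # e.g. an administrator disables switching
    print(pkg.restarts_on_failure())  # prints: False
```

Changing the runtime attribute does not alter the AUTO_RUN value in the file, so the next cluster startup again uses the configured default.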
A package switch involves moving non-OPS packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly; otherwise, the packages will not be started. If the package depends on multiple subnetworks, all of them must be available on the target node before the package will be started. With package failovers, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
The switching of relocatable IP addresses is shown in Figure 3-5 “Before Package Switching” and Figure 3-6 “After Package Switching”. Figure 3-5 shows a two-node cluster in its original state, with Package 1 running on Node 1 and Package 2 running on Node 2. OPS instances are running separately on both nodes. Users connect to the node with the IP address of the package they wish to use. Each node has a stationary IP address associated with it, and each package has an IP address associated with it.
Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. OPS instance 1 is no longer operating, and it does not fail over to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Node 2 can now access both Package 1's disk and Package 2's disk. OPS instance 2 now handles all database access, since instance 1 has gone down.
The package manager selects a node for a package to run on based on the priority list included in the package configuration file together with the FAILOVER_POLICY parameter, which is also coded in the file or set with SAM. The failover policy governs not only failover behavior but also startup behavior for the package, including the initial startup. The two failover policies are CONFIGURED_NODE and MIN_PACKAGE_NODE.
The parameter is coded in the package ASCII configuration file:
If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the node list that is available.
If you use MIN_PACKAGE_NODE as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.)
Using the MIN_PACKAGE_NODE failover policy, you can configure a cluster that uses one node as an automatic rotating standby node for the package. Consider the following package configuration for a four-node cluster. Note that all packages can run on all nodes and have the same NODE_NAME lists, though the node names appear in a different order in the package configuration files.
Table 3-1 Package Configuration Data
When the cluster starts, each package starts as shown in Figure 3-7 “Rotating Standby Configuration before Failover”. If a failure occurs, the package on the failed node fails over to the node running the fewest packages, as in Figure 3-8 “Rotating Standby Configuration after Failover”, which shows a failure on node 2.
If these packages had been set up using the CONFIGURED_NODE failover policy, they would start initially as in Figure 3-7, but the failure of node 2 would cause its package to start on node 3, as in Figure 3-9 “CONFIGURED_NODE Policy Packages after Failover”.
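The node-selection rules for both failover policies, including the rotating standby example and the requirement that every subnet a package depends on be available on the target node, can be sketched in Python as follows. This is an illustration under stated assumptions, not the package manager's actual algorithm: ties under MIN_PACKAGE_NODE are assumed to be broken by node-list order, the node lists are hypothetical stand-ins for Table 3-1, and all function names are made up.

```python
# Hypothetical sketch of start-node selection under FAILOVER_POLICY.
# Assumptions: MIN_PACKAGE_NODE ties go to the earlier node in the list,
# and a node is eligible only if it is a cluster member that has every
# subnet the package depends on.

def select_node(policy, node_list, members, running_counts,
                node_subnets=None, required_subnets=frozenset()):
    """Return the node to start the package on, or None if no node qualifies."""
    candidates = [
        n for n in node_list
        if n in members
        and (node_subnets is None
             or set(required_subnets) <= node_subnets.get(n, set()))
    ]
    if not candidates:
        return None
    if policy == "CONFIGURED_NODE":
        return candidates[0]  # highest-priority available node
    if policy == "MIN_PACKAGE_NODE":
        # Counts packages only, not load; min() keeps the first tie.
        return min(candidates, key=lambda n: running_counts.get(n, 0))
    raise ValueError(f"unknown FAILOVER_POLICY: {policy}")


def start_all(packages, members, policy):
    """Start each package in turn at cluster startup."""
    placement, counts = {}, {n: 0 for n in members}
    for pkg, node_list in packages.items():
        node = select_node(policy, node_list, members, counts)
        placement[pkg] = node
        if node is not None:
            counts[node] += 1
    return placement


def fail_node(placement, packages, members, failed, policy):
    """Move every package that was running on `failed` to a surviving node."""
    survivors = members - {failed}
    counts = {n: 0 for n in survivors}
    for node in placement.values():
        if node in counts:
            counts[node] += 1
    new_placement = dict(placement)
    for pkg, node in placement.items():
        if node == failed:
            target = select_node(policy, packages[pkg], survivors, counts)
            new_placement[pkg] = target
            if target is not None:
                counts[target] += 1
    return new_placement


# Hypothetical rotating-standby layout: four nodes, three packages,
# the same nodes in every NODE_NAME list but in a different order.
PACKAGES = {
    "pkg1": ["node1", "node2", "node3", "node4"],
    "pkg2": ["node2", "node3", "node4", "node1"],
    "pkg3": ["node3", "node4", "node1", "node2"],
}
MEMBERS = {"node1", "node2", "node3", "node4"}

if __name__ == "__main__":
    for policy in ("MIN_PACKAGE_NODE", "CONFIGURED_NODE"):
        before = start_all(PACKAGES, MEMBERS, policy)
        after = fail_node(before, PACKAGES, MEMBERS, "node2", policy)
        print(policy, before, after)
```

With these hypothetical node lists, both policies start pkg1, pkg2, and pkg3 on node1, node2, and node3, leaving node4 as the standby. When node2 fails, MIN_PACKAGE_NODE moves pkg2 to node4 (the only node running no packages), while CONFIGURED_NODE moves it to node3, the next available node in its priority list.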
The FAILBACK_POLICY parameter lets you decide whether a package will return to its primary node if the primary node becomes available and the package is not currently running on it. The configured primary node is the first node listed in the package's node list. The two possible values for this policy are AUTOMATIC and MANUAL. The parameter is coded in the package ASCII configuration file:
As an example, consider the following four-node configuration, in which FAILOVER_POLICY is set to CONFIGURED_NODE and FAILBACK_POLICY is AUTOMATIC:
Table 3-2 Package Configuration Data
Node 1 panics, and after the cluster reforms, pkgA starts running on node 4.
After rebooting, node 1 rejoins the cluster. At that point, pkgA is automatically stopped on node 4 and restarted on node 1.
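The failback decision in this example can be sketched in a few lines of Python. The sketch assumes a hypothetical node list for pkgA in which node4 is the highest-priority node that survives the failure of node 1 (the actual Table 3-2 data is not reproduced here), and the helper name is made up for illustration.

```python
# Hypothetical sketch of the FAILBACK_POLICY decision made when a node
# rejoins the cluster. Function and node names are illustrative only.

def failback_target(failback_policy, node_list, current_node, members):
    """Return the node the package should move back to, or None to stay put."""
    primary = node_list[0]  # the configured primary node is listed first
    if failback_policy != "AUTOMATIC":
        return None  # MANUAL: the package stays where it is until moved by hand
    if current_node != primary and primary in members:
        return primary  # halt on current_node, then restart on the primary
    return None


# Hypothetical node list for pkgA (not the manual's Table 3-2).
PKG_A_NODES = ["node1", "node4", "node2", "node3"]

if __name__ == "__main__":
    # node1 has panicked: pkgA runs on node4 and node1 is not a member.
    print(failback_target("AUTOMATIC", PKG_A_NODES, "node4",
                          {"node2", "node3", "node4"}))  # prints: None
    # node1 reboots and rejoins the cluster.
    print(failback_target("AUTOMATIC", PKG_A_NODES, "node4",
                          {"node1", "node2", "node3", "node4"}))  # prints: node1
```

With FAILBACK_POLICY set to MANUAL, the same call returns None even after node 1 rejoins, which matches the behavior described for that setting.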
Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package running on a node where you did not expect it to run, since the node running the fewest packages will probably not be the same host every time a failover occurs.
If you are using package configuration files that were generated with a previous version of MC/ServiceGuard, the FAILOVER_POLICY defaults to CONFIGURED_NODE and the FAILBACK_POLICY defaults to MANUAL. If you wish to change these policies, edit the package configuration file to add the parameters, or use cmmakepkg to create a new package configuration file.
Starting with version A.11.12 of ServiceGuard OPS Edition, the PKG_SWITCHING_ENABLED parameter was renamed AUTO_RUN, and the NET_SWITCHING_ENABLED parameter was renamed LOCAL_LAN_FAILOVER_ALLOWED. The older names still work in your configuration files, but you should change to the new keywords.
Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. ServiceGuard monitors all of these directly. In addition, you can use the Event Monitoring Service (EMS) registry, through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for ServiceGuard. Monitors currently supplied with other software products include EMS High Availability Monitors, an OTS/9000 monitor, and an ATM monitor.
If a registered resource is configured in a package, the package manager calls the resource registrar to launch an external monitor for the resource. Resources can be configured to start up either at the time the node enters the cluster or at the end of package startup. The monitor then sends messages back to ServiceGuard, which checks whether the resource is available before starting the package.
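The resource check described above can be illustrated with a small Python sketch: monitor reports are recorded as they arrive, a package is allowed to start only when every resource it depends on has reported "up", and a resource going down while the package runs triggers a reaction. The resource names, status strings, and action labels are hypothetical, not EMS identifiers, and failing over is only one of the possible reactions.

```python
# Hypothetical sketch of package resource dependencies.
# Resource names, states, and action strings are illustrative only.

def resources_available(required, status):
    """True if every required resource's latest monitor report is "up".

    A resource that has not reported yet is treated as unavailable.
    """
    return all(status.get(resource) == "up" for resource in required)


def on_monitor_report(required, status, resource, new_state, package_running):
    """Record a monitor message and return the action a package manager might take."""
    status[resource] = new_state  # remember the monitor's latest report
    if resources_available(required, status):
        return "no action" if package_running else "ok to start"
    # At least one dependency is down or unreported.
    # Failing over is one possible response; other actions may be configured.
    return "fail over" if package_running else "do not start"


if __name__ == "__main__":
    required = ["resource_a", "resource_b"]  # hypothetical dependencies
    status = {}
    print(on_monitor_report(required, status, "resource_a", "up", False))   # prints: do not start
    print(on_monitor_report(required, status, "resource_b", "up", False))   # prints: ok to start
    print(on_monitor_report(required, status, "resource_a", "down", True))  # prints: fail over
```

Treating an unreported resource as unavailable is a conservative choice: it keeps the package from starting before every monitor has confirmed its resource.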
If the resource becomes unavailable after the package starts, the package manager can also fail the package over to another node or take other action.
You can specify a registered resource for a package by selecting it from the list of available resources displayed in the SAM package configuration area. The size of the list displayed by SAM depends on which resource monitors have been registered on your system. Alternatively, you can obtain information about registered resources on your system by using the command /opt/resmon/bin/resls. For additional information, refer to the resls(1m) man page.
The EMS HA Monitors, available as a separate product (B5736DA), can be used to set up monitoring of disks and other resources as package dependencies. Examples of resource attributes that can be monitored using EMS include the following:
Once a monitor is configured as a package dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.
The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification. Refer to the manual Using High Availability Monitors (B5736-90022) for additional information.
To determine failover behavior, you can define a package failover policy that governs which nodes will automatically start up a package that is not running. In addition, you can define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible. The following table describes different types of failover behavior and the settings in SAM or in the ASCII package configuration file that determine each behavior.
Table 3-3 Package Failover Behavior