HP XC System Software: Installation Guide > Chapter 1 Preparing for a New Installation

Task 9: Plan a Service Availability Strategy

This task describes how to configure some services with improved availability. HP recommends that you read this entire section to learn what improved availability means, how it is achieved, and which services are eligible for it.

Configuring improved availability of services requires some advance planning. HP recommends that you perform the planning exercises now, before you go on to prepare the system and gather the information required by the cluster_config utility.

This section addresses the following topics:

A service availability infrastructure is built into the HP XC System Software. It enables an availability tool to fail over a subset of services to nodes that have been designated as the second server of the service. Improved availability protects against service failure if a node that is serving vital services becomes unresponsive or goes down.

You decide which availability tool manages the specified services and migrates them to a second server if the first server is not available. You can install one or more availability tools to manage the services that have been configured for improved availability. The availability tool is responsible for starting, stopping, and monitoring those services; you do not use HP XC commands for those tasks.

The HP Serviceguard product is the recommended availability tool, but you can install and configure other third-party tools of your choice. If you install an availability tool other than HP Serviceguard, you must rely on the third-party vendor or the product documentation for support of that tool.

How Does Availability Work?

Using the procedures described in this document, you use the cluster_config utility to assign a service or services to be served from two nodes through a mechanism called an availability set.

An availability set associates two individual nodes so that one node acts as the first server and the other node acts as the second server of the service. The cluster_config utility associates the service with an IP address alias (for example, 172.20.64.200). Client nodes are configured to connect to the service by communicating with the IP alias.

The availability tool is responsible for configuring the IP alias and for starting the service on the first server. The tool is also responsible for detecting whether a node becomes unresponsive or unavailable. If that happens, the availability tool uses the IP alias to start the service on the second server in the availability set. Client nodes do not detect that the first server has gone down.

Table 1-1 provides a summary of how to set up and configure improved availability of services. Detailed information or procedures are provided at the appropriate points in the system installation and configuration process, as noted in the second column of the table.

Table 1-1 Improved Availability Summary
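
The failover behavior described in "How Does Availability Work?" can be illustrated with a small model. This is only a sketch of the concept, not HP XC or Serviceguard code: the class, the function names, the service name example_service, and the node names n1 and n2 are all hypothetical, and the IP alias is the example address used in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AvailabilitySet:
    """Hypothetical model: two nodes serving one service behind one IP alias."""
    service: str
    ip_alias: str
    first_server: str
    second_server: str
    down: set = field(default_factory=set)  # nodes currently unavailable

    def mark_down(self, node: str) -> None:
        """Record that a node has become unresponsive or unavailable."""
        self.down.add(node)

    def active_server(self) -> Optional[str]:
        """Return the node that should hold the IP alias and run the service.

        The first server is preferred; the second server takes over only
        when the first server is down. Returns None if both are down.
        """
        for node in (self.first_server, self.second_server):
            if node not in self.down:
                return node
        return None


def connect(avail_set: AvailabilitySet) -> str:
    """Clients address only the IP alias, never a specific node."""
    server = avail_set.active_server()
    if server is None:
        raise ConnectionError(f"{avail_set.service}: no server available")
    return f"{avail_set.service} via {avail_set.ip_alias} (served by {server})"
```

For example, after creating AvailabilitySet("example_service", "172.20.64.200", "n1", "n2") and calling mark_down("n1"), active_server() returns the second server, while connect() still addresses the same IP alias. This mirrors why client nodes do not notice the failover.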

The first step in planning your service availability strategy is to decide on the availability tool or tools you want to use to manage services that have been configured with improved availability. Many different availability tools are on the market, and they all provide the same basic functionality. The availability infrastructure of the HP XC System Software enables you to install more than one availability tool.

Regardless of the tool you choose, you install the availability tool software on the HP XC system after you install the HP XC System Software on the head node. This document is organized sequentially and instructs you to install the availability tool software at the appropriate time (see “Install Additional Software From HP or Third-Party Vendors”).

HP Serviceguard

In this release, the HP Serviceguard product is the recommended availability tool. HP Serviceguard enables system services to continue if a hardware or software failure occurs. You must purchase and license Serviceguard separately from the HP XC System Software product.

After purchasing HP Serviceguard, you install it from the Serviceguard distribution media. This document instructs you to install a specific HP XC Serviceguard RPM® from the HP XC System Software distribution DVD. This RPM enables Serviceguard to operate on an HP XC system.

Availability Tools from Other Vendors

If you prefer to use another availability tool, such as the open source tool Heartbeat Version 1 or Version 2, you must obtain the tool and configure it yourself. Third-party vendors are responsible for providing customer support for their tools. Installation and configuration instructions for any third-party availability tools you decide to use are outside the scope of this document. See the vendor documentation for instructions.

Translator scripts parse the HP XC configuration and management database (CMDB), the database that stores HP XC configuration data.
The translator scripts use the information in the database to create the files necessary for the availability tool to function properly on an HP XC system. Specifically, a translator script is responsible for gathering IP alias and daemon information and creating the necessary files to ensure that a node that serves a service advertises the IP alias and starts the daemon.

Because HP Serviceguard is the recommended availability tool, translator scripts and other supporting scripts are already provided for you in the HP XC Serviceguard RPM. You do not have to write scripts if you are using Serviceguard.

Availability Tools from Other Vendors

If you are using an availability tool other than Serviceguard, you are responsible for completing the following tasks:

When the cluster_config utility finds translator scripts in the /opt/hptc/availability/availability_tool directory, it prompts you to specify the nodes that you want to associate as members of an availability set and to select the availability tool you want to use to manage that set. Use the following guidelines to decide which nodes to associate into availability sets:

An important part of planning your strategy for improved availability is to determine the services for which availability is vital to the system operation.

Services are delivered in node roles. A node role is an abstraction that combines one or more services into a group and provides a convenient way of installing services on a node.

In this release, improved availability is supported for the services listed in Table 1-2. Also listed in the table are things to consider about the role assignments if you plan to implement improved availability for one or more of these services. Read Appendix F to learn more about default role assignments and the full set of services provided by each role.

Table 1-2 Role and Service Placement for Improved Availability

Configuring Improved Availability for the /hptc_cluster File System

To configure improved availability for the /hptc_cluster file system (service name hptc_cluster_fs), use the HP StorageWorks Scalable File Share (SFS) software, which must be purchased separately from HP. SFS is also required for successful failover of the dbserver service.

During the HP XC Kickstart installation procedure, you are prompted to configure the /hptc_cluster file system on a disk somewhere other than the head node. If you purchase and configure SFS, you can locate the file system on SFS storage.

Configuring Failover Capabilities for SLURM and LSF-HPC with SLURM

Improved availability for SLURM and LSF-HPC with SLURM is not achieved through availability sets or availability tools. Failover capabilities for SLURM and LSF-HPC with SLURM are achieved by placing the resource_management role on two or more nodes. These nodes are not members of any availability set, and the SLURM and LSF-HPC with SLURM software is not managed by any availability tool.

When you assign the resource_management role to two or more nodes, SLURM availability is automatically enabled. For LSF-HPC with SLURM, however, you must enable availability manually after assigning the resource_management role to two or more nodes; see “Perform LSF Postconfiguration Tasks” for instructions.

Standard LSF also contains its own automatic failover mechanisms. See the Platform LSF documentation for more information on node failure scenarios with standard LSF.

After you have completed the advance planning of your service availability strategy, use the worksheet in Table 1-3 to record the following information:

The cluster_config utility prompts you for this information, so have the worksheet handy.

Table 1-3 Availability Sets Worksheet
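
If you prefer to keep the worksheet in electronic form, a short script can record the plan and run basic consistency checks before you start cluster_config. The following is a minimal sketch and not part of the HP XC software. It assumes that each worksheet row records the two nodes of an availability set and the availability tool that manages it, which is the information cluster_config is described as prompting for; the names WorksheetRow, check_worksheet, and slurm_failover_enabled are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorksheetRow:
    """One planned availability set (hypothetical worksheet layout)."""
    first_node: str
    second_node: str
    tool: str  # availability tool chosen to manage this set


def check_worksheet(rows):
    """Return a list of human-readable problems found in the plan."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # An availability set associates two individual nodes.
        if row.first_node == row.second_node:
            problems.append(
                f"set {i}: first and second node are both {row.first_node}"
            )
        # cluster_config asks which tool manages each set.
        if not row.tool.strip():
            problems.append(f"set {i}: no availability tool recorded")
    return problems


def slurm_failover_enabled(resource_management_nodes):
    """SLURM failover needs the resource_management role on two or more
    distinct nodes; those nodes are not part of any availability set."""
    return len(set(resource_management_nodes)) >= 2
```

An empty list from check_worksheet means that none of these basic checks failed. It does not validate the plan against the node-selection guidelines or the role placement considerations in Table 1-2.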