HP XC System Software: XC Installation Guide, Chapter 1: Preparing for a New Installation

Task 9: Plan a Service Availability Strategy

This task describes how to configure some services with improved availability. HP recommends that you read this entire section to learn what improved availability means, how it is achieved, and which services are eligible for improved availability.

Some advance planning is required if you want to configure improved availability of services. HP recommends that you take the time to perform planning exercises now, before proceeding any further to prepare the system and gather the information required by the cluster_config utility.

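This section addresses the following topics:

  • “What Is Improved Availability?”

  • “How Does Availability Work?”

  • “How to Configure Improved Availability”

  • “Choosing an Availability Tool”

  • “Writing Translator and Other Supporting Scripts”

  • “Choosing Nodes as Members of Availability Sets”

  • “Assigning Node Roles For Improved Availability”

  • “Configuring Improved Availability for the /hptc_cluster File System”

  • “Configuring Failover Capabilities for SLURM and LSF-HPC with SLURM”

  • “Using the Improved Availability Planning Worksheet”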

What Is Improved Availability?

A service availability infrastructure is built into the HP XC System Software, which enables an availability tool to fail over a subset of services to nodes that have been designated as a second server of the service. Improved availability protects against service failure if a node that is serving vital services becomes unresponsive or goes down.

You have the flexibility to decide which availability tool you want to use to manage and migrate specified services to a second server if the first server is not available. You can install one or more availability tools to manage the services that have been configured for improved availability. The availability tool is responsible for starting, stopping, and monitoring those services; you do not use HP XC commands for those tasks.

The HP Serviceguard product is the recommended availability tool, but you can install and configure other third-party tools of your choice. If you install an availability tool other than HP Serviceguard, customer support for the tool comes from the third-party vendor and the product documentation.

How Does Availability Work?

Using the procedures described in this document, you run the cluster_config utility to assign one or more services to be served from two nodes through a mechanism called an availability set. An availability set associates two individual nodes so that one node acts as the first server and the other node acts as the second server of the service.

The cluster_config utility associates the service with an IP address alias (for example, 172.20.64.200). Client nodes are configured to connect to the service by communicating with the IP alias.

The availability tool is responsible for configuring the IP alias and for starting the service on the first server. The tool is also responsible for detecting when a node becomes unresponsive or unavailable. If that happens, the availability tool moves the IP alias to the second server in the availability set and starts the service there. Because client nodes communicate only with the IP alias, they do not detect that the first server has gone down.
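To make the failover sequence concrete, the following Python sketch models a single availability set. It is an illustrative simulation under stated assumptions, not the behavior of HP Serviceguard or any other tool: the AvailabilitySet class, its methods, and the client_view helper are hypothetical, and the node names and IP alias are examples only.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilitySet:
    """Illustrative model of one availability set; not an HP XC or Serviceguard API."""
    first: str        # first server of the service
    second: str       # second server of the service
    ip_alias: str     # alias that clients use to reach the service
    service: str
    up: dict = field(default_factory=dict)  # node name -> True while the node responds
    active: str = ""                         # node that currently holds the alias

    def start(self) -> None:
        # The availability tool configures the IP alias and starts the service
        # on the first server.
        self.up = {self.first: True, self.second: True}
        self.active = self.first

    def node_failed(self, node: str) -> None:
        # The tool detects an unresponsive node. If that node was serving,
        # the alias and the service move to the other member of the set.
        self.up[node] = False
        if node == self.active:
            other = self.second if node == self.first else self.first
            if not self.up.get(other, False):
                raise RuntimeError(f"{self.service}: no server left in the availability set")
            self.active = other


def client_view(aset: AvailabilitySet) -> str:
    # Clients address only the IP alias, so the address they use never changes.
    return f"{aset.service} at {aset.ip_alias} (served by {aset.active})"


if __name__ == "__main__":
    aset = AvailabilitySet("n8", "n7", "172.20.64.200", "nat")
    aset.start()
    print(client_view(aset))   # nat at 172.20.64.200 (served by n8)
    aset.node_failed("n8")
    print(client_view(aset))   # nat at 172.20.64.200 (served by n7)
```

Note that the address a client uses is the same before and after the failure; only the node behind the alias changes.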

How to Configure Improved Availability

Table 1-1 provides a summary of how to set up and configure improved availability of services. Detailed information and procedures are provided at the appropriate points in the system installation and configuration process; the table notes where the details for each task are provided.

Table 1-1 Improved Availability Summary

  1. Decide which availability tool or tools you want to use to manage the services for which improved availability is configured. Then obtain or purchase, install, and configure the availability tool or tools.

     Details: “Choosing an Availability Tool”

  2. If you are not using HP Serviceguard, write translator and supporting scripts for the availability tool. Then place the translator and related scripts in the /opt/hptc/availability/availability_tool directory.

     Details: “Writing Translator and Other Supporting Scripts”

  3. Decide which nodes you want to configure into availability sets.

     Details: “Choosing Nodes as Members of Availability Sets”

  4. Determine the services for which you want to configure improved availability.

     Details: “Assigning Node Roles For Improved Availability”

  5. If you are using HP Serviceguard as the availability tool, decide whether to use a quorum server or a lock LUN to achieve quorum. If you choose a lock LUN, you must create it before running the cluster_config utility.

     Details: “Deciding on the Method to Achieve Quorum for HP Serviceguard Clusters”

  6. Run the cluster_config utility to configure the HP XC system. You are prompted to associate nodes into availability sets.

     Details: “Task 8: Configure Availability Sets”

  7. Use the [M]odify Nodes option of the cluster_config utility to assign the appropriate roles (and thus, the appropriate services) to the nodes in the availability sets. Both nodes in an availability set must be assigned the same roles.

     Details: “Task 9: Modify and Assign Node Roles”

  8. When cluster_config processing is complete and the HP XC system is booted, start the availability tool with the transfer_to_avail command.

     Details: “Task 15: Start Availability Tools”

Choosing an Availability Tool

The first step in planning your service availability strategy is to decide which availability tool or tools to use to manage the services that are configured for improved availability. Many availability tools are on the market, and they generally provide the same basic functionality. The availability infrastructure of the HP XC System Software enables you to install more than one availability tool.

Regardless of the tool you choose, you install the availability tool software on the HP XC system after you install the HP XC System Software on the head node. This document is organized sequentially, and it instructs you to install the availability tool software at the appropriate time (“Install Additional Software From HP or Third-Party Vendors”).

HP Serviceguard

In this release, the HP Serviceguard product is the recommended availability tool. HP Serviceguard enables system services to continue if a hardware or software failure occurs. You must purchase and license Serviceguard separately from the HP XC System Software product.

After purchasing HP Serviceguard, you install it from the Serviceguard distribution media. This document instructs you to install a specific HP XC Serviceguard RPM® from the HP XC System Software distribution DVD. This RPM enables Serviceguard to operate on an HP XC system.

Availability Tools from Other Vendors

If you prefer to use another availability tool, such as the open source tool Heartbeat Version 1 or Version 2, you must obtain, install, and configure the tool yourself. Third-party vendors are responsible for providing customer support for their tools.

Installation and configuration instructions for any third-party availability tools you decide to use are outside the scope of this document. See the vendor documentation for instructions.

Writing Translator and Other Supporting Scripts

Translator scripts parse the HP XC configuration and management database (CMDB), the database that stores HP XC configuration data. The translator scripts use the information in the database to create the files necessary for the availability tool to function properly on an HP XC system. Specifically, a translator script is responsible for gathering IP alias and daemon information and creating the necessary files to ensure that a node that serves a service advertises the IP alias and starts the daemon.

Because HP Serviceguard is the recommended availability tool, translator scripts and other supporting scripts are already provided for you in the HP XC Serviceguard RPM. You do not have to write scripts if you are using Serviceguard.

Availability Tools from Other Vendors

If you are using an availability tool other than Serviceguard, you are responsible for completing the following tasks:

  1. You must write translator scripts and any other required scripts. For information about how to write these scripts, see the vendor documentation or contact the HP XC Support team.

  2. When the scripts are ready, you must copy or move the scripts to the /opt/hptc/availability/availability_tool directory for use by the cluster_config utility.

When the cluster_config utility finds translator scripts in the /opt/hptc/availability/availability_tool directory, you are prompted to specify the nodes that you want to associate as members in an availability set, and you select the availability tool you want to use to manage the availability set.
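As an illustration only, the following Python sketch shows the general shape of a translator: it groups IP alias and daemon records by availability set and writes a configuration file for a tool. The records are hypothetical stand-ins for CMDB data (the sketch does not query the CMDB), the daemon names natd and lvsd are invented, and the output imitates the Heartbeat Version 1 haresources style (preferred node, then IPaddr:: resources and daemons) purely as an example. The actual files your tool needs, and the interface cluster_config uses to run translators, come from the vendor documentation and the HP XC Support team.

```python
import os
import tempfile

# Hypothetical rows standing in for the IP alias and daemon data that a real
# translator script would read from the CMDB. The real CMDB schema, the daemon
# names, and the interface that cluster_config uses to call translators are
# not described here.
EXAMPLE_RECORDS = [
    {"first": "n8", "second": "n7", "ip_alias": "172.20.64.200", "daemon": "natd"},
    {"first": "n16", "second": "n15", "ip_alias": "172.20.64.201", "daemon": "lvsd"},
]


def translate(records):
    """Return one haresources-style line per availability set.

    Each line names the preferred (first) node, then an IPaddr:: resource for
    every alias followed by the daemon to start, in the order given.
    """
    resources_by_set = {}
    for rec in records:
        key = (rec["first"], rec["second"])
        resources = resources_by_set.setdefault(key, [])
        resources.append(f"IPaddr::{rec['ip_alias']}")
        resources.append(rec["daemon"])
    return [" ".join([first, *resources])
            for (first, _second), resources in resources_by_set.items()]


def write_resources(lines, directory):
    """Write the generated lines to a file named haresources in `directory`."""
    path = os.path.join(directory, "haresources")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        path = write_resources(translate(EXAMPLE_RECORDS), out_dir)
        with open(path, encoding="utf-8") as handle:
            print(handle.read(), end="")
```

Keeping the data gathering (translate) separate from file output (write_resources) makes it easier to adapt the same records to whatever file layout your availability tool expects.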

Choosing Nodes as Members of Availability Sets

Use the following guidelines to decide which nodes to associate into availability sets:

  • If you want to configure the database server with improved availability, you must create one availability set, containing the head node and one other node, to serve the dbserver service. The dbserver service is served by the head node by default and cannot be moved.

  • When the head node is a member of an availability set, HP recommends that the second member of the availability set be the same hardware model as the head node.

  • The head node can be a member of only one availability set per availability tool.

  • If you intend to configure NAT or LVS as services with improved availability, the nodes you associate in the availability set must also have an external Ethernet connection configured. You configure the external Ethernet connection with the cluster_config utility.

  • You cannot overlap nodes in availability sets. A node can be a member in only one availability set per availability tool. For instance, if nodes n8 and n7 are associated into an availability set under Serviceguard, neither node n8 nor n7 can be a member of another availability set being managed by Serviceguard.

    However, nodes n8 and n7 can each be a member of another availability set if that set is managed by an availability tool other than Serviceguard, for example Heartbeat Version 1 or Version 2.
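The following Python sketch encodes these guidelines as a planning check you could run against a draft list of availability sets. It is a hypothetical helper, not part of cluster_config; the AvailSet type, the tool labels, and the hardware model strings are assumptions for illustration. Violations of the rules stated above are reported as errors, and a head-node hardware-model mismatch, which is only a recommendation, is reported as a warning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailSet:
    first: str
    second: str
    tool: str                           # label of the managing availability tool
    services: frozenset = frozenset()   # services planned for improved availability


def check_sets(sets, head_node, external_nodes, models):
    """Check planned availability sets against this section's guidelines.

    Returns (errors, warnings). `external_nodes` holds the nodes that have an
    external Ethernet connection; `models` maps node name -> hardware model.
    """
    errors, warnings = [], []
    placed = defaultdict(set)   # tool -> nodes already in a set managed by that tool
    for s in sets:
        label = f"{s.first}/{s.second} ({s.tool})"
        if s.first == s.second:
            errors.append(f"{label}: a set needs two different nodes")
            continue
        for node in (s.first, s.second):
            # No overlap per tool; this also limits the head node to one set per tool.
            if node in placed[s.tool]:
                errors.append(f"{label}: {node} is already in another {s.tool} set")
            placed[s.tool].add(node)
        if s.services & {"nat", "lvs"}:
            for node in (s.first, s.second):
                if node not in external_nodes:
                    errors.append(f"{label}: {node} needs an external Ethernet connection")
        members = (s.first, s.second)
        if "dbserver" in s.services and head_node not in members:
            errors.append(f"{label}: the dbserver set must contain the head node")
        if head_node in members:
            other = s.second if s.first == head_node else s.first
            if models.get(other) != models.get(head_node):
                warnings.append(f"{label}: {other} differs in hardware model from the head node")
    return errors, warnings
```

For example, with head node n16, a plan that places n8 in two Serviceguard sets is rejected, while placing n8 and n7 in an additional Heartbeat set is accepted:

```python
plan = [
    AvailSet("n16", "n15", "serviceguard", frozenset({"dbserver"})),
    AvailSet("n8", "n7", "serviceguard", frozenset({"nat"})),
    AvailSet("n8", "n6", "serviceguard"),
    AvailSet("n8", "n7", "heartbeat"),
]
errors, warnings = check_sets(plan, "n16", {"n8", "n7"}, {"n16": "model-A", "n15": "model-A"})
```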

Assigning Node Roles For Improved Availability

An important part of planning your strategy for improved availability is to determine the services for which availability is vital to the system operation. Services are delivered in node roles. A node role is an abstraction that combines one or more services into a group and provides a convenient way of installing services on a node.

In this release, improved availability is supported for the services listed in Table 1-2. Also listed in the table are things to consider about the role assignments if you plan to implement improved availability for one or more of these services.

Read Appendix F to learn more about default role assignments and the full set of services provided by each role.

Table 1-2 Role and Service Placement for Improved Availability

For each service, the table lists the service name, the role in which the service is delivered, and special considerations for assigning that role.

Service: Database server (dbserver)
Delivered in role: avail_node_management

The dbserver service is present on the head node by default and cannot be moved. Thus, to achieve improved availability of the dbserver service, the following is required:

  • You must install and configure the HP StorageWorks Scalable File Share (SFS) software, which must be purchased separately from HP.

    When improved availability is enabled, the database tables are moved to the system-wide /hptc_cluster file system (rather than the default location in the /var/lib/mysql directory). In order for /hptc_cluster to be highly available, it must reside on an SFS server.

  • You must configure the head node into an availability set with one additional node.

  • You must assign the avail_node_management role to the additional node.

For more information about the avail_node_management role, see “Avail_node_management Role”.

Service: Linux Virtual Server (LVS) director
Delivered in role: login

The login role supplies the LVS director service. To attain improved availability, LVS requires the login role on at least three nodes:

  • At least one node that is always a real server

  • A pair of nodes, associated into an availability set, to act as the LVS director and the backup for the LVS director. The LVS director also acts as a real server; the backup never does.

Thus, to have n real servers during normal operation, you assign the login role to n+1 nodes: the backup for the LVS director is never a real server, but it must have the login role to act as a backup. When failover occurs, the backup takes over the director role, but it does not become a real server. As a result, after the LVS director service fails over, one fewer real server is available (temporarily) because the former director, which was a real server, is unavailable.

  

Within the availability set, the higher-numbered node is the LVS director, and the lower-numbered node is the backup for the LVS director.

Thus, to achieve improved availability of the LVS director service, you must assign the login role to at least three nodes:

  • Assign the login role to the first node in the availability set.

  • Assign the login role to the second node in the availability set.

  • Assign the login role to at least one other node in the system. This node acts as a real server.

The nodes you associate into the availability set must have an external Ethernet connection configured as well.

For more information about the login role, see “Login Role”.

Service: Nagios master
Delivered in role: management_server

By default, the management_server role is installed on the head node. If you want improved availability for Nagios, the management_server role must be assigned to two nodes, the head node and one additional node.

In this case, the head node cannot have the management_hub and console_network roles assigned to it, so you must move those roles to the other node in the availability set.

The other node in the availability set acts as a Nagios monitor unless the Nagios master fails over; at that time the other node acts both as a Nagios master and a Nagios monitor.

HP recommends that the other node in the availability set also have an external Ethernet connection so that you can run the Nagios web interface on it.

For more information about the management_server role, see “Management Server Role”.

Service: Network Address Translation (NAT)
Delivered in role: external

To achieve improved availability of NAT, you must assign the external role to both nodes in the availability set, and both nodes must have a configured external Ethernet connection. If you assign the external role to any other node that is not part of an availability set, that node cannot act as a NAT server because it cannot be managed by the availability tool.

During cluster_config processing, you are prompted to supply the IP addresses of the NAT servers.

For more information about the external role, see “External Role”.
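The LVS arithmetic and the Nagios role moves described in Table 1-2 can be checked with a short Python sketch. It assumes node names end in a number (as in n7 or n8) and that roles are given as a mapping from node name to a set of role names; the function names are hypothetical and are not HP XC commands.

```python
import re


def node_number(name: str) -> int:
    """Return the trailing number of a node name such as 'n12' (naming assumed)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no node number in {name!r}")
    return int(match.group(1))


def lvs_plan(avail_set, login_nodes):
    """Derive the LVS director layout from an availability set and the login nodes."""
    a, b = avail_set
    if a not in login_nodes or b not in login_nodes:
        raise ValueError("both nodes in the availability set need the login role")
    if len(login_nodes) < 3:
        raise ValueError("LVS improved availability needs the login role on at least three nodes")
    # Within the set, the higher-numbered node is the director and the
    # lower-numbered node is its backup.
    director, backup = sorted((a, b), key=node_number, reverse=True)
    real_servers = len(login_nodes) - 1   # every login node except the backup
    return {
        "director": director,
        "backup": backup,
        "real_servers": real_servers,
        # After failover the backup directs but does not serve, and the failed
        # director is gone, so one fewer real server is available.
        "real_servers_after_failover": real_servers - 1,
    }


def nagios_problems(head_node, other, roles):
    """List role changes needed for Nagios master availability (roles: node -> set)."""
    problems = []
    for node in (head_node, other):
        if "management_server" not in roles.get(node, set()):
            problems.append(f"assign management_server to {node}")
    for role in ("management_hub", "console_network"):
        if role in roles.get(head_node, set()):
            problems.append(f"move {role} from {head_node} to {other}")
    return problems
```

For example, lvs_plan(("n9", "n10"), {"n9", "n10", "n11", "n12"}) reports n10 as the director, three real servers in normal operation, and two after failover. The node numbers are compared as integers because comparing "n10" and "n9" as strings would pick the wrong director.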

 

Configuring Improved Availability for the /hptc_cluster File System

To configure improved availability for the /hptc_cluster file system (service name hptc_cluster_fs), use the HP StorageWorks Scalable File Share (SFS) software, which must be purchased separately from HP. SFS is also required for successful failover of the dbserver service. During the HP XC Kickstart installation procedure, you are prompted to configure the /hptc_cluster file system on a disk somewhere other than the head node. If you purchase and configure SFS, you can locate the file system on SFS storage.

Configuring Failover Capabilities for SLURM and LSF-HPC with SLURM

Improved availability for SLURM and LSF-HPC with SLURM is not achieved through availability sets or availability tools. Failover capabilities for SLURM and LSF-HPC with SLURM are achieved by placing the resource_management role on two or more nodes. These nodes are not members of any availability set, and the SLURM and LSF-HPC with SLURM software is not managed by any availability tool.

When you assign the resource_management role to two or more nodes, SLURM availability is enabled automatically. Availability for LSF-HPC with SLURM, however, must be enabled manually; see “Perform LSF Postconfiguration Tasks” for instructions.

Standard LSF also contains its own automatic failover mechanisms. See the Platform LSF documentation for more information about node failure scenarios with standard LSF.
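The rule above is simple enough to express as a small Python check. This is a hypothetical planning helper that only reads a role map you supply (node name to set of role names); it does not enable anything on the system.

```python
def resource_management_status(roles_by_node):
    """Summarize SLURM and LSF-HPC failover from resource_management placement.

    roles_by_node maps a node name to the set of roles assigned to it.
    """
    nodes = sorted(n for n, roles in roles_by_node.items() if "resource_management" in roles)
    redundant = len(nodes) >= 2
    return {
        "resource_management_nodes": nodes,
        # SLURM availability is enabled automatically with two or more such nodes.
        "slurm_failover": redundant,
        # Availability for LSF-HPC with SLURM must still be enabled by hand.
        "lsf_hpc_manual_enable_required": redundant,
    }
```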

Using the Improved Availability Planning Worksheet

After you have completed the advance planning of your service availability strategy, use the worksheet in Table 1-3 to record the following information:

  • The node names to associate into availability sets.

  • The availability tool that will manage the services in each availability set (if you installed and configured more than one availability tool).

  • The roles (and thus, the services) to assign to both nodes in each availability set.

The cluster_config utility prompts you for this information, so have the worksheet handy.
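If you keep the worksheet electronically, a small Python record such as the following can hold each row and flag sets whose nodes were given different availability roles. This is a hypothetical sketch: it compares only the roles listed in Table 1-2. That is an assumption made because the Nagios guidance moves management_hub and console_network off the head node, so the two members can legitimately differ in other roles.

```python
from dataclasses import dataclass, field

# Roles from Table 1-2 that deliver services eligible for improved availability.
AVAILABILITY_ROLES = {"avail_node_management", "login", "management_server", "external"}


@dataclass
class WorksheetEntry:
    first_node: str
    second_node: str
    tool: str
    first_roles: list = field(default_factory=list)
    second_roles: list = field(default_factory=list)

    def problems(self):
        # Compare only the availability roles; other roles (for example
        # management_hub moved off the head node) may legitimately differ.
        a = set(self.first_roles) & AVAILABILITY_ROLES
        b = set(self.second_roles) & AVAILABILITY_ROLES
        if a != b:
            return [f"{self.first_node}/{self.second_node}: roles differ: {sorted(a ^ b)}"]
        return []

    def render(self):
        return "\n".join([
            f"{self.first_node} + {self.second_node} managed by {self.tool}",
            "  first node roles:  " + ", ".join(self.first_roles),
            "  second node roles: " + ", ".join(self.second_roles),
        ])
```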

Table 1-3 Availability Sets Worksheet

Availability Set Configuration

Availability set 1

  First node name: _________________________

  Second node name: _________________________

  Availability tool to manage this availability set: _________________________

  Roles to assign to nodes in the availability set:

    First node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

    Second node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

Availability set 2

  First node name: _________________________

  Second node name: _________________________

  Availability tool to manage this availability set: _________________________

  Roles to assign to nodes in the availability set:

    First node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

    Second node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

Availability set 3

  First node name: _________________________

  Second node name: _________________________

  Availability tool to manage this availability set: _________________________

  Roles to assign to nodes in the availability set:

    First node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

    Second node in the availability set:

      • _________________________
      • _________________________
      • _________________________
      • _________________________
      • _________________________

© 2003–2007 Hewlett-Packard Development Company, L.P.