Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters: > Chapter 2 Designing a Continental Cluster

Designing a Disaster Tolerant Architecture for use with Continentalclusters


A recovery pair in a continental cluster consists of two Serviceguard clusters. One functions as the primary cluster and the other functions as the recovery cluster for a specific application. Prior to Continentalclusters version A.05.00, only one recovery pair could be configured in a continental cluster. Starting with Continentalclusters version A.05.00, multiple recovery pairs can be configured.

In the multiple recovery pair configuration, more than one primary cluster (where the primary packages are running) can be configured to share the same recovery cluster (where the recovery package is running).

The key elements providing disaster tolerance in a continental cluster recovery pair are:

  • Mutual Recovery

  • Serviceguard clusters

  • Data replication

  • Highly available WAN networking

  • Data center processes and procedures coordinated between the two cluster sites

There is a significant amount of latitude in selecting these elements for a configuration. Record the choices on worksheets that can be reviewed and updated periodically.

Mutual Recovery

For mutual recovery, any cluster in a continental cluster recovery pair may contain both primary and recovery packages for any recovery group. For example, recovery groups may be defined such that cluster A and cluster B both contain recovery packages. In this case, cmrecovercl could be run on cluster B to recover packages from cluster A, or on cluster A to recover packages from cluster B.
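The mutual recovery arrangement can be pictured as a small data model. The following is only an illustrative sketch: the `RecoveryGroup` class, the function name, and the group and cluster names are hypothetical and are not part of Continentalclusters. It shows how, given a set of recovery groups, one could list the groups whose recovery packages reside on a particular cluster, which is the cluster where cmrecovercl would be run.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RecoveryGroup:
    """Hypothetical record of one recovery group (not a Continentalclusters type)."""
    name: str
    primary_cluster: str
    recovery_cluster: str


def groups_recoverable_on(cluster: str, groups: Iterable[RecoveryGroup]) -> List[str]:
    """Return the names of recovery groups whose recovery package is on `cluster`."""
    return [g.name for g in groups if g.recovery_cluster == cluster]


# Mutual recovery: each cluster is primary for one group and recovery for the other.
groups = [
    RecoveryGroup("sales_rg", primary_cluster="clusterA", recovery_cluster="clusterB"),
    RecoveryGroup("hr_rg", primary_cluster="clusterB", recovery_cluster="clusterA"),
]

print(groups_recoverable_on("clusterB", groups))  # ['sales_rg']
```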

Serviceguard Clusters

Each Serviceguard cluster in a continental cluster provides high availability for an application at the local level at that particular site. For optimal performance and to assure adequate capacity on the recovery cluster, it is best to have similar hardware on both clusters. For example, if one cluster contains two systems with 1 GB of memory each, it is not a good idea to have a low-end system with 128 MB of memory in the other cluster. Each cluster may have as many nodes as are permitted in an ordinary Serviceguard cluster, and each may be running packages that are not configured to fail over between clusters.

NOTE: When cluster A takes over for cluster B, it must run cluster B’s packages in addition to any packages that it was already running on its own, unless those packages are stopped intentionally.
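The capacity consideration in the note can be checked with simple arithmetic during planning. The sketch below is an assumption-laden illustration: it uses memory as the only capacity measure, and the function name, package names, and memory figures are all hypothetical. A real sizing exercise would also consider CPU, I/O, and storage.

```python
from typing import Dict, Iterable


def can_absorb(cluster_memory_mb: int,
               local_packages_mb: Dict[str, int],
               incoming_packages_mb: Dict[str, int],
               stopped: Iterable[str] = ()) -> bool:
    """Return True if the surviving cluster has enough memory for its own
    packages (minus any intentionally stopped ones) plus the incoming ones."""
    stopped = set(stopped)
    local = sum(mb for name, mb in local_packages_mb.items() if name not in stopped)
    incoming = sum(incoming_packages_mb.values())
    return local + incoming <= cluster_memory_mb


# Hypothetical figures: cluster A has 2048 MB and must take over cluster B's package.
local = {"reports": 1024, "batch": 512}
incoming = {"orders": 1024}

print(can_absorb(2048, local, incoming))                     # False
print(can_absorb(2048, local, incoming, stopped=["batch"]))  # True
```

In this example the takeover fits only if the local `batch` package is stopped intentionally, which is the trade-off the note describes.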

Data Replication

Data replication between the Serviceguard clusters in a Continentalclusters recovery pair extends the scope of high availability to the level of the continental cluster. Select a technology for data replication between the two clusters. There are many possible choices, including:

  • Logical replication of databases

  • Logical replication of file systems

  • Physical replication of data volumes via software

  • Physical replication of disk units via hardware

Table 2-3 “Data Replication and Continentalclusters” briefly describes how each data replication method affects a continental cluster environment. A detailed description of data replication can be found in Chapter 1, in the section titled “Disaster Tolerance and Recovery in a Serviceguard Cluster.”

Specific guidelines for configuring the HP StorageWorks Disk Array XP Series, HP StorageWorks Disk Array EVA Series and the EMC Symmetrix Disk Array for physical data replication in a continental cluster are provided in Chapters 3, 4 and 5. To use these data replication solutions in a Continentalclusters environment, it is necessary to purchase one of the following products separately: Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF.

White papers describing specific implementations are also available at

www.docs.hp.com -> High Availability

If a data replication technology is chosen that is not mentioned above, and if the integration is performed independently, then it is necessary to use the guidelines described in the section “Using the Recovery Command to Switch All Packages”. In that case, note the following:

  • The Continentalclusters product is responsible only for the following: Continentalclusters configuration and management commands, the monitoring of remote cluster status, and the notification of remote cluster events.

  • The Continentalclusters product provides a single recovery command to start all recovery packages that are configured in the Continentalclusters configuration file. These recovery packages are ordinary Serviceguard packages. The Continentalclusters recovery command does not check the status of the devices and data that are used by the application before starting the recovery package. The user is responsible for checking the state of the devices and the data before executing the Continentalclusters recovery command.
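Because the recovery command performs no device or data checks, a site may choose to gate its use behind its own verification steps. The following is a minimal sketch of that idea, not a Continentalclusters facility: the function names and the example checks are hypothetical placeholders for whatever verification the chosen replication technology requires. The sketch only returns the command name cmrecovercl when every check passes and does not execute it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A check is a (description, callable) pair; the callable returns True on success.
Check = Tuple[str, Callable[[], bool]]


def failed_checks(checks: Sequence[Check]) -> List[str]:
    """Run every check and return the descriptions of those that failed."""
    return [name for name, check in checks if not check()]


def recovery_command_if_safe(checks: Sequence[Check]) -> Optional[List[str]]:
    """Return the recovery command as an argument list, or None if any check fails."""
    failures = failed_checks(checks)
    if failures:
        print("Do not recover yet; failed checks: " + ", ".join(failures))
        return None
    return ["cmrecovercl"]


# Hypothetical placeholder checks; real ones depend on the replication technology.
checks = [
    ("replicated devices are accessible", lambda: True),
    ("replicated data is consistent", lambda: True),
]

print(recovery_command_if_safe(checks))  # ['cmrecovercl']
```

Keeping the checks separate from the command makes it straightforward to record them in the site's documented recovery procedure.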

Table 2-3 Data Replication and Continentalclusters

For each replication type, the table describes how it works and its implication for Continentalclusters.

Logical Database Replication

  • How it Works: Transactions from the primary application are applied from logs to a copy of the application running on the recovery site. (This is an example only; there are other methods.)

  • Continentalclusters Implication: Requirements on CPU and I/O may limit or prevent the Recovery Cluster from running additional applications.

Logical Filesystem Replication

  • How it Works: Writes to the filesystem on the primary cluster are duplicated periodically on the recovery cluster.

  • Continentalclusters Implication: CPU issues are the same as for Logical Database Replication. The software may have to be managed as a separate Serviceguard package.

Physical Replication of Data Volumes via Software

  • How it Works: Disk mirroring via LVM software. Mirroring is done on disk links (SCSI or FibreChannel).

  • Continentalclusters Implication: Requirements on CPU are less than for logical replication, but there is still some CPU use. Distance limits may make this type of replication inappropriate for Continentalclusters.

Physical Replication of Disk Units via Hardware

  • How it Works: Replication of the LUNs across disk arrays through dedicated hardware links such as EMC SRDF, Continuous Access XP, or Continuous Access EVA.

  • Continentalclusters Implication: CPU requirements are limited, but synchronous data replication slows replication and may impair application performance. Increased network speed and bandwidth can remedy this.

Logical data replication may require the use of packages to handle software processes that copy data from one cluster to another or that apply transactions from logs that are copied from one cluster to another. Some methods of logical data replication use a logical replication data sender package, others use a logical replication data receiver package, and some use both. Logical replication data sender and receiver packages are configured as part of the data recovery group, as shown in the section “Preparing the Clusters”.

Physical Data Replication Using Special Environment Files

For physical data replication, Continentalclusters uses pre-integrated solutions, which use Continuous Access XP, Continuous Access EVA and EMC SRDF. To use these data replication solutions in a Continentalclusters environment, it is necessary to purchase one of the following products separately: Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF.

Physical data replication generally does not require the use of separate sender or receiver packages, but it does require specialized logic in the package control scripts to handle the transfer of control from the storage units of one cluster to the storage units at the other cluster.

Packages that use physical data replication with the HP StorageWorks Disk Array XP Series with Continuous Access XP should have a specific environment file created from the template /opt/cmcluster/toolkit/SGCA/xpca.env.

For packages that use physical data replication with HP StorageWorks Disk Array EVA with Continuous Access EVA, the environment file should be created from /opt/cmcluster/toolkit/SGCA/caeva.env. For packages that use physical data replication with EMC Symmetrix and the SRDF facility, it should be created from /opt/cmcluster/toolkit/SGSRDF/srdf.env.

These templates are included with the products Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, and Metrocluster with EMC SRDF, which are purchased separately.
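The mapping between replication solution and environment file template can be summarized as a lookup. This is only a planning aid sketched in Python: the template paths are the ones named in this section, while the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Template paths as named in this section; the keys are illustrative labels.
ENV_TEMPLATES = {
    "Continuous Access XP": "/opt/cmcluster/toolkit/SGCA/xpca.env",
    "Continuous Access EVA": "/opt/cmcluster/toolkit/SGCA/caeva.env",
    "EMC SRDF": "/opt/cmcluster/toolkit/SGSRDF/srdf.env",
}


def env_template_for(replication: str) -> str:
    """Return the environment file template path for a replication solution."""
    try:
        return ENV_TEMPLATES[replication]
    except KeyError:
        raise ValueError(f"no pre-integrated template for {replication!r}") from None


print(env_template_for("Continuous Access EVA"))
```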

Details on configuring the special Continentalclusters control scripts are in Chapters 3, 4 and 5. Some additional notes are provided below.

Multiple Recovery Pairs in a Continental Cluster

One or more recovery pairs can be configured in a continental cluster. In a Continentalclusters configuration that contains more than one recovery pair, more than one primary cluster is configured to share a common recovery cluster. As in the configuration with one recovery pair per continental cluster, mutual recovery can also be configured in a multiple recovery pair scenario, as shown in Figure 2-4 “Multiple Recovery Pair Configuration in a Continental Cluster”. The common recovery cluster can choose any one of the primary clusters as its recovery cluster.

Data replication needs to be set up to allow for copying data from each primary cluster to the common recovery cluster. Each recovery pair should have its own data replication link. Different storage areas need to be configured on the common recovery cluster to receive the data replicated from each primary cluster. The common recovery cluster should have enough capacity to serve the recovery purpose for all of the primary clusters configured to partner with it in a recovery pair.
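The two separation guidelines above (a replication link per recovery pair and a distinct storage area per primary cluster) lend themselves to a simple planning check. The sketch below is illustrative only: the `RecoveryPair` class, its field names, and the sample link and storage identifiers are hypothetical and do not correspond to any Continentalclusters configuration syntax.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RecoveryPair:
    """Hypothetical planning record for one recovery pair."""
    primary_cluster: str
    recovery_cluster: str
    replication_link: str
    recovery_storage_area: str


def shared_resources(pairs: Sequence[RecoveryPair]) -> List[str]:
    """Report replication links or storage areas used by more than one pair."""
    problems = []
    resources = (
        ("replication link", [p.replication_link for p in pairs]),
        ("storage area", [p.recovery_storage_area for p in pairs]),
    )
    for label, values in resources:
        for value, count in Counter(values).items():
            if count > 1:
                problems.append(f"{label} {value!r} is shared by {count} recovery pairs")
    return problems


pairs = [
    RecoveryPair("clusterA", "clusterC", "link-1", "area-A"),
    RecoveryPair("clusterB", "clusterC", "link-1", "area-B"),  # reuses link-1
]

for problem in shared_resources(pairs):
    print(problem)
```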

Figure 2-4 Multiple Recovery Pair Configuration in a Continental Cluster


Highly Available Wide Area Networking

Disaster tolerant networking for Continentalclusters is directly tied to the data replication method. In addition to the reliability of the redundant lines connecting the remote nodes, it is important to consider what bandwidth is needed to support the data replication method that has been chosen.

A continental cluster that handles a high number of write transactions per minute will not only require a highly available network, but also one with a large amount of bandwidth. Details on highly available networking can be found in Chapter 1, in the section titled “Disaster Tolerant Architecture Guidelines.” White papers describing specific implementations are also available at: www.docs.hp.com -> High Availability -> Continentalcluster or Metrocluster -> White Papers
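A rough first estimate of the bandwidth needed can be derived from the write rate. The following sketch assumes a steady write rate, a fixed average write size, and a flat 20 percent allowance for protocol overhead; all three are simplifying assumptions, and the function name and figures are hypothetical. Actual requirements depend on the replication method and on peak rather than average load.

```python
def required_mbit_per_s(writes_per_minute: float,
                        avg_write_bytes: float,
                        overhead: float = 1.2) -> float:
    """Estimate sustained replication bandwidth in megabits per second.

    `overhead` is an assumed multiplier for protocol and framing overhead.
    """
    bytes_per_second = writes_per_minute * avg_write_bytes / 60
    return bytes_per_second * 8 * overhead / 1_000_000


# Hypothetical workload: 60,000 writes per minute of 8 KB each.
estimate = required_mbit_per_s(60_000, 8192)
print(f"{estimate:.1f} Mbit/s")  # 78.6 Mbit/s
```

An estimate of this kind is only a starting point for sizing the redundant lines between the sites.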

Data Center Processes

Continentalclusters provides the cmrecovercl command, which fails over all applications on the primary cluster of a recovery pair that are protected by Continentalclusters. However, application failover also requires well-defined processes for the two sites of a recovery pair. These processes and procedures should be written down and made available at both sites.

Some considerations for site management are as follows:

  • Who notifies whom for the various events: configuration changes, alerts, alarms?

  • What communication methods should be used? Email? Phone? Beeper? Multiple methods?

  • Who has the authority to perform what sort of configuration modifications? Can the administrator at one site log in to the nodes on the remote site? If so, what permissions would be set?

  • How often is a practice failover done?

  • Is there a documented test plan?

  • What is the process for tracking changes made to the primary cluster?

Continentalclusters Worksheets

Planning is essential to creating a robust continental cluster environment. Record the details of your configuration on planning worksheets. These worksheets can be filled in partially before configuration begins, and then completed as you build the continental cluster. All the participating Serviceguard clusters in one continental cluster should have a copy of these worksheets to help coordinate initial configuration and subsequent changes. Complete the worksheets in the following sections for each recovery pair of clusters that will be monitored by the Continentalclusters monitor.

Data Center Worksheet

The following worksheet will help you describe your specific data center configuration. Fill out the worksheet and keep it for future reference.

    =======================================================================

    Continental Cluster Name: _____________________________________________

    =======================================================================

    Primary Data Center Information:

         Primary Cluster Name: ____________________________________________

         Data Center Name and Location: ___________________________________

         Main Contact: ____________________________________________________

         Phone Number: ____________________________________________________

         Beeper: __________________________________________________________

         Email Address: ___________________________________________________

         Node Names: ______________________________________________________

         Monitor Package Name: __ccmonpkg__________________________________

         Monitor Interval: ________________________________________________

    =======================================================================

    Recovery Data Center Information:

         Recovery Cluster Name: ___________________________________________

         Data Center Name and Location: ___________________________________

         Main Contact: ____________________________________________________

         Phone Number: ____________________________________________________

         Beeper: __________________________________________________________

         Email Address: ___________________________________________________

         Node Names: ______________________________________________________

         Monitor Package Name: __ccmonpkg__________________________________

         Monitor Interval: _________________________________________________
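If the worksheets are also kept in electronic form, one data center section might be represented as shown below. This is a sketch under stated assumptions: the class and field names are hypothetical, chosen to mirror the worksheet labels, and the default monitor package name ccmonpkg is taken from the worksheet itself. The helper reports which entries are still blank, which is useful when a worksheet is filled in partially before configuration begins.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataCenterWorksheet:
    """Hypothetical electronic form of one data center section of the worksheet."""
    cluster_name: str = ""
    data_center_name_and_location: str = ""
    main_contact: str = ""
    phone_number: str = ""
    beeper: str = ""
    email_address: str = ""
    node_names: List[str] = field(default_factory=list)
    monitor_package_name: str = "ccmonpkg"  # pre-filled on the worksheet
    monitor_interval: str = ""


def unfilled(worksheet: DataCenterWorksheet) -> List[str]:
    """Return the names of worksheet entries that are still blank."""
    return [f.name for f in fields(worksheet) if not getattr(worksheet, f.name)]


# Partially completed worksheet with hypothetical values.
primary = DataCenterWorksheet(cluster_name="clusterA", node_names=["node1", "node2"])
print(unfilled(primary))
```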

Recovery Group Worksheet

The following worksheet will help you organize and record your specific recovery groups. Fill out the worksheet and keep it for future reference.

    =======================================================================

    Continental Cluster Name: _____________________________________________

    =======================================================================

    Recovery Group Data:

         Recovery Group Name: _____________________________________________

         Primary Cluster/Package Name:_____________________________________

         Data Sender Cluster/Package Name:_________________________________

         Recovery Cluster/Package Name:____________________________________

         Data Receiver Cluster/Package Name:_______________________________

    Recovery Group Data:

         Recovery Group Name: _____________________________________________

         Primary Cluster/Package Name:_____________________________________

         Data Sender Cluster/Package Name:_________________________________

         Recovery Cluster/Package Name:____________________________________

         Data Receiver Cluster/Package Name:_______________________________

    Recovery Group Data:

         Recovery Group Name: _____________________________________________

         Primary Cluster/Package Name:_____________________________________

         Data Sender Cluster/Package Name:_________________________________

         Recovery Cluster/Package Name:____________________________________

         Data Receiver Cluster/Package Name:_______________________________

Cluster Event Worksheet

The following worksheet will help you organize and record the cluster events you wish to track. Fill out a worksheet for each primary or recovery cluster that you wish to monitor. You must monitor each cluster containing a primary package which needs to be recovered.


    =======================================================================

    Continental Cluster Name: _____________________________________________

    =======================================================================

    Cluster Event Information:

         Cluster Name: ____________________________________________________

         Monitoring Cluster: ______________________________________________

         UNREACHABLE:

         Alert Interval:___________________________________________________

         Alarm Interval:___________________________________________________

         Notification:_____________________________________________________

         Notification:_____________________________________________________

         Notification:_____________________________________________________


         DOWN:

         Alert Interval:___________________________________________________

         Notification:_____________________________________________________

         Notification:_____________________________________________________

         UP:

         Alert Interval:___________________________________________________

         Notification:_____________________________________________________

         Notification:_____________________________________________________

         ERROR:

         Alert Interval:___________________________________________________

         Notification:_____________________________________________________

         Notification:_____________________________________________________
© Hewlett-Packard Development Company, L.P.