Designing Disaster Tolerant High Availability Clusters > Chapter 3 Building a Metropolitan Cluster Using MetroCluster/CA
Designing a Disaster Tolerant Architecture for use with MetroCluster/CA
MetroCluster/CA is designed for use in an extended distance cluster or metropolitan cluster environment within the 100 km limit of the FDDI network. All nodes must be members of a single MC/ServiceGuard cluster. Two configurations are supported: a single data center architecture and a three data center architecture with arbitrators.
The disaster tolerant architecture requirements are as follows.
A single data center architecture is supported, but it is not a true disaster tolerant architecture. If the entire data center fails, there will be no automated failover. This architecture is only valid for protecting data through data replication, and for protecting against multiple node failures.
The three data center architecture is the recommended and supported disaster tolerant architecture for use with MetroCluster/CA. It consists of two data centers with an equal number of nodes and a third data center with one or more arbitrator nodes; see Figure 3-1 “Three Data Centers with Arbitrators”.
The local XP Series disk array is called the Main Control Unit (MCU) for all nodes and packages in a given data center. The remote XP Series disk array, where the data is replicated, is called the Remote Control Unit (RCU). Note that main and remote are relative terms. An XP Series disk array can be the main disk array for one set of packages and the remote disk array for another.
In Figure 3-1 “Three Data Centers with Arbitrators”, the XP disk array in Data Center A is the main or primary disk array for packages A and B, and the remote or secondary disk array for packages C and D, which run in Data Center B. For packages A and B, data is written to PVOLs on the array in Data Center A and replicated to SVOLs on the array in Data Center B. Likewise, the XP disk array in Data Center B is the primary or main disk array for packages C and D, and the secondary or remote disk array for packages A and B. For packages C and D, data is written to PVOLs on the disk array in Data Center B and replicated to SVOLs in Data Center A.
Arbitrators provide functionality like that of the cluster lock disk, and act as tie-breakers for a cluster quorum in case all of the nodes in one data center go down at the same time. Cluster lock devices are not used in the three-data-center architecture because cluster locks cannot be maintained across the CA link.
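Because main and remote are relative to a package rather than fixed properties of an array, it can help to model the relationship explicitly. The following is a minimal Python sketch, not part of MetroCluster/CA. The `Package` class, the `array_role` function, and the package and site names are hypothetical and exist only to mirror the layout described for Figure 3-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    primary_site: str  # data center whose XP array holds this package's PVOLs


def array_role(site: str, package: Package) -> str:
    """Return the role of the XP array at `site` as seen by `package`."""
    if site == package.primary_site:
        return "main (PVOL)"
    return "remote (SVOL)"


# Hypothetical layout: packages A and B run in Data Center A,
# packages C and D run in Data Center B.
packages = [
    Package("A", "A"),
    Package("B", "A"),
    Package("C", "B"),
    Package("D", "B"),
]

for pkg in packages:
    roles = {site: array_role(site, pkg) for site in ("A", "B")}
    print(pkg.name, roles)
```

The same array therefore reports a different role depending on which package is asking, which is the point made in the text.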
Arbitrators are fully functioning systems that are members of the cluster, and are not usually physically connected to the XP disk arrays. Table 3-2 “Supported System and Data Center Combinations” lists the allowable number of nodes at each data center in a three data center configuration, up to a 16-node maximum cluster size. (Note that the maximum cluster size for MC/ServiceGuard A.10.10 and A.10.11 is 8 nodes.)
Table 3-2 Supported System and Data Center Combinations
* Configurations with two arbitrators are preferred because they provide a greater degree of availability, especially in cases when a node is down due to a failure or planned maintenance.
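The layout rules stated in the prose (equal node counts in the two main data centers, at least one arbitrator in the third, and a maximum cluster size) can be checked mechanically. The sketch below is illustrative only: `check_three_data_center` and its parameters are hypothetical names, and it encodes only the rules stated in the text. Table 3-2 remains the authoritative list of supported combinations.

```python
MAX_CLUSTER_NODES = 16  # use 8 for MC/ServiceGuard A.10.10 and A.10.11


def check_three_data_center(nodes_a, nodes_b, arbitrators,
                            max_nodes=MAX_CLUSTER_NODES):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if nodes_a < 1 or nodes_b < 1:
        problems.append("each main data center needs at least one node")
    if nodes_a != nodes_b:
        problems.append("data centers A and B must have an equal number of nodes")
    if arbitrators < 1:
        problems.append("the third data center needs at least one arbitrator")
    total = nodes_a + nodes_b + arbitrators
    if total > max_nodes:
        problems.append(f"cluster has {total} nodes, maximum is {max_nodes}")
    return problems


print(check_three_data_center(2, 2, 1))               # []
print(check_three_data_center(3, 2, 0))               # two problems reported
print(check_three_data_center(4, 4, 1, max_nodes=8))  # exceeds the 8-node limit
```

Returning a list of problems rather than a single boolean makes it easier to report every violated rule at once when reviewing a planned configuration.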
Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages as well as providing better protection against multiple points of failure. Using two arbitrators:
If you use a single arbitrator system, special procedures must be followed during planned downtime to remain protected. Systems must be taken down in pairs, one from each of the data centers, so that the MC/ServiceGuard quorum is maintained after a node failure. If the arbitrator itself must be taken down, disaster recovery capability is at risk if one of the other systems fails. Arbitrator systems can be used to perform important and useful work such as:
Each XP Series disk array must be configured with redundant CA links, each of which is connected to a different LCP or RCP card. To prevent a single point of failure (SPOF), there must be at least two physical boards in each XP for the CA links. Each board usually has multiple ports, but a redundant CA link must be connected to a port on a different physical board from the board that has the primary CA link.
When using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four CA links, two in each direction. Four CA links are also required in uni-directional configurations in which you want to allow failback.
When a cluster initially forms, all systems must be available to form the cluster (100% quorum requirement). Quorum is dynamic and is recomputed after each system failure. For instance, if you start out with an 8-node cluster and two systems fail, that leaves 6 out of 8 surviving nodes, or a 75% quorum. The cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, quorum is 67%. Each time a cluster re-forms, the surviving nodes must make up more than 50% of the previous cluster membership.
A cluster lock disk is normally used as the tie-breaker when quorum is exactly 50%. However, a cluster lock disk is not supported with MetroCluster with Continuous Access XP. Therefore, a quorum of 50% or less will cause the remaining nodes to halt.
Taking a node off-line for planned maintenance is treated the same as a node failure in these scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned maintenance.
The scenarios in Table 3-3 “Node Failure Scenarios with One Arbitrator”, based on Figure 3-2 “Failover Scenario with a Single Arbitrator”, illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.
Table 3-3 Node Failure Scenarios with One Arbitrator
* Cluster can be manually started with the remaining node.
Table 3-4 “Data Center Failure Scenarios with One Arbitrator” illustrates possible results if a data center fails in a configuration with a single arbitrator.
Table 3-4 Data Center Failure Scenarios with One Arbitrator
* Cluster can be manually started with the remaining node.
With a single arbitrator node, the cluster is at risk each time a node fails or comes down for planned maintenance. Having two arbitrator nodes adds extra protection during node failures and allows you to do planned maintenance on arbitrator nodes without losing the cluster should a disaster occur.
The scenarios in Table 3-5 “Failure Scenarios with Two Arbitrators” illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators. Note that 3 of the 4 scenarios that caused a cluster halt with a single arbitrator do not cause a cluster halt with two arbitrators.
Table 3-5 Failure Scenarios with Two Arbitrators
* Cluster can be manually started with the remaining node.
Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a three-data-center configuration.
Use this cluster configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing MC/ServiceGuard manual. If you have already completed an MC/ServiceGuard cluster configuration worksheet, you only need to complete the first part of this worksheet.
Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing MC/ServiceGuard manual. If you have already completed an MC/ServiceGuard package configuration worksheet, you only need to complete the first part of this worksheet.
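When planning maintenance, the dynamic quorum rule described earlier in this section (survivors must be more than 50% of the previous cluster size, with no cluster lock available as a tie-breaker) can be worked through by hand or with a short script. The following Python sketch is an illustration under those stated rules only. The function names `reform` and `simulate` and the example cluster sizes are hypothetical, and the sketch does not reproduce MC/ServiceGuard's actual membership protocol.

```python
def reform(cluster_size: int, failed: int) -> int:
    """Return the new cluster size after `failed` nodes are lost at once.

    Returns 0 when the survivors are 50% or less of the previous
    membership, in which case the remaining nodes halt.
    """
    if not 0 <= failed <= cluster_size:
        raise ValueError("failed must be between 0 and cluster_size")
    survivors = cluster_size - failed
    # Strictly more than half; integer arithmetic avoids float rounding.
    if 2 * survivors > cluster_size:
        return survivors
    return 0


def simulate(initial_size: int, failures: list) -> list:
    """Apply successive failure events and return the size after each one."""
    size = initial_size
    history = []
    for failed in failures:
        if size == 0:
            break  # cluster already halted
        size = reform(size, failed)
        history.append(size)
    return history


# The 8-node example from the text: 6 of 8 (75%), then 4 of 6 (67%).
print(simulate(8, [2, 2]))   # [6, 4]

# Hypothetical 2 + 2 nodes with one arbitrator: one node is down for
# maintenance, then a two-node data center fails, leaving 2 of 4 (50%).
print(simulate(5, [1, 2]))   # [4, 0]

# The same events with two arbitrators leave 3 of 5 (60%).
print(simulate(6, [1, 2]))   # [5, 3]
```

Because planned maintenance counts as a failure, running the intended sequence of outages through a check like this before taking a node down shows whether a subsequent data center failure would still leave more than 50% of the membership.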