Understanding and Designing Serviceguard Disaster Tolerant Architectures Fourth Edition: > Chapter 2 Building an Extended Distance Cluster Using Serviceguard

Two Data Center Architecture

The two data center architecture is based on a standard Serviceguard configuration with half of the nodes in one data center, and the other half in another data center. Nodes can be located in separate data centers in the same building, or even separate buildings within the limits of FibreChannel technology. Configurations with two data centers have the following requirements:

  • There must be an equal number of nodes (1 or 2) in each data center.

  • In order to maintain cluster quorum after the loss of an entire data center, you must configure dual cluster lock disks (one in each data center). Since cluster lock disks are only supported for up to 4 nodes, the cluster can contain only 2 or 4 nodes. When using dual cluster lock disks, there is a chance of Split Brain Syndrome (where the nodes in each data center form two separate clusters, each with exactly one half of the cluster nodes) if all communication between the two data centers is lost and all nodes remain running.
    The Serviceguard Quorum Server prevents split brain, but it cannot be used in place of dual cluster lock disks in this architecture because the Quorum Server must reside in a third site. Therefore a three data center cluster is the preferable solution for preventing split brain, and the only solution if dual cluster lock disks cannot be used or if the cluster must have more than 4 nodes.

  • Two data center configurations are not supported if SONET is used for the cluster interconnects between the Primary data centers.

  • To protect against the possibility of a split cluster inherent when using dual cluster locks, at least two (three preferred) independent paths between the two data centers must be used for heartbeat and cluster lock I/O. Specifically, the path from the first data center to the cluster lock at the second data center must be different from the path from the second data center to the cluster lock at the first data center. Preferably, at least one of the paths for heartbeat traffic should be different from each of the paths for cluster lock I/O.

  • No routing is allowed for the networks between data centers.

  • MirrorDisk/UX mirroring for LVM and VxVM mirroring are supported for clusters of 2 or 4 nodes. However, the dual cluster lock devices can only be configured in LVM Volume Groups.

  • There can be separate networking and FibreChannel links between the two data centers, or both networking and Fibre Channel can go over DWDM links between the two data centers. See the section below “Network and Data Replication Links Between the Data Centers” for more details.

  • CVM 3.5 and CVM 4.1 mirroring is supported for Serviceguard and Extended Cluster for RAC clusters. However, the dual cluster lock devices must still be configured in LVM Volume Groups. Since cluster lock disks are only supported for up to 4 nodes, the cluster can contain only 2 or 4 nodes.

  • MirrorDisk/UX mirroring for Shared LVM volume groups is supported for Extended Cluster for RAC clusters containing 2 nodes.

  • FibreChannel Direct Fabric Attach (DFA) is recommended over FibreChannel Arbitrated Loop configurations, due to the superior performance of DFA, especially as the distance increases. Therefore FibreChannel switches are preferred over FibreChannel hubs.

  • Any combination of the following FibreChannel capable disk arrays may be used: HP StorageWorks Virtual Arrays, HP StorageWorks Disk Array XP, Enterprise Virtual Arrays (EVA) or EMC Symmetrix Disk Arrays. Refer to the HP Configuration Guide (available through your HP representative) for a list of supported FibreChannel hardware.

  • Application data must be mirrored between the primary data centers. If MirrorDisk/UX is used, Mirror Write Cache (MWC) must be the Consistency Recovery policy defined for all mirrored logical volumes. This will allow for resynchronization of stale extents after a node crash, rather than requiring a full resynchronization. For SLVM (concurrently activated) volume groups, Mirror Write Cache must not be defined as the Consistency Recovery policy for mirrored logical volumes (that is, NOMWC must be used). This means that a full resynchronization may be required for shared volume group mirrors after a node crash, which can have a significant impact on recovery time. To ensure that the mirror copies reside in different data centers, it is recommended to configure physical volume groups for the disk devices in each data center, and to use Group Allocation Policy for all mirrored logical volumes.

  • Due to the maximum of 3 images (1 original image plus two mirror copies) allowed in MirrorDisk/UX, if JBODs are used for application data, only one data center can contain JBODs while the other data center must contain disk arrays with hardware mirroring. Note that having three mirror copies will affect performance on disk writes. VxVM and CVM 3.5 mirroring do not limit the number of mirror copies, so it is possible to have JBODs in both data centers; however, increasing the number of mirror copies may adversely affect performance on disk writes.

  • Veritas Volume Manager (VxVM) mirroring is supported for distances of up to 100 kilometers for clusters of 16 nodes. However, VxVM supports up to 10 kilometers for clusters of 16 nodes on supported versions of HP-UX. Ensure that the mirror copies reside in different data centers and that the DRL (Dirty Region Logging) feature is used. RAID 5 mirrors are not supported. It is important to note that VxVM can only perform a full resynchronization (that is, it cannot perform an incremental synchronization) when recovering from the failure of a mirror copy or loss of connectivity to a data center, such as a failure of the data replication links between the data centers. This can have a significant impact on performance and availability of the cluster if the disk groups are large.

  • Veritas CVM version 3.5 mirroring is supported for Serviceguard, Serviceguard OPS Edition, or Serviceguard Extension for RAC clusters (SGeRAC) for distances up to 10 kilometers for 2, 4, 6, or 8 node clusters, and up to 100 kilometers for 2 node clusters.

    Since CVM 3.5 does not support multiple heartbeats and allows only one heartbeat network to be defined for the cluster, you must make the heartbeat network highly available, using a standby LAN to provide redundancy for the heartbeat network. The heartbeat subnet should be a dedicated network, to ensure that other network traffic will not saturate the heartbeat network. The CVM Mirror Detachment Policy must be set to “Global”. CVM 4.1 supports multiple heartbeat subnets.

  • For clusters using Veritas CVM 3.5, only a single heartbeat subnet is supported, so both Primary and Standby LANs must be configured for the heartbeat subnet on all nodes. For SGeRAC clusters, it is recommended to have an additional network for Oracle RAC cache fusion traffic. It is acceptable to use a single Standby network to provide backup for both the heartbeat network and the RAC cache fusion network; however, it can only provide failover capability for one of these networks at a time.

  • Serviceguard Extension for Faster Failover (SGeFF) is not supported in a two data center architecture, because SGeFF requires a two-node cluster and the use of a Quorum Server. For more detailed information on SGeFF, refer to the Serviceguard Extension for Faster Failover Release Notes and the “Optimizing Failover Time in a Serviceguard Environment” white paper.
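
The node-count requirements in the list above (equal numbers in each data center, 1 or 2 nodes per data center, and no more than 4 nodes because of the cluster lock disk limit) can be checked mechanically during planning. The following is a minimal sketch only; the function name and the node names are hypothetical and are not part of Serviceguard.

```python
def check_two_data_center_layout(nodes_dc1, nodes_dc2):
    """Return a list of rule violations for a proposed node layout.

    nodes_dc1 and nodes_dc2 are lists of node names (hypothetical inputs).
    An empty result means the layout satisfies the node-count rules.
    """
    problems = []
    # Rule: equal number of nodes in each data center.
    if len(nodes_dc1) != len(nodes_dc2):
        problems.append("data centers must hold an equal number of nodes")
    # Rule: each data center holds 1 or 2 nodes.
    for label, nodes in (("first", nodes_dc1), ("second", nodes_dc2)):
        if len(nodes) not in (1, 2):
            problems.append(f"{label} data center must hold 1 or 2 nodes")
    # Rule: cluster lock disks are only supported for up to 4 nodes.
    if len(nodes_dc1) + len(nodes_dc2) > 4:
        problems.append("cluster lock disks support at most 4 nodes")
    return problems


if __name__ == "__main__":
    print(check_two_data_center_layout(["a1", "a2"], ["b1", "b2"]))
    print(check_two_data_center_layout(["a1", "a2", "a3"], ["b1"]))
```

The first call describes a valid 2+2 layout; the second is rejected because the data centers are unbalanced and one of them holds three nodes.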

NOTE: Refer to Table 1-2 “Supported Distances Extended Distance Cluster Configurations” for the maximum supported distances between data centers for Extended Distance Cluster configurations.

For more detailed configuration information on Extended Distance Cluster, refer to the HP Configuration Guide (available through your HP representative).

For the most up-to-date support and compatibility information see the SGeRAC for SLVM, CVM & CFS Matrix and Serviceguard Compatibility and Feature Matrix on http://docs.hp.com -> High Availability -> Serviceguard Extension for Real Application Cluster (ServiceGuard OPS Edition) -> Support Matrixes
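
The MirrorDisk/UX requirements listed earlier can be illustrated with a small worked example. The sketch below assumes, as an interpretation of the 3-image limit rather than a statement from this guide, that a data center using JBODs keeps two software mirror images locally while a data center using arrays with hardware mirroring keeps one. All function and constant names are hypothetical.

```python
MAX_MIRRORDISK_IMAGES = 3  # 1 original image plus two mirror copies


def images_required(dc1_storage, dc2_storage):
    """Count the software mirror images needed across both data centers.

    Assumption (not from the guide): a "jbod" site holds two images so it
    is protected locally; an "array" site relies on hardware mirroring and
    holds one image.
    """
    per_site = {"jbod": 2, "array": 1}
    return per_site[dc1_storage] + per_site[dc2_storage]


def fits_mirrordisk_ux(dc1_storage, dc2_storage):
    """True if the layout stays within the MirrorDisk/UX image limit."""
    return images_required(dc1_storage, dc2_storage) <= MAX_MIRRORDISK_IMAGES


def consistency_recovery_policy(shared_slvm):
    """MWC for ordinary mirrored volumes, NOMWC for SLVM volume groups."""
    return "NOMWC" if shared_slvm else "MWC"


if __name__ == "__main__":
    print(fits_mirrordisk_ux("jbod", "array"))  # within the limit
    print(fits_mirrordisk_ux("jbod", "jbod"))   # would need four images
    print(consistency_recovery_policy(shared_slvm=True))
```

Under that assumption, JBODs in both data centers would need four images, which is why only one data center can contain JBODs when MirrorDisk/UX is used.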

Two Data Center FibreChannel Implementations

FibreChannel Using Hubs

In a two data center configuration, shown in Figure 2-1 “Two Data Centers with FibreChannel Hubs”, it is required to use a cluster lock disk, which is only supported for up to 4 nodes. This configuration can be implemented using any HP-supported FibreChannel devices. Disks must be available from all nodes using redundant links. Not all links are shown in Figure 2-1 “Two Data Centers with FibreChannel Hubs”.

Figure 2-1 Two Data Centers with FibreChannel Hubs

The two cluster lock disks should be located on separate FibreChannel loops to guard against single point of failure. The lock disks can also be used as data disks. They must be connected to all nodes using redundant links (not all links are shown in Figure 2-1 “Two Data Centers with FibreChannel Hubs”).

Nodes can connect to disks in the same data center using short-wave ports, and hubs can connect between data centers using long-wave ports. This gives you a maximum distance of 10 kilometers between data centers, making it possible to locate data centers in different buildings.

FibreChannel Using Switches

The two data center architecture is also possible over longer distances using FibreChannel switches. Figure 2-2 “Two Data Centers with FibreChannel Switches and FDDI” is one example of a switched two data center configuration using FibreChannel and FDDI networking.

Figure 2-2 Two Data Centers with FibreChannel Switches and FDDI

DWDM with Two Data Centers

Figure 2-3 “Two Data Centers with DWDM Network and Storage” is an example of a two data center configuration using DWDM for both storage and networking.

Figure 2-3 Two Data Centers with DWDM Network and Storage

Cross-Subnet Configuration with Two Data Centers

Figure 2-4 “Two Data Centers with Cross-Subnet” is an example of a two data center configuration that uses cross-subnet networking.

Figure 2-4 Two Data Centers with Cross-Subnet

Cross-Subnet Configurations

As of Serviceguard A.11.18 it is possible to configure multiple subnets, joined by a router, both for the cluster heartbeat and for data, with some nodes using one subnet and some another.

A cross-subnet configuration allows:

  • Automatic package failover from a node on one subnet to a node on another

  • A cluster heartbeat that spans subnets.

NOTE: For detailed information on configuring cross-subnet see the Managing Serviceguard Fifteenth Edition user’s guide.

Restrictions

The following restrictions apply when configuring Cross-Subnet:

  • All nodes in the cluster must belong to the same network domain (that is, the domain portion of the fully-qualified domain name must be the same).

  • The nodes must be fully connected at the IP level.

  • A minimum of two heartbeat paths must be configured for each cluster node.

  • There must be less than 200 milliseconds of latency in the heartbeat network.

  • Each heartbeat subnet on each node must be physically routed separately to the heartbeat subnet on another node; that is, each heartbeat path must be physically separate:

    • The heartbeats must be statically routed; static route entries must be configured on each node to route the heartbeats through different paths.

    • Failure of a single router must not affect both heartbeats at the same time.

  • Because Veritas Cluster File System from Symantec (CFS) requires link-level traffic communication (LLT) among the nodes, Serviceguard cannot be configured in cross-subnet configurations with CFS alone.
    But CFS is supported in specific cross-subnet configurations with Serviceguard and HP add-on products such as Serviceguard Extension for Oracle RAC (SGeRAC); see the documentation listed below.

  • Each package subnet must be configured with a standby interface on the local bridged net. The standby interface can be shared between subnets.

  • Deploying applications in this environment requires careful consideration; see “Implications for Application Deployment” on page 188 in the Managing Serviceguard Fifteenth Edition user’s guide.

  • cmrunnode will fail if the “hostname LAN” is down on the node in question. (“Hostname LAN” refers to the public LAN on which the IP address that the node’s hostname resolves to is configured).

  • If a monitored_subnet is configured for PARTIAL monitored_subnet_access in a package’s configuration file, it must be configured on at least one of the nodes on the node_name list for that package. Conversely, if all of the subnets that are being monitored for this package are configured for PARTIAL access, each node on the node_name list must have at least one of these subnets configured.

    • A package will not start on a node unless the monitored subnets configured on that node, and specified in the package configuration file as monitored subnets, are up.
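
Several of the restrictions above are simple enough to check against a planning worksheet before the cluster is configured. The following is a minimal sketch covering three of them (same network domain, at least two heartbeat paths per node, heartbeat latency below 200 milliseconds). The `NodePlan` record, its fields, and the example host names are hypothetical; the values would have to come from your own inventory and measurements.

```python
from dataclasses import dataclass


@dataclass
class NodePlan:
    fqdn: str                    # fully-qualified domain name of the node
    heartbeat_paths: int         # number of configured heartbeat paths
    heartbeat_latency_ms: float  # measured heartbeat network latency


def domain_of(fqdn):
    """Return the domain portion: everything after the first label."""
    return fqdn.split(".", 1)[1] if "." in fqdn else ""


def check_cross_subnet(nodes):
    """Return a list of violated cross-subnet restrictions (empty if none)."""
    problems = []
    if len({domain_of(n.fqdn) for n in nodes}) > 1:
        problems.append("nodes span more than one network domain")
    for n in nodes:
        if n.heartbeat_paths < 2:
            problems.append(f"{n.fqdn}: fewer than two heartbeat paths")
        if n.heartbeat_latency_ms >= 200:
            problems.append(f"{n.fqdn}: heartbeat latency is not below 200 ms")
    return problems


if __name__ == "__main__":
    plan = [
        NodePlan("node1.example.com", 2, 5.0),
        NodePlan("node2.example.com", 2, 5.0),
    ]
    print(check_cross_subnet(plan))
```

The remaining restrictions, such as physically separate routing of each heartbeat path, depend on the network topology and are not modeled here.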

Further Reading

For more information on the details of configuring the cluster and packages in a cross-subnet context, refer to the following:

  • “Obtaining Cross-Subnet Information” on page 229 of the Managing Serviceguard Fifteenth Edition user’s guide.

  • “Configuring a Package to Fail Over across Subnets: Example” on page 188.

  • (for legacy packages only) “Configuring Cross-Subnet Failover” on page 384.

IMPORTANT: Although this topology can be implemented on a single site, it is most commonly used by extended-distance clusters, and specifically site-aware disaster-tolerant clusters, which require HP add-on software.

Design and configuration of such clusters are covered in the disaster-tolerant documentation delivered with Serviceguard. For more information, see the following documents at http://www.docs.hp.com -> High Availability:

  • Understanding and Designing Serviceguard Disaster Tolerant Architectures

  • Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters

  • Using Serviceguard Extension for RAC

  • The white paper Configuration and Administration of Oracle 10g RAC Database in HP Metrocluster

Advantages and Disadvantages of a Two Data Center Architecture

The advantages of a two data center architecture are:

  • Lower cost.

  • Only two data centers are needed, meaning less space and less coordination between operations staff.

  • No arbitrator nodes are needed.

  • All systems are connected to both copies of data, so that if a primary disk fails but the primary system stays up, availability is greater because no package failover is needed.

The disadvantages of a two data center architecture are:

  • There is a slight chance of split brain syndrome. Since there are two cluster lock disks, split brain syndrome would occur if all communication between the two data centers were lost while all nodes remained running.

    The chances are slight; however, these events happening at the same time would result in split brain syndrome and probable data inconsistency. Planning different physical routes for both network and data connections or adequately protecting the physical routes greatly reduces the possibility of split brain syndrome.

  • Software mirroring increases CPU overhead.

  • The cluster must be either two or four nodes with cluster lock disks. Larger clusters are not supported due to cluster lock requirements.

  • Although it is a low cost solution, it does require some additional cost:

    • FibreChannel links are required for both local and remote connectivity.

    • All systems must be connected to multiple copies of the data and to both cluster lock disks.
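
The split brain risk listed among the disadvantages can be made concrete with a simplified quorum model. The sketch below assumes the usual rule that a group with more than half of the nodes can re-form the cluster, a group with exactly half needs a tie-breaker (here, a cluster lock), and a smaller group cannot. How each side actually acquires a lock disk is reduced to a boolean input, so this is an illustration, not a description of Serviceguard internals; all names are hypothetical.

```python
def can_form_cluster(surviving, total, holds_lock):
    """Majority forms a cluster; exactly half needs the cluster lock."""
    if 2 * surviving > total:
        return True
    if 2 * surviving == total:
        return holds_lock
    return False


def split_brain(total, dc1_nodes, dc1_gets_lock, dc2_gets_lock):
    """True if both data centers would form a cluster at the same time."""
    dc2_nodes = total - dc1_nodes
    return (can_form_cluster(dc1_nodes, total, dc1_gets_lock)
            and can_form_cluster(dc2_nodes, total, dc2_gets_lock))


if __name__ == "__main__":
    # Only one side wins the lock: a single cluster re-forms.
    print(split_brain(4, 2, dc1_gets_lock=True, dc2_gets_lock=False))
    # All inter-site links are lost and each side claims its local lock disk.
    print(split_brain(4, 2, dc1_gets_lock=True, dc2_gets_lock=True))
```

In the second case each half holds exactly 50% of the nodes and a lock, so both halves form a cluster. This is the situation that independent physical routes between the data centers are meant to make unlikely.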

© Hewlett-Packard Development Company, L.P.