Designing Disaster Tolerant High Availability Clusters: Chapter 8, Cascading Failover in a Continental Cluster

Data Replication Procedures

This section describes the procedures used to manage the data in a Symmetrix cascading failover configuration.

The following procedures are needed if you already have data in place before implementing this solution. If you already have a metropolitan cluster (that is, the primary Symmetrix and the secondary Symmetrix are up and running, and the SRDF volume pairs between the two Symmetrix frames are already established) and you are now adding the recovery cluster to the configuration, only procedure 2 is required.

Procedure 1 is illustrated in Figure 8-7 “Mirroring from the Primary to the Secondary Symmetrix”. Execute the following commands from a node that connects to the primary Symmetrix:
Refer to the sample script datainit in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for examples of how these commands are used. This script is designed to run only on a node that connects to the primary Symmetrix.

Procedure 2 is illustrated in Figure 8-8 “Mirroring from the Secondary to the Recovery Symmetrix”. Execute the following commands from a node that connects to the primary Symmetrix:
Refer to the sample script prirefreshrec in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for examples of how these commands are used. This script is designed to run only on a node that connects to the primary Symmetrix.

Once the application starts writing data to the primary Symmetrix devices, the data on the recovery Symmetrix falls out of sync with the primary data: it is consistent but not current. The primary data needs to be synchronized periodically to the recovery Symmetrix so that the recovery copy does not become too far out of date. As long as the application continues writing new data to the primary Symmetrix, the data on the recovery Symmetrix will always lag behind. The level of data currency on the recovery Symmetrix is dictated by the frequency at which it is refreshed. The refresh process is shown in Figure 8-9 “Data Refresh in Steady State”.

The following procedure describes the steps necessary to periodically copy the data from the secondary Symmetrix to the recovery Symmetrix while the application is running on the primary site.
Refer to the sample script prirefreshrec in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for examples of how these commands are used. This script is designed to run only on a node that connects to the primary Symmetrix.

This section describes the data replication procedures for various failover and failback scenarios.

When a failure occurs at the primary site (either the hosts are down or the whole site is down), the application package automatically fails over to the secondary site within the primary cluster. Until the problems at the primary site are fixed and data replication is reestablished, there is no data protection for the package at the secondary site. Depending on the type of failure and how quickly the primary site comes back online, a data refresh to the recovery site may still be needed. This scenario is illustrated in Figure 8-10 “Failure of Primary Site in Primary Cluster”.

After failover, the application is running on the secondary site and issuing I/O to the R2 devices. The data is not remotely protected. The procedure to refresh the data from the secondary Symmetrix to the recovery Symmetrix is the same as the one used in steady state. However, the procedure now runs on a system in the secondary site; therefore, the options on some of the SYMCLI commands are different.
Refer to the sample script secrefreshrec in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for examples of how these commands are used. This script is designed to run only on a node that connects to the secondary Symmetrix.

Once the problems at the primary site have been fixed, the application can fail back to the primary site. The current RDF pair state of the package device groups will be “Split,” which is not handled automatically by the package control script. The following steps are required to move the package back to the primary site.
When the secondary site fails, or when all SRDF links between the primary Symmetrix and the secondary Symmetrix fail, the application running on the primary site is not aware of the failure and continues to run there, unless domino mode is used. This scenario is illustrated in Figure 8-11 “Failure of Secondary Site in Primary Cluster”.

Without the secondary site, the current configuration does not provide any means to replicate new data from the primary Symmetrix directly to the recovery Symmetrix. If the secondary site is down for a long time, the data in the recovery Symmetrix becomes very out of date. If the primary site fails during this time and the recovery cluster takes over, the customer will have to operate on an old copy of the data. Therefore, it is important to repair the secondary site and bring it back up as soon as possible.

When the secondary site is fixed, the SRDF volume pairs between the primary Symmetrix and the secondary Symmetrix will be in “Suspended” mode. If the BCV/R1 in the secondary Symmetrix contains a good copy of the data, these devices must be split from the mirror group before the SRDF volume pairs between the primary Symmetrix and the secondary Symmetrix are re-established, to protect that copy from corruption in case of a rolling disaster. Use the following steps:
In this scenario, the assumption is that both the primary site and the secondary site fail at the same time or very close to each other. This scenario is illustrated in Figure 8-12 “Failure of Entire Primary Cluster”. After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster.

Note that data corruption may occur if a disaster strikes the primary cluster while the data refresh from the secondary Symmetrix to the recovery Symmetrix is in progress. In that case, the data in the R2 devices in the recovery Symmetrix is not usable. The data can be recovered by restoring an older copy of the data from the BCV devices in the recovery Symmetrix. Execute the following commands to restore the data:
The data in the recovery Symmetrix may not be current but should be consistent. No additional procedure is needed. The package control script is programmed to handle this case.

After the application is up and running, re-establish the BCV devices as mirrors of the standard devices for an additional copy of the data:

    # symmir -g <recsymdevgrpname> est

The current configuration does not support application failback to the primary site in the primary cluster unless the secondary site in the primary cluster is up and running. The secondary site has to be repaired first. The application can temporarily fail back to the secondary site while the primary site is still down. Before the application can fail back to either the secondary site or the primary site, the current data needs to be restored from the recovery Symmetrix to the secondary Symmetrix and the primary Symmetrix.

This procedure is used in a situation where the application fails back and runs on the secondary site while the primary site is still down.
Refer to the sample script recrestoresec in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for the automation of steps 3 to 10. This script is designed to run only on a node that connects to the secondary Symmetrix.

This procedure is used in a situation where both the secondary site and the primary site are fixed and up and running. The application package fails back directly from the recovery cluster to the primary site in the primary cluster.
Refer to the sample script recrestoresec in the /opt/cmcluster/toolkit/SGSRDF/cascade/Samples directory for the automation of steps 3 to 10. This script is designed to run only on a node that connects to the secondary Symmetrix.
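
Each sample script named in this section is designed to run only on a node connected to one particular Symmetrix. The following is a minimal sketch, not part of the toolkit, of a wrapper that refuses to launch a sample script from the wrong site. The script names and the Samples directory come from this section; the function names required_site and run_sample, the site argument supplied by the operator, and the DRY_RUN variable are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Directory and script names are taken from this section. The function
# names, the operator-supplied site argument, and DRY_RUN are assumptions.
SAMPLES_DIR=/opt/cmcluster/toolkit/SGSRDF/cascade/Samples

# Print the site whose Symmetrix a node must connect to in order to run
# the given sample script. Returns non-zero for an unknown script name.
required_site() {
    case "$1" in
        datainit|prirefreshrec) echo primary ;;
        secrefreshrec|recrestoresec) echo secondary ;;
        *) return 1 ;;
    esac
}

# Run a sample script only if the site declared by the operator matches
# the site the script is designed for. With DRY_RUN=1 the path is
# printed instead of being executed.
run_sample() {
    script=$1
    site=$2
    need=$(required_site "$script") || {
        echo "unknown sample script: $script" >&2
        return 2
    }
    if [ "$need" != "$site" ]; then
        echo "$script must run on a node connected to the $need Symmetrix" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$SAMPLES_DIR/$script"
    else
        "$SAMPLES_DIR/$script"
    fi
}
```

For example, after setting DRY_RUN=1, calling run_sample prirefreshrec primary prints the full path of the script without executing it, while run_sample secrefreshrec primary is rejected because that script is meant for a node connected to the secondary Symmetrix. The sample scripts may need to be adapted to the local configuration before they are run for real.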