Designing Disaster Tolerant High Availability Clusters > Chapter 3: Building a Metropolitan Cluster Using MetroCluster/CA
Maintaining a Cluster that Uses MetroCluster/CA
While the cluster is running, manual changes of state for devices on the XP Series disk array can cause the package to halt due to unexpected conditions or can cause the package to not start up after a failover. In general, it is recommended that no manual changes of state be performed while the package and the cluster are running.
While a copy is in progress between XP systems (that is, the volumes are in a COPY state), you can see the progress of the copy by viewing the % column in the output of the pairdisplay command:

    # pairdisplay -g pkgB -fc -CLI

This display shows that 79% of a current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state.
If you are using asynchronous data replication, you can see the current size of the side file when the volumes are in a PAIR state by using the pairdisplay command. The following output, obtained during normal cluster operation, shows the percentage of the side file that is full:
This output shows that 35% of the side file is full. When volumes are in a COPY state, the % column shows the progress of the copying between the XP frames until it reaches 100%, at which point the display reverts to showing the side file usage in the PAIR state.
There might be situations when the package has to be taken down for maintenance purposes without having the package move to another node. The following procedure is recommended for normal maintenance of the MetroCluster/CA:
Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators”.
After certain failures, data is no longer remotely protected. To restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the pairresync command. This command must complete successfully for disaster-tolerant data protection to be restored.
Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:
Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure:
Pairs must be manually recreated if both the primary and secondary XP Series disk arrays are in SMPL (simplex) state.
Make sure you periodically review the files syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings, and recommended actions. Review these files in particular after system, data center, or application failures.
Full resynchronization must be manually initiated after repairing the following failures:
The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the CA link is fixed, use the pairresync command in one of the following two ways, depending on which site you are on:
These options take advantage of the fact that the recovery site maintains a bitmap of the modified data sectors on the recovery array. Either version of the command swaps the personalities of the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the failover site (now the PVOL) is then copied back to the SVOL, which now resides on the primary site. During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. MetroCluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.
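
The % column described earlier in this section (copy progress while volumes are in a COPY state, side file usage in the PAIR state under asynchronous replication) can also be read from a script, for example to wait for a resynchronization to finish before moving a package. The following Python sketch is illustrative only and is not part of MetroCluster/CA. It assumes that the -CLI output begins with a whitespace-delimited header row that contains a column named %, and that every data row has one field per header column. The function names parse_pairdisplay and lowest_percent are hypothetical.

```python
from typing import Dict, List, Optional


def parse_pairdisplay(cli_output: str) -> List[Dict[str, object]]:
    """Parse text captured from `pairdisplay -g <group> -fc -CLI`.

    Assumption: the first non-blank line is a header row and all
    columns are separated by whitespace. Returns one dict per data
    row, keyed by header name; the "%" value is converted to an int,
    or to None when the field is not a number.
    """
    lines = [ln for ln in cli_output.splitlines() if ln.strip()]
    if not lines:
        return []

    header = lines[0].split()
    if "%" not in header:
        raise ValueError("no % column found in the header row")

    rows: List[Dict[str, object]] = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        row: Dict[str, object] = dict(zip(header, fields))
        pct = str(row["%"])
        row["%"] = int(pct) if pct.isdigit() else None
        rows.append(row)
    return rows


def lowest_percent(rows: List[Dict[str, object]]) -> Optional[int]:
    """Return the smallest numeric % value across all rows, or None."""
    values = [r["%"] for r in rows if isinstance(r["%"], int)]
    return min(values) if values else None
```

To use the sketch, pass it the text produced by the pairdisplay -g pkgB -fc -CLI command shown earlier. How to interpret the lowest value depends on the pair state: in a COPY state it is the progress of the slowest volume, whereas in a PAIR state with asynchronous replication the % column reports side file usage instead.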