The HP StorageWorks Disk Array XP Series may be configured for use in data replication from one XP series unit to another. This type of physical data replication is a part of the Metrocluster/Continuous Access and Continentalclusters solutions. This section describes the hardware and software concepts necessary for understanding how to use Continuous Access software for physical data replication in disaster tolerant solutions.
Continuous Access allows you to define primary and secondary volumes that are redundant copies of one another, as shown in Figure 3-1 “XP Series Primary and Secondary Volume Definitions”. Data replication proceeds from PVOL to SVOL. When failover is necessary, the SVOL can be changed into a PVOL for access by a package on the failover node.
A device group is the set of XP devices that are used by a given package. The device group is the basis on which PVOLs and SVOLs are created. The fence level of the device group is set when you define it. All devices defined in a given device group must be configured with the same fence level. A fence level of DATA or NEVER results in synchronous data replication; a fence level of ASYNC is used to enable asynchronous data replication.
Fence level = NEVER should only be used when the availability of the application is more important than the data currency on the remote XP disk array. If all Continuous Access links fail, the application continues to modify the data on the PVOL side; however, the new data is not replicated to the SVOL side. The SVOL only contains a copy of the data up to the point of the Continuous Access link failure. If an additional failure, such as a system failure before the Continuous Access link is fixed, causes the application to fail over to the SVOL side, the application will have to deal with non-current data.
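The device group rule described earlier in this section (one fence level per device group, with DATA and NEVER selecting synchronous replication and ASYNC selecting asynchronous replication) can be illustrated with a minimal Python sketch. The function name and the `(group, device, fence_level)` tuple layout are hypothetical and are not part of Raid Manager or any HP tool; the sketch only models the rule.

```python
from collections import defaultdict

# Fence levels named in the text and the replication mode each one selects.
REPLICATION_MODE = {
    "DATA": "synchronous",
    "NEVER": "synchronous",
    "ASYNC": "asynchronous",
}


def check_device_groups(devices):
    """Map each device group to its replication mode.

    devices: iterable of (group_name, device_name, fence_level) tuples.
    This layout is an assumption made for illustration only.
    Raises ValueError for an unknown fence level or for a group whose
    devices do not all share the same fence level.
    """
    levels = defaultdict(set)
    for group, _device, fence in devices:
        fence = fence.upper()
        if fence not in REPLICATION_MODE:
            raise ValueError(f"unknown fence level {fence!r} in group {group!r}")
        levels[group].add(fence)

    modes = {}
    for group, fences in levels.items():
        if len(fences) > 1:
            raise ValueError(
                f"group {group!r} mixes fence levels: {sorted(fences)}"
            )
        modes[group] = REPLICATION_MODE[next(iter(fences))]
    return modes
```

A group whose devices carry both DATA and NEVER is rejected even though both levels are synchronous, because the text requires the same fence level for every device in the group.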
If Fence level = NEVER is used, the data may be inconsistent in the case of a rolling disaster, that is, additional failures taking place before the system has completely recovered from a previous failure. See an example of rolling disaster in the following section “Fence Level of DATA”.
Fence level = DATA is recommended to ensure a current and consistent copy of the data on all sides. If Fence level = DATA is not enabled, the data may be inconsistent in the case of a rolling disaster. Fence level = DATA is also recommended, in case of Continuous Access link failure, to ensure there is no possibility of inconsistent data at the SVOL side. Since only dedicated Continuous Access links are supported, the probability of intermittent link failure and inconsistent data at the remote (SVOL) side is extremely low. Additionally, if the following sequence of events occurs, it will cause inconsistent and therefore unusable data:
Although the risk of this sequence of events taking place is extremely low, if your business cannot afford even a minimal level of risk, then enable Fence level = DATA to ensure that the data at the SVOL side is always consistent. The disadvantage of enabling Fence level = DATA is that when the Continuous Access link fails, or if the entire remote (SVOL) data center fails, all I/Os to those devices will be refused until the Continuous Access link is restored or manual intervention is used to split the PVOL side from the SVOL side.
Applications may fail or may continuously retry the I/Os (depending on the application) if Fence level = DATA is enabled and the Continuous Access link fails. Fence level = ASYNC is recommended to improve performance in data replication between the primary and the remote site.
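The trade-off between the two synchronous fence levels during a Continuous Access link failure can be summarized in a small Python sketch. It is a simplified model of the behavior described above, not array firmware logic; the function name and the returned tuple are hypothetical.

```python
def write_outcome(fence_level, link_up):
    """Return (write_accepted, replicated_to_svol) for one host write to a PVOL.

    Simplified model of the synchronous fence levels only:
    - link up: the write is accepted and replicated
    - DATA, link down: the write is refused until the link is restored
      or the pair is split manually
    - NEVER, link down: the write is accepted, but the SVOL no longer
      receives updates and becomes non-current
    """
    fence_level = fence_level.upper()
    if fence_level not in ("DATA", "NEVER"):
        raise ValueError("this sketch covers only the DATA and NEVER fence levels")
    if link_up:
        return (True, True)
    if fence_level == "DATA":
        return (False, False)
    return (True, False)
```

For example, `write_outcome("NEVER", link_up=False)` returns `(True, False)`: the application keeps running, at the cost of data currency on the remote array.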
The XP disk array supports asynchronous mode with guaranteed ordering. When the host does a write I/O to the XP disk array, the array sends a reply to the host as soon as the data is written to cache. A copy of the data with a sequence number is saved in an internal buffer, known as the side file, for later transmission to the remote XP disk array. When synchronous replication is used, the primary system cannot complete a transaction until a message is received acknowledging that data has been written to the remote site. With asynchronous replication, the transaction is completed once the data is written to the side file on the primary system, which allows I/O activity to continue even if the Continuous Access link is temporarily unavailable.
The side file is 30% to 70% of cache (default 50%) that is assigned through the XP system’s Service Processor (SVP). The high water mark (HWM) is 30% of the cache as shown in Figure 3-2 “XP Series Disk Array Side File”. If the quantity of data in the side file exceeds the HWM, the write I/O to the side file will be delayed. The delay can be from 0.5 seconds to a maximum of 4 seconds, in 500 ms increments, with every 5% increase over the HWM.
If the side file continues to grow beyond the HWM, it will eventually hit the side file threshold of 30% to 70% of cache. When this limit has been reached, the XP on the primary site cannot write to the XP on the secondary site until there is enough room in the side file. The primary XP will wait until there is enough room in the side file before continuing to write. Furthermore, the primary XP will keep trying until it reaches its side file timeout value, which is configured through the SVP. If the side file timeout has been reached, the primary XP disk array will begin tracking data on its bitmap that will be copied over to the secondary volume during resync. Figure 3-2 “XP Series Disk Array Side File” depicts the side file operation.
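The write delay above the high water mark can be worked through with a short Python sketch. The text gives the range (0.5 to 4 seconds), the increment (500 ms) and the step (every 5% over the HWM), but not the exact rounding, so the sketch assumes that any usage above the HWM already counts as the first step. The function name and parameter names are hypothetical.

```python
import math


def side_file_write_delay(usage_pct, hwm_pct=30.0, step_pct=5.0,
                          step_delay_s=0.5, max_delay_s=4.0):
    """Estimate the write delay in seconds for a given side file usage.

    usage_pct is the side file usage as a percentage of cache.
    Assumption (not stated in the text): usage is rounded up to the next
    5% step, so anything above the HWM incurs at least 0.5 s.
    """
    if usage_pct <= hwm_pct:
        return 0.0  # at or below the high water mark: no delay
    steps = math.ceil((usage_pct - hwm_pct) / step_pct)
    return min(max_delay_s, steps * step_delay_s)
```

Under this assumption the eight 0.5 second steps span 30% to 70% of cache, which matches the upper end of the side file range given in the text: `side_file_write_delay(36)` gives 1.0 and `side_file_write_delay(70)` gives the 4.0 second maximum.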
In asynchronous mode, when there is a Continuous Access link failure, both the PVOL and SVOL sides change to a PSUE state. When the SVOL side detects missing data blocks from the PVOL side, it will wait for those data blocks from the PVOL side until it has reached the configured Continuous Access link timeout value (set in the SVP). Once this timeout value has been reached, the SVOL side will change to a PSUE state. The default Continuous Access link timeout value is 5 minutes (300 seconds).
An important property of asynchronous mode volumes is the consistency group (CT group). A CT group is a grouping of LUNs that need to be treated the same from the perspective of data consistency (I/O ordering). A CT group is equal to a device group in the Raid Manager configuration file. A consistency group ID (CTGID) is assigned automatically during pair creation.
The following are restrictions for an asynchronous CT group in a Raid Manager configuration file:
The following are some additional considerations when using asynchronous mode:
Continuous Access XP Journal provides asynchronous data replication between two HP XP12000 storage disk arrays. As depicted in Figure 3-3 “Journal Based Replication”, Continuous Access Journal uses two main features, “disk-based journaling” and “pull-style replication”. These two features reduce XP12000 internal cache memory consumption, while maintaining performance and operational resilience.
Continuous Access Journal performs remote copy operations for data volume pairs. Each Continuous Access Journal pair consists of a primary data volume (PVOL) and a secondary data volume (SVOL), which are located in different storage arrays. The Continuous Access Journal PVOL contains the original data, and the SVOL contains the duplicate data. During normal data replication operations, the PVOL remains available to all hosts at all times for read and write I/O operations, and the storage array rejects all host-requested write I/Os for the SVOL. The SVOL write enable option allows write access to a secondary data volume while the pair is split and uses the SVOL and PVOL track maps to resynchronize the pair.
When Continuous Access Journal is used, updates to the PVOL can be stored in other volumes, which are called journal volumes. The update data stored in journal volumes is called journal data. Figure 3-3 “Journal Based Replication” depicts Continuous Access Journal data replication for disk-based journaling in which the data volumes at the primary data center are being replicated to a secondary storage array at the remote data center. When collecting the data to be replicated, the primary XP12000 array writes the designated records to a special set of journal volumes. The remote storage array then reads the records from the journal volumes, pulling them across the communication link as described in the next section “Pull-Based Replication”.
By writing the records to journal disks instead of keeping them in cache, Continuous Access Journal overcomes the limitations of earlier asynchronous replication methods. Writes to the journal are cached for application, but are quickly de-staged to disk to minimize cache usage. The journal volumes are architected and optimized for keeping large amounts of host-write data in sequence. In addition to the records being replicated, the journal contains metadata for each record to ensure the integrity and consistency of the replication process. Each transmitted record set includes both time stamp and sequence number information, which enables the replication process to verify that all the records are received at the remote site, and to arrange them in the correct write order for storage. These processes build on the proven algorithms of XP Continuous Access Asynchronous Data Replication. The journaling and replication processes also support consistency across multiple volumes.
In addition to disk-based journaling, Continuous Access Journal uses pull-style replication. The primary storage system does not dedicate resources to pushing data across the replication link. Rather, a replication process on the remote system pulls the data from the primary system's journal volume, across the Continuous Access link, and writes it to the journal volume at the receiving site. The replication process then applies the journaled writes to the remote data volumes, using metadata and consistency algorithms to ensure data integrity. In the default configuration, Continuous Access Journal considers replication complete when the data is received in mirrored system cache at the remote system, written to the journal disk, and applied to the remote data volumes. Since the process that controls asynchronous replication is located on the remote system, this approach shifts most of the replication workload to the remote site, reducing resource consumption on the primary storage system.
In effect, Continuous Access Journal restores primary site storage to its intended role as a transaction processing resource, not a replication engine. The pull-style replication engine also contributes to resource optimization. It controls the replication process from the secondary system and frees up valuable production resources on the primary system. In Continuous Access Asynchronous replication, typical issues include temporary communication problems, such as Continuous Access link failure or insufficient bandwidth for peak-load requirements. These conditions can cause cache-based “push” replication methods to fail. When this happens, traditional replication solutions suspend the replication process and go into bitmap mode, noting changed tracks in a bitmap for future resynchronization. Recovery typically involves a destructive process such as rewriting all the changed tracks, with possible loss of data consistency for ordered writes. In contrast, Continuous Access Journal logs every change to the journal disk at the primary site, including the metadata needed to apply the changes consistently. Should the replication link between sites fail, Continuous Access Journal keeps logging changes in the local journal so that they can be transmitted later, without interruption to the protection process or the application. The journal data is simply transferred after the network link failure or bandwidth limitation is corrected, with no loss of consistency. The recovery time may be extended a bit during temporary link failures or congestion, but the asynchronous replication process does not fail, and the catch-up process is simple and automatic. Data consistency is preserved. With Continuous Access Journal, the remote storage system pulls data from the primary journal volumes over the data replication network as fast as the bandwidth allows while adjusting to available network conditions. 
If available bandwidth does not support optimal replication, such as during peak-load spikes in transaction volume, the primary journal volumes buffer the data on disk until more bandwidth becomes available.
Continuous Access Journal has the characteristics of asynchronous data replication. In the XP12000, the fence level of Continuous Access Journal is set to “async”, the same as the Continuous Access Asynchronous fence level.
The journal group is a component of Continuous Access Journal operations that consists of two or more data and journal volumes. The data update sequence from the host is managed per journal group. This ensures that data update sequence consistency is maintained between the paired journal groups. Journal groups are managed according to the journal group number. The journal group numbers of paired journal groups can be different. One journal group can have more than one data volume and more than one journal volume belonging to it.
When a primary array performs an update (host-requested write I/O) on a PVOL, the primary array creates the journal data (metadata and new write data) to be transferred to the secondary array. The journal data is stored in the journal cache or journal volumes depending on the amount of data in cache. If available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volumes.
A secondary array receives the journal data that is transferred from the primary array according to the read journal command. The received journal data is stored in the journal cache or the journal volumes depending on the “Use of Cache” parameter and/or the amount of data in cache. If “Use of Cache” is set to “Use”, journal data will be stored in the journal cache. If it is set to “No Use”, journal data will bypass the cache and move directly to the journal volumes. In addition, if available cache memory for Continuous Access Journal is low, the journal data is stored in the journal volume.
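The placement of received journal data on the secondary array can be expressed as a small decision function. This is a Python sketch of the rule as stated above; the function name is hypothetical, and `cache_low` is a simplifying boolean because the text does not say how the array decides that available cache memory is low.

```python
def restore_journal_destination(use_of_cache, cache_low):
    """Return where received journal data is stored on the secondary array.

    use_of_cache: "Use" or "No Use", the two settings named in the text.
    cache_low: True when available cache memory for Continuous Access
    Journal is low (how "low" is measured is not specified, so it is an
    input here rather than something the sketch computes).
    """
    if use_of_cache not in ("Use", "No Use"):
        raise ValueError(f"unexpected Use of Cache setting: {use_of_cache!r}")
    if use_of_cache == "No Use" or cache_low:
        return "journal volume"  # bypass the cache, or cache is short
    return "journal cache"
```

Only the combination of “Use” with sufficient cache keeps the journal data in cache; every other combination sends it to the journal volumes.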
Continuous Access Journal allows the usage rate of the journal volume to be specified. The journal volume stores journal data, generated by host write I/Os to the PVOL, that is to be transferred asynchronously to the secondary array. However, if the hosts transfer excessive amounts of data, the journal volume may become full. If the journal volume remains full for the specified period of time, the journal group will be suspended due to a failure. To specify how long the journal volume can remain full, use the Data Overflow Watch option. The XP12000 array uses the following parameters to control the inflow of data into the journal group and the state change of the journal group:
If the amount of data in the journal volume in the primary array reaches the capacity, the disk array I/Os will be delayed. If the journal volume remains full for the period of time specified by the Data Overflow Watch parameter, the primary array suspends the affected journal groups due to a failure. If the amount of data in the journal cache in the secondary subsystem reaches the specified journal cache capacity, the secondary subsystem stores the received journal data in the restore journal volume, and then issues the next read-journal command to the primary subsystem. This suppresses the increase in cache usage rate.
As long as the journal volumes have enough space, Continuous Access Journal retains the PAIR state when the Continuous Access links fail, whereas Continuous Access Asynchronous switches to the PSUE state. This allows host write data to be kept continuously as journal data in the journal volumes while the updated data is not being replicated to the remote array. Once the links are recovered, data replication between the primary and secondary arrays resumes automatically, and the journal data accumulated in the primary journal volumes is replicated to the secondary site.
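The effect of the Data Overflow Watch parameter on the primary array can be modeled with a few lines of Python. This is a sketch under stated assumptions: the function name and the state labels "normal", "delayed" and "suspended" are hypothetical, and the text does not say whether suspension happens exactly at the watch period or just after it, so the sketch treats reaching the period as sufficient.

```python
def primary_journal_group_state(seconds_full, data_overflow_watch_s):
    """Model the primary journal group state while its journal volume fills.

    seconds_full: how long the journal volume has been continuously full,
    or None if it is not full.
    data_overflow_watch_s: the Data Overflow Watch period in seconds.
    Returns one of the illustrative labels "normal", "delayed", "suspended".
    """
    if data_overflow_watch_s < 0:
        raise ValueError("Data Overflow Watch period cannot be negative")
    if seconds_full is None:
        return "normal"      # journal volume has free space
    if seconds_full >= data_overflow_watch_s:
        return "suspended"   # full for the whole watch period: failure
    return "delayed"         # full, host I/O is delayed while waiting
```

For example, with a watch period of 60 seconds, a journal volume that has been full for 10 seconds yields "delayed", and one that has been full for 60 seconds yields "suspended".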
The following two sections describe the “One-to-One Volume Copy Operations” and “One-to-One Journal Group Operations” limitations of the XP12000 Continuous Access Journal. Continuous Access Journal requires a one-to-one relationship between the logical volumes of the volume pairs. A volume can only be assigned to one journal group pair at a time.
The Continuous Access Journal supported configuration for a journal group pair is a one-to-one relationship. This means one journal group in one XP12000 can only pair with one journal group in another XP12000. The journal groups require that each data volume pair be assigned to one and only one journal group.
One journal group can contain multiple journal volumes. Each of the journal volumes can have a different volume size and a different RAID configuration. Journal data is stored sequentially and separately in each journal volume in the same journal group, and each of the journal volumes is used equally. A journal volume in the primary subsystem and the corresponding restore journal volume can be of different capacity.
Unlike the Continuous Access Asynchronous device group, which only contains data volumes, a journal group includes data volumes as well as journal volumes. Journal volumes must be registered in a journal group before creating a data volume pair for the first time in the journal group. Journal volumes are assigned to a specific journal group. Each journal group has its own ID. The journal volumes assigned to the specific journal group can be used to create one journal group pair. One journal group (JID) on the primary array and one journal group (JID) on the secondary array are used to create a journal group pair. Be sure to register journal volumes to journal groups on both primary and secondary arrays. The number and capacity of the journal volumes for a specific journal group on a primary or secondary array depends on the business need and IT infrastructure. To register journal volumes in a journal group, use “HP StorageWorks Command View XP”. For more information on this feature, refer to the HP-UX 11i Version 2 Release Notes.
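The one-to-one rules for journal groups and volumes lend themselves to a simple consistency check. The following Python sketch is illustrative only: the function name, the `(array, jid)` identifiers and the input layouts are hypothetical and do not correspond to any Command View XP or Raid Manager interface.

```python
from collections import Counter, defaultdict


def journal_config_problems(group_pairs, volume_assignments):
    """Check a planned configuration against the one-to-one rules.

    group_pairs: list of ((primary_array, jid), (secondary_array, jid)).
    volume_assignments: list of (array, volume_id, jid).
    Both layouts are assumptions made for this sketch.
    Returns a list of human-readable problems (empty if none found).
    """
    problems = []

    # Rule 1: a journal group may take part in only one journal group pair.
    uses = Counter(group for pair in group_pairs for group in pair)
    for group, count in sorted(uses.items()):
        if count > 1:
            problems.append(f"journal group {group} appears in {count} pairs")

    # Rule 2: a volume may belong to only one journal group at a time.
    groups_of = defaultdict(set)
    for array, volume, jid in volume_assignments:
        groups_of[(array, volume)].add(jid)
    for volume, jids in sorted(groups_of.items()):
        if len(jids) > 1:
            problems.append(
                f"volume {volume} is assigned to journal groups {sorted(jids)}"
            )
    return problems
```

An empty result means the plan respects both rules; it does not check capacity or RAID layout, which the text leaves to business need and IT infrastructure.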
Journal volumes can be registered in a journal group or deleted from a journal group. Journal volumes cannot be registered or deleted while data copying is performed (that is, when one or more data volume pairs exist). Journal volumes can be deleted from a journal group on the following occasions:
If a path is defined from a host to a volume, do not register the volume as a journal volume, and do not define paths from hosts to journal volumes. This means that hosts cannot read from or write to journal volumes.
The remote copy connections are the physical paths used by the primary array to communicate with the secondary array. The primary XP12000 array and secondary XP12000 array are connected using a Fibre Channel interface (Note: ESCON is not supported with the XP12000). Ensure the connection is established in a bidirectional manner.
Metrocluster Continuous Access XP supports only one journal group pair per package. Thus, in a metropolitan cluster, the number of packages that can be configured to use journal groups is limited by either the maximum number of journal groups supported by the XP12000 in the configuration or the maximum number of packages in the cluster, whichever is smaller.