NOTE: You may need an additional license to use this feature.
RAID-5 volumes typically need the following types of recovery:
parity resynchronization, stale subdisk recovery, and log plex
recovery.
These types of recovery are described in the sections that
follow. Parity resynchronization and stale subdisk recovery are
typically performed when the RAID-5 volume is started, shortly
after the system boots, or by calling the vxrecover command.
For more information on starting RAID-5 volumes, see “Starting RAID-5 Volumes”.
If hot-relocation is enabled at the time of a disk failure,
system administrator intervention is not required unless there is
no suitable disk space available for relocation. Hot-relocation
is triggered by the failure and the system administrator is notified
of the failure by electronic mail.
Hot-relocation automatically attempts to relocate the subdisks
of a failing RAID-5 plex. After any relocation takes place, the
hot-relocation daemon (vxrelocd) also initiates a parity resynchronization.
In the case of a failing RAID-5 log plex, relocation only
occurs if the log plex is mirrored; the vxrelocd daemon then initiates a mirror resynchronization
to recreate the RAID-5 log plex. If hot-relocation is disabled at
the time of a failure, the system administrator may need to initiate
a resynchronization or recovery.
Parity Recovery
In most cases, a RAID-5 array does not have stale parity.
Stale parity only occurs after all RAID-5 log plexes for the RAID-5
volume have failed, and then only if there is a system failure.
Even if a RAID-5 volume has stale parity, it is usually repaired
as part of the volume start process.
If a volume without valid RAID-5 logs is started and the process
is killed before the volume is resynchronized, the result is an
active volume with stale parity. The following example shows the
output of the vxprint -ht command for a RAID-5 volume with stale
parity:
V  NAME       USETYPE   KSTATE   STATE     LENGTH  READPOL    PREFPLEX
PL NAME       VOLUME    KSTATE   STATE     LENGTH  LAYOUT     NCOL/WID  MODE
SD NAME       PLEX      DISK     DISKOFFS  LENGTH  [COL/]OFF  DEVICE    MODE

v  r5vol      RAID-5    ENABLED  NEEDSYNC  20480   RAID       -
pl r5vol-01   r5vol     ENABLED  ACTIVE    20480   RAID       3/16      RW
sd disk00-00  r5vol-01  disk00   0         10240   0/0        c1t4d1    ENA
sd disk01-00  r5vol-01  disk01   0         10240   1/0        c1t2d1    ENA
sd disk02-00  r5vol-01  disk02   0         10240   2/0        c1t3d1    ENA
This output lists the volume state as NEEDSYNC, indicating
that the parity needs to be resynchronized. The state could also
have been SYNC, indicating that a synchronization was attempted
at start time and that a synchronization process should be doing
the synchronization. If no such process exists or if the volume
is in the NEEDSYNC state, a synchronization can be manually started
by using the resync keyword for the vxvol command. For example,
to resynchronize the RAID-5 volume in Figure 8-1 “Invalid RAID-5
Volume”, use the following command:
# vxvol resync r5vol
Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The resynchronization process
starts at the beginning of the RAID-5 volume and resynchronizes
a region equal to the number of sectors specified by the -o iosize option. If the -o iosize option is not specified, the default maximum I/O
size is used. The resync operation then moves on to the next region until
the entire length of the RAID-5 volume has been resynchronized.
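The decision described above (inspect the volume state, then resynchronize
manually if needed) can be sketched as a small shell helper. This is an
illustrative sketch, not part of the product: the function names
volume_state and resync_hint are hypothetical, and the helper assumes that
volume records in vxprint -ht output begin with "v" and carry the STATE
value in the fifth field, as in the example output for r5vol. It only
prints a suggested command and never runs one.

```shell
#!/bin/sh
# Hypothetical helper: reads `vxprint -ht` output on stdin and suggests
# whether a manual parity resynchronization is needed for one volume.
# Assumption: volume records start with "v" and STATE is field 5.

volume_state() {
    # $1 = volume name; prints the STATE column of its "v" record
    awk -v vol="$1" '$1 == "v" && $2 == vol { print $5; exit }'
}

resync_hint() {
    # $1 = volume name; vxprint -ht output is read from stdin
    state=$(volume_state "$1")
    case "$state" in
        NEEDSYNC) echo "vxvol resync $1" ;;
        SYNC)     echo "check for a running resync before: vxvol resync $1" ;;
        "")       echo "volume $1 not found" ;;
        *)        echo "no resync needed (state $state)" ;;
    esac
}
```

A possible invocation is `vxprint -ht | resync_hint r5vol`. For the SYNC
state the helper only reminds the administrator to confirm that no
synchronization process exists before starting one by hand.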
For larger volumes, parity regeneration can take a long time.
It is possible that the system could be shut down or crash before
the operation is completed. In case of a system shutdown, the progress
of parity regeneration must be kept across reboots. Otherwise, the
process has to start all over again.
To avoid the restart process, parity regeneration is checkpointed.
This means that the offset up to which the parity has been regenerated
is saved in the configuration database. The -o checkpt=size option controls how often the checkpoint is saved.
If the option is not specified, the default checkpoint size is used.
Because saving the checkpoint offset requires a transaction,
making the checkpoint size too small can extend the time required
to regenerate parity. After a system reboot, a RAID-5 volume that
has a checkpoint offset smaller than the volume length starts a
parity resynchronization at the checkpoint offset.
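The interaction between region size and checkpointing can be illustrated
with a small simulation. This is a sketch under stated assumptions, not the
VxVM implementation: resync_plan is a hypothetical function, it does not
talk to VxVM, and it assumes that both the region size and the checkpoint
interval are expressed in sectors. It prints the regions that would be
resynchronized and the offsets at which a checkpoint would be saved,
starting from a given saved offset.

```shell
#!/bin/sh
# Illustrative simulation only; it does not issue any ioctls.
# resync_plan LENGTH IOSIZE CHECKPT START
#   LENGTH   volume length in sectors
#   IOSIZE   sectors per resynchronized region (stands in for -o iosize)
#   CHECKPT  sectors between saved checkpoints (stands in for -o checkpt)
#   START    saved checkpoint offset to resume from (0 for a fresh run)
resync_plan() {
    length=$1 iosize=$2 checkpt=$3 offset=$4
    last_saved=$offset
    while [ "$offset" -lt "$length" ]; do
        end=$((offset + iosize))
        # Clamp the final region to the end of the volume.
        [ "$end" -gt "$length" ] && end=$length
        echo "resync $offset-$end"
        offset=$end
        # Save a checkpoint once CHECKPT sectors are done since the last one.
        if [ $((offset - last_saved)) -ge "$checkpt" ] && [ "$offset" -lt "$length" ]; then
            echo "checkpoint $offset"
            last_saved=$offset
        fi
    done
    echo "done $offset"
}
```

For example, `resync_plan 20480 4096 8192 0` walks the whole volume and
saves checkpoints at offsets 8192 and 16384, while
`resync_plan 20480 4096 8192 8192` models a restart after a reboot and
begins at the saved offset instead of at sector 0. A smaller CHECKPT value
produces more checkpoint lines, which corresponds to more configuration
database transactions in the real operation.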
Subdisk Recovery
Stale subdisk recovery is usually done at volume start time.
However, the process doing the recovery can crash, or the volume can
be started with an option that prevents subdisk recovery. In addition, the
disk on which the subdisk resides can be replaced without recovery
operations being performed. In any case, a subdisk recovery can
be done by using the recover keyword of the vxvol command. For example, to recover the stale subdisk
in the RAID-5 volume shown in Figure 8-1 “Invalid RAID-5 Volume”,
use the following command:
# vxvol recover r5vol disk01-00
A RAID-5 volume that has multiple stale subdisks can have all of
them recovered at once. To recover multiple stale subdisks, use the
vxvol recover command with only the volume name, as follows:
# vxvol recover r5vol
Recovering Logs After Failures
RAID-5 log plexes can become detached due to disk failures,
as shown in Figure 8-2 “Read-Modify-Write”. These RAID-5
logs can be reattached by using the att keyword for the vxplex command. To reattach the failed RAID-5 log plex,
use this command:
# vxplex att r5vol r5vol-l1