HP A5856A RAID 4Si PCI 4-Channel Ultra2 SCSI Controller: Installation and Administration Guide, Chapter 1: Overview

HP RAID 4Si Concepts


Understanding the basics of SCSI is the foundation on which the principles of RAID technology are built. RAID begins as a modification to the SCSI host controller, which in turn makes it possible to manage multiple hard disks and create a logical drive.

What is RAID?

In 1987, David Patterson, Garth Gibson, and Randy Katz of the University of California, Berkeley, published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)." This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array whose performance would exceed that of a Single Large Expensive Drive (SLED). In addition, the array of small drives would appear to the computer as a single logical storage unit, or drive.

NOTE: As stated previously, RAID now stands for Redundant Array of Independent Disks, because virtually all disks are now "inexpensive."

The small disk drives used with personal computers and microcomputers have lower performance and capacity than the large disk drives used on mainframes and supercomputers. Small drives also have lower storage density than large drives, but they are equal to or better than large drives in four areas:

  • I/O per actuator (multiple I/O capability)

  • Cost per megabyte

  • Mean time between failures (MTBF)

  • SCSI controller per drive (better cost/performance ratio)

Placing these small, inexpensive disk drives into an array provides the following enhancements:

  • High transfer rates

  • Increased disk capacity

  • High I/O rates

One of the highlights of the paper was its observation that as the number of drives in an array (also referred to as a stripe set) increases, the overall mean time between failures (MTBF) of the array decreases, so a large array needs redundancy to remain reliable. Redundancy was a powerful proposal, considering that until then, if your hard drive crashed, you depended on some form of backup, usually a tape drive, to restore your data, and then faced the grim prospect of having the system down while repairs were made.
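The observation can be shown with the standard approximation: if drive failures are independent and occur at a constant rate, an array without redundancy fails as soon as any one drive fails, so its MTBF is roughly the single-drive MTBF divided by the number of drives. A minimal sketch; the 30,000-hour drive MTBF is an example value chosen for illustration, not the rating of any particular drive.

```python
def array_mtbf(drive_mtbf_hours: float, num_drives: int) -> float:
    """Approximate MTBF of an array with no redundancy.

    Assumes independent drive failures at a constant rate, so the failure
    rates add and the array MTBF is drive MTBF / number of drives.
    """
    if num_drives < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / num_drives


# Hypothetical example value: 30,000 hours per drive.
for n in (1, 5, 10, 100):
    print(f"{n:>3} drives: {array_mtbf(30_000, n):>8.0f} hours")
```

With 100 such drives the array as a whole would be expected to fail every few hundred hours, which is why the later RAID levels add redundancy.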

RAID Methodology

The RAID paper proposed a conceptual method for handling problems associated with MTBF and data availability through five possible RAID configurations, which were defined as RAID levels 1 through 5. Of the original five levels, RAID 1, 3, and 5 have become the most widely used in the computing industry.

NOTE: The Berkeley paper defined only levels 1 through 5. RAID Level 0 was conceived later, but is not considered to be a "true" RAID, because it does not provide any fault tolerance.

Each RAID level offers a different form of data redundancy, and each higher level may appear to be an improvement over the last. However, the right choice depends on the particular I/O requirements of an application and the cost of implementing the level, so no single level is best in every case.

Overall, RAID has three main attributes, which the RAID levels address in different ways. They are:

  • A set of hard disk drives that can be viewed by the user as one or more logical drives.

  • Data recovery or reconstruction of data in the event of a drive failure (redundancy).

  • Data is distributed across these drives in a defined pattern.

Figure 1-2 RAID Array to Logical Drive


In Figure 1-2, the system presents the three-disk array of small drives to the user as one large drive.

Defining RAID Data Storage

Disk array technology is based on improving I/O performance and capability through the use of multiple disks. In the past, large computers used high-end controllers and multiple disks, but each disk operated independently of the others. The theory was that if files could be "properly placed" on the disks, the overall system would deliver consistent I/O.

In practice, however, the load on the storage devices in these configurations is never exactly balanced. Heavily used areas become hot spots where I/O requests back up into queues, slowing data transfer.

With the proposition of RAID storage, the idea of disk striping was introduced.

Disk Striping

Fundamental to RAID is a technique known as striping. Striping is a method of chaining multiple drives into one logical storage unit, which is also referred to as a stripe set. Striping involves partitioning each drive's storage space into stripes, or data chunks, which can be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved in a "round robin" fashion, so that the combined space is composed alternately of stripes from each drive.

In effect, the storage space of the drives is interleaved like a shuffled deck of cards. As a result, hot spots are spread evenly across the set of drives, making use of their full I/O capability and improving overall disk performance. The type of operating environment determines whether large or small stripes are used.
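Round-robin striping reduces to simple integer arithmetic. The sketch below maps a byte offset in the logical drive to a disk and an offset on that disk, assuming a fixed stripe size and disks numbered from 0; the function and parameter names are illustrative, not part of the controller's interface.

```python
def map_offset(logical_offset: int, stripe_size: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset in the logical drive to (disk index, byte offset on that disk)."""
    stripe_unit = logical_offset // stripe_size  # which chunk of the logical drive
    within = logical_offset % stripe_size        # position inside that chunk
    disk = stripe_unit % num_disks               # chunks are dealt round robin
    row = stripe_unit // num_disks               # full rows of chunks before this one
    return disk, row * stripe_size + within


KB = 1024
# Example values: 64 KB stripes across three disks.
print(map_offset(0, 64 * KB, 3))         # (0, 0)
print(map_offset(64 * KB, 64 * KB, 3))   # (1, 0)
print(map_offset(200 * KB, 64 * KB, 3))  # (0, 73728)
```

A larger stripe size keeps more of each request on a single disk; a smaller one spreads a request over more disks.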

Disk Mirroring

Another feature of the RAID architecture is disk mirroring, a method in which each write to a disk is also written (mirrored) to another disk, creating two disks with the same data. If a disk drive fails, its mirror drive continues the operation. Mirroring requires additional system software and processing power, and although it increases data availability, it doubles media requirements and costs.

Redundancy & Fault Tolerance

Hardware fails. This is an unfortunate fact of life, but RAID can lessen the impact of a failure. A failure is repaired by replacing a physical component. A disk subsystem is fault tolerant if any single component in it can fail while the subsystem remains operational. In a broader sense, this also applies to other system components such as power supplies, adapters, controllers, and cabling. RAID levels 3 through 5 handle a failed disk by reconstructing its data from the remaining disks and from redundant (parity) information, which is stored on an extra disk or, in RAID 5, distributed across the disks. If the failed disk is the one holding the redundant information, the data from all the good disks is used to reconstruct it while the subsystem continues normal operations.
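Parity-based RAID is commonly built on bitwise exclusive OR (XOR): the parity block is the XOR of the data blocks in a stripe, and XOR-ing the parity with the surviving blocks yields the missing one. This sketch assumes XOR parity and uses toy 4-byte blocks; it is a generic illustration, not the RAID 4Si firmware.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(parity, *surviving)


data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*data)

# Simulate losing the second data disk and rebuilding its block.
recovered = rebuild([data[0], data[2]], parity)
print(recovered == data[1])  # True

# If the parity disk fails instead, its block is simply recomputed from the data.
print(xor_blocks(*data) == parity)  # True
```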

Supported RAID Levels

The Berkeley paper originally specified redundant arrays of disks at five levels, and RAID 0 was added later. Today, the most commonly used levels are the following:

  • RAID 0

  • RAID 1

  • RAID 3

  • RAID 5

RAID 0 (Striping)

In RAID 0, data is divided into blocks and distributed sequentially among the disks. This level is also referred to as pure striping. At least one disk is required to create a RAID 0. This type of array is used when high data transfer rates are required, but fault tolerance is not needed.

For example, consider five physical drives configured as one RAID 0 logical drive. The data blocks are then written as shown in Table 1-1 “RAID 0 Striping” below.

Table 1-1 RAID 0 Striping

          Disk 1      Disk 2      Disk 3      Disk 4      Disk 5
Stripe 1  Block 1     Block 2     Block 3     Block 4     Block 5
Stripe 2  Block 6     Block 7     Block 8     Block 9     Block 10

RAID 0 allows data to be accessed on multiple disks simultaneously. Read/Write performance on a multi-disk RAID 0 system is significantly faster than on a single-drive system.
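The speed-up comes from a single large request covering chunks on several disks at once. This sketch splits a logical request into per-disk pieces, assuming round-robin striping with a fixed stripe size; the 512-byte block size used to match Table 1-1 and the function name are assumptions for illustration.

```python
def split_request(offset: int, length: int, stripe_size: int, num_disks: int):
    """Split a logical (offset, length) request into (disk, disk_offset, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_size
        within = offset % stripe_size
        size = min(stripe_size - within, end - offset)  # stop at the chunk boundary
        disk = unit % num_disks
        disk_offset = (unit // num_disks) * stripe_size + within
        pieces.append((disk, disk_offset, size))
        offset += size
    return pieces


# Reading Blocks 1-5 of Table 1-1 (disks numbered from 0, 512-byte blocks assumed).
for piece in split_request(0, 5 * 512, 512, 5):
    print(piece)
```

Each of the five disks supplies one piece, so all five transfers can proceed at the same time.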

RAID 0 Advantages

The advantages of using RAID level 0 are as follows:

  • Provides maximum data capacity.

  • Costs are low because no disk space is used for redundancy.

  • Access time is fast for both read and write.

RAID 0 Disadvantages

The disadvantages of using RAID level 0 are as follows:

  • Provides no redundancy, so if a drive fails data must be restored from a backup.

  • Hot spares cannot be used, because there is no redundant data from which to rebuild a failed drive.

RAID 0 Summary

To summarize using RAID level 0: Choose RAID 0 if you do not want redundancy but you need fast performance and low cost.

Non-Spanned Arrays—RAID 1, 3, and 5

This section describes RAID levels 1, 3, and 5, which use non-spanned arrays.

RAID 1 (Mirroring)

RAID 1 is the first level that provides data redundancy. Data written to one disk is simultaneously written to another disk. Because both disks contain the same data at all times, either disk can act as the operational disk: if one disk fails, the other is used to run the system and to reconstruct the failed disk. This level provides 100% redundancy but is expensive, because every drive in the system is duplicated. This type of array is used for read-intensive, highly fault-tolerant configurations. Two disk drives are required.

For example, consider two physical disks configured as one logical drive. The data blocks are then written as shown in Table 1-2 “RAID 1 Striping” below.

Table 1-2 RAID 1 Striping

          Disk 1      Disk 2
Stripe 1  Block 1     Block 1
Stripe 2  Block 2     Block 2
Stripe 3  Block 3     Block 3

With this setup, if either disk fails all of the data is available from the other disk.
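A toy model of a mirrored pair makes the behavior concrete: every write goes to both disks, and a read can be served by whichever disk is still working. The class and method names are hypothetical; a hardware controller does this in firmware.

```python
class MirroredPair:
    """Toy RAID 1: two in-memory 'disks' holding identical lists of blocks."""

    def __init__(self, num_blocks: int):
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Mirror the write to every disk that is still working.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Either copy is valid; use the first disk that has not failed.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise OSError("both disks in the mirror have failed")

    def fail(self, i: int) -> None:
        self.failed[i] = True


pair = MirroredPair(num_blocks=3)
pair.write(0, b"Block 1")
pair.fail(0)         # Disk 1 fails...
print(pair.read(0))  # ...but the data is still read from Disk 2
```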

RAID 1 Advantages

The advantages of using RAID level 1 are as follows:

  • No data loss or system interruption due to disk failure.

  • Read performance is fast, because data is available from either disk.

RAID 1 Disadvantages

The disadvantages of using RAID level 1 are as follows:

  • Costs are high, because 50% of the disk space is allocated for data protection.

  • Actual data capacity is only 50% of the physical capacity.

RAID 1 Summary

To summarize using RAID level 1: Choose RAID 1 if high availability and performance are important, but cost is not a major concern.

RAID 3 (Striping with Dedicated Parity Drive)

RAID 3 uses parity to generate redundancy data for two or more data disks. One set of disks stripes the data (as in RAID 0), and an additional disk stores parity information computed from the striped disks. This disk is referred to as a dedicated parity disk. Parity data does not fully duplicate the data disks, but if a single disk in the set fails, its contents can be rebuilt from the parity and the corresponding data on the remaining disks. RAID 3 configurations are usually reserved for non-interactive applications that process large files sequentially and require fault tolerance. At least three drives are required: two striped disks and one parity disk.

For example, consider a five-disk array with four data disks and one parity disk, as shown in Table 1-3 “RAID 3 Striping—Five-Disk Array” below.

Table 1-3 RAID 3 Striping—Five-Disk Array

          Disk 1      Disk 2      Disk 3      Disk 4      Disk 5
Stripe 1  Block 1     Block 2     Block 3     Block 4     Parity 1-4
Stripe 2  Block 5     Block 6     Block 7     Block 8     Parity 5-8
Stripe 3  Block 9     Block 10    Block 11    Block 12    Parity 9-12

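In RAID 3, data reads are faster than writes, because parity must be calculated for each write. It therefore performs better for long writes than for short ones: a write that covers a whole stripe supplies all the data needed to compute the new parity, whereas a short write must first read existing data from the other disks. RAID 3 works well for long data transfers, such as CAD or graphics files and data logging.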
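A full-stripe write can be sketched as follows: data is split across the data disks, and the dedicated parity block is computed from the new blocks alone, with no reads of existing data. This assumes XOR parity, zero-padding of a partial final stripe, and tiny 2-byte blocks for readability; the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def raid3_stripes(data: bytes, block_size: int, num_disks: int):
    """Lay data out with num_disks - 1 data disks and the parity disk last."""
    data_disks = num_disks - 1
    stripe_bytes = block_size * data_disks
    # Pad the tail with zero bytes so every stripe is full (an assumption of this sketch).
    if len(data) % stripe_bytes:
        data += bytes(stripe_bytes - len(data) % stripe_bytes)
    stripes = []
    for start in range(0, len(data), stripe_bytes):
        blocks = [data[start + i * block_size:start + (i + 1) * block_size]
                  for i in range(data_disks)]
        # Full-stripe write: parity comes from the new blocks only.
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


# Five disks as in Table 1-3: Disks 1-4 hold data, Disk 5 holds parity.
for row in raid3_stripes(b"ABCDEFGHIJKLMNOP", block_size=2, num_disks=5):
    print(row)
```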

RAID 3 Advantages

The advantages of using RAID level 3 are as follows:

  • No data loss or system interruption due to disk failure.

  • Only one disk is required to provide redundancy. In the above example, 80% of the total disk capacity is available for data.

  • Optimizes data flow for long data transfers, such as video or imaging applications.

RAID 3 Disadvantage

The disadvantage of using RAID level 3 is that performance is slower than RAID 0 or RAID 1.

RAID 3 Summary

To summarize using RAID level 3: Choose RAID 3 if cost, availability, and performance are equally important. RAID 3 performs best when long, serial transfers account for most of the reads and writes.

RAID 5 (Disk Striping with Distributed Parity)

RAID 5 is similar to RAID 3 in that it also uses striped data and parity to generate redundancy. However, instead of dedicating an entire disk to parity storage, it rotates (distributes) the parity among the stripes of the disk array. RAID 5 is well suited to applications that require high read-request rates and low write-request rates, such as transaction processing, office automation, and online customer service, because parity generation can slow write operations considerably. At least three disks are required to configure this RAID level.

For example, consider a five-disk array, as shown in Table 1-4 “RAID 5 Striping—Five-Disk Array” below.

Table 1-4 RAID 5 Striping—Five-Disk Array

          Disk 1      Disk 2      Disk 3      Disk 4      Disk 5
Stripe 1  Block 1     Block 2     Block 3     Block 4     Parity 1-4
Stripe 2  Block 5     Block 6     Block 7     Parity 5-8  Block 8
Stripe 3  Block 9     Block 10    Parity 9-12 Block 11    Block 12

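The rotation in Table 1-4 can be generated mechanically. In this sketch (1-based numbering to match the table), the parity for stripe s sits on disk n - ((s - 1) mod n), moving one disk to the left on each stripe, and the data blocks fill the remaining disks from left to right. The rule is inferred from the table; other controllers may rotate parity differently.

```python
def raid5_layout(num_disks: int, num_stripes: int):
    """Rows of labels reproducing the parity rotation in Table 1-4."""
    rows = []
    block = 1
    per_stripe = num_disks - 1
    for s in range(1, num_stripes + 1):
        parity_disk = num_disks - ((s - 1) % num_disks)  # shifts left each stripe
        first = block
        row = []
        for d in range(1, num_disks + 1):
            if d == parity_disk:
                row.append(f"Parity {first}-{first + per_stripe - 1}")
            else:
                row.append(f"Block {block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=5, num_stripes=3):
    print(row)
```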
RAID 5 outperforms RAID 1 for read operations. Write performance, however, might be slower than RAID 1, especially if most writes are small and random. For example, to change Block 1 in the table above, the controller must first read Blocks 2, 3, and 4 before it can calculate Parity Block 1-4. Once it has calculated the new Parity Block 1-4, it must then write Block 1 and then Parity Block 1-4.
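The small-write parity update can be checked in a few lines. The first function follows the steps in the paragraph above: read the other data blocks in the stripe and recompute the parity. A common alternative, shown second, reads only the old data block and the old parity and computes new parity = old parity XOR old data XOR new data, which needs two reads however wide the array is. Both give the same result. XOR parity and the 1-byte toy blocks are assumptions.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_by_reconstruct(new_block: bytes, other_blocks: list[bytes]) -> bytes:
    # Method in the text: read Blocks 2, 3, and 4, then XOR them with the new Block 1.
    return xor_blocks(new_block, *other_blocks)


def parity_by_read_modify_write(new_block: bytes, old_block: bytes, old_parity: bytes) -> bytes:
    # Alternative: read only old Block 1 and old parity; XOR cancels the old data out.
    return xor_blocks(old_parity, old_block, new_block)


stripe = [b"\x11", b"\x22", b"\x33", b"\x44"]  # Blocks 1-4 (toy 1-byte blocks)
old_parity = xor_blocks(*stripe)
new_block1 = b"\x99"

p1 = parity_by_reconstruct(new_block1, stripe[1:])
p2 = parity_by_read_modify_write(new_block1, stripe[0], old_parity)
print(p1 == p2)  # True
# Either way, the controller then writes Block 1 and Parity 1-4.
```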

RAID 5 Advantages

The advantages of using RAID level 5 are as follows:

  • No data loss or system interruption due to disk failure. If one disk fails its data can be reconstructed.

  • Capacity equivalent to only one disk is reserved for storage of redundancy data.

  • Outperforms RAID 1 for read operations.

  • Good performance for a high volume of small, random transfers.
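The capacity figures given for the non-spanned levels follow from simple arithmetic. A sketch assuming identical drives; the 18 GB drive size and the function name are hypothetical.

```python
def usable_capacity(level: int, num_disks: int, disk_size: float) -> float:
    """Usable capacity of a non-spanned array of identical disks."""
    if level == 0:
        return num_disks * disk_size        # no space spent on redundancy
    if level == 1:
        if num_disks != 2:
            raise ValueError("RAID 1 uses two disk drives")
        return disk_size                    # the second disk is a full copy
    if level in (3, 5):
        if num_disks < 3:
            raise ValueError("RAID 3 and RAID 5 need at least three disks")
        return (num_disks - 1) * disk_size  # one disk's worth holds parity
    raise ValueError(f"unsupported RAID level: {level}")


for level, n in ((0, 5), (1, 2), (3, 5), (5, 5)):
    usable = usable_capacity(level, n, disk_size=18.0)
    print(f"RAID {level}, {n} disks: {usable:.0f} GB usable "
          f"({usable / (n * 18.0):.0%} of raw)")
```

For the five-disk RAID 3 and RAID 5 examples this gives 80% of the raw capacity, and for RAID 1 it gives 50%.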

RAID 5 Disadvantage

The disadvantage of using RAID level 5 is that write performance is slower than RAID 0 or 1.

RAID 5 Summary

To summarize using RAID level 5: Choose RAID 5 if cost, availability, and performance are equally important. RAID 5 performs best if you have I/O-intensive, high read/write ratio applications (for example, transaction processing).

Spanned Arrays—RAID 1+0, 3+0, and 5+0

With the HP RAID 4Si controller, array spanning lets you combine two, three, or four arrays into a single storage space. Every array in a span must have the same number of disk drives; that number can be two, three, four, and so on.
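Because every array in a span must contain the same number of drives, a planned configuration can be checked and sized up front. A sketch assuming identical drives; the function name, the level strings, and the 18 GB drive size are hypothetical, and the controller's limit on the number of arrays is not modeled.

```python
def spanned_capacity(level: str, array_sizes: list[int], disk_size: float) -> float:
    """Usable capacity of a spanned RAID 1+0, 3+0, or 5+0 logical drive."""
    if len(array_sizes) < 2:
        raise ValueError("spanning needs at least two arrays")
    if len(set(array_sizes)) != 1:
        raise ValueError("every array in a span must have the same number of drives")
    per_array = array_sizes[0]
    if level == "1+0":
        if per_array != 2:
            raise ValueError("each RAID 1+0 array is a mirrored pair")
        data_disks = 1                # one disk of each pair is the mirror
    elif level in ("3+0", "5+0"):
        if per_array < 3:
            raise ValueError("each parity array needs at least three drives")
        data_disks = per_array - 1    # one disk's worth per array holds parity
    else:
        raise ValueError(f"unsupported level: {level}")
    return len(array_sizes) * data_disks * disk_size


print(spanned_capacity("5+0", [4, 4], disk_size=18.0))  # 108.0
print(spanned_capacity("1+0", [2, 2], disk_size=18.0))  # 36.0
```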

RAID 1+0—Spanning with Mirrored Arrays

A RAID 1+0 (formerly called RAID 10) configuration uses two, three, or four pairs of mirrored disks, spanning two, three, or four arrays, respectively. (RAID 1+0 is a RAID 1 configuration with array spanning.) If your RAID 1+0 logical drive spans two arrays with two physical drives each, the data blocks are written as shown in Table 1-5 “RAID 1+0—Four-Disk Array” below.

Table 1-5 RAID 1+0—Four-Disk Array

          Array 1                 Array 2
          Disk 1      Disk 2      Disk 3      Disk 4
Stripe 1  Block 1     Block 1     Block 2     Block 2
Stripe 2  Block 3     Block 3     Block 4     Block 4
Stripe 3  Block 5     Block 5     Block 6     Block 6

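The layout in Table 1-5 comes from striping across the arrays and mirroring within each one. A sketch with 1-based block numbers to match the table; the function name is illustrative.

```python
def raid10_layout(num_arrays: int, num_stripes: int):
    """Rows of block numbers per disk for RAID 1+0 built from mirrored pairs."""
    rows = []
    block = 1
    for _ in range(num_stripes):
        row = []
        for _ in range(num_arrays):
            row += [block, block]  # both disks of a pair hold the same block
            block += 1             # the next block goes to the next array
        rows.append(row)
    return rows


for row in raid10_layout(num_arrays=2, num_stripes=3):
    print(row)
```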
RAID 1+0 Advantages

The advantages of using RAID level 1+0 are as follows:

  • No data loss or system interruption due to disk failure. If one disk fails, its mirror image is available.

  • Read performance is fast, because data is available from either disk in each pair.

  • You can create large logical drives: you can span up to five arrays containing a maximum of 10 physical drives.

RAID 1+0 Disadvantage

The disadvantage of using RAID level 1+0 is that costs are high, because 50% of all disk space is allocated for redundancy.

RAID 1+0 Summary

To summarize using RAID level 1+0: It provides the best performance for applications where redundancy and large logical drive size are required, and cost is not a factor.

RAID 3+0—Spanning with Dedicated Parity Drives

In a RAID 3+0 (formerly called RAID 30) configuration, parity blocks provide redundancy to a logical drive that spans two, three, four, or five arrays. (RAID 3+0 is a RAID 3 configuration with array spanning.) If your RAID 3+0 logical drive has two arrays with four physical drives each, the data blocks are written as shown in Table 1-6 “RAID 3+0 with Two Four-Disk Arrays” below.

Table 1-6 RAID 3+0 with Two Four-Disk Arrays

          Array 1                                                 Array 2
          Disk 1        Disk 2        Disk 3        Disk 4        Disk 5        Disk 6        Disk 7        Disk 8
Stripe 1  Block 1       Block 2       Block 3       Parity 1-3    Block 4       Block 5       Block 6       Parity 4-6
Stripe 2  Block 7       Block 8       Block 9       Parity 7-9    Block 10      Block 11      Block 12      Parity 10-12
Stripe 3  Block 13      Block 14      Block 15      Parity 13-15  Block 16      Block 17      Block 18      Parity 16-18

RAID 3+0 Advantages

The advantages of using RAID level 3+0 are as follows:

  • No data loss or system interruption due to disk failure. If one disk fails, operation can continue while the failed disk is being rebuilt.

  • Only one disk in each array is dedicated to providing redundancy.

  • Optimizes data flow for long, serial data transfers such as video or imaging applications.

  • Lets you create large logical drives. You can span up to five arrays containing a maximum of 40 physical drives.

RAID 3+0 Disadvantages

The disadvantages of using RAID level 3+0 are as follows:

  • Capacity expansion is an offline operation.

  • Performance is slower than RAID 0 or 1+0.

RAID 3+0 Summary

To summarize using RAID level 3+0:

  • Choose RAID 3+0 if you need large logical drive size, and cost, availability, and performance are equally important.

  • RAID 3+0 performs best when long, serial transfers account for most of the reads and writes.

RAID 5+0—Spanning with Distributed Parity

In a RAID 5+0 (formerly called RAID 50) configuration, parity blocks are distributed throughout the logical drive, spanning two, three, four, or five arrays. (RAID 5+0 is a RAID 5 configuration with array spanning.) If your RAID 5+0 logical drive has two arrays with four physical drives each, the data blocks are written as shown in Table 1-7 “RAID 5+0 with Two Four-Disk Arrays” below.

Table 1-7 RAID 5+0 with Two Four-Disk Arrays

          Array 1                                                 Array 2
          Disk 1        Disk 2        Disk 3        Disk 4        Disk 5        Disk 6        Disk 7        Disk 8
Stripe 1  Block 1       Block 2       Block 3       Parity 1-3    Block 4       Block 5       Block 6       Parity 4-6
Stripe 2  Block 7       Block 8       Parity 7-9    Block 9       Block 10      Block 11      Parity 10-12  Block 12
Stripe 3  Block 13      Parity 13-15  Block 14      Block 15      Block 16      Parity 16-18  Block 17      Block 18

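Table 1-7 follows a two-level pattern: within each stripe, each array receives a run of consecutive blocks in turn, and inside every array the parity rotates one disk to the left per stripe, as in Table 1-4. A sketch reproducing the table's labels with 1-based numbering; the rotation rule is inferred from the tables, not taken from a controller specification.

```python
def raid50_layout(num_arrays: int, disks_per_array: int, num_stripes: int):
    """Labels per disk for RAID 5+0, matching the pattern of Table 1-7."""
    data_per_array = disks_per_array - 1
    rows = []
    block = 1
    for s in range(1, num_stripes + 1):
        # Within every array, parity moves one disk to the left on each stripe.
        parity_disk = disks_per_array - ((s - 1) % disks_per_array)
        row = []
        for _ in range(num_arrays):
            first = block
            for d in range(1, disks_per_array + 1):
                if d == parity_disk:
                    row.append(f"Parity {first}-{first + data_per_array - 1}")
                else:
                    row.append(f"Block {block}")
                    block += 1
        rows.append(row)
    return rows


for row in raid50_layout(num_arrays=2, disks_per_array=4, num_stripes=3):
    print(row)
```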
RAID 5+0 Advantages

The advantages of using RAID level 5+0 are as follows:

  • No data loss or system interruption due to disk failure. If one disk fails, system operation continues while the failed drive is being rebuilt.

  • Only the equivalent of one disk in each array is dedicated to providing redundancy.

  • Lets you create large logical drives, spanning up to five arrays containing a maximum of 40 physical drives.

  • Gives good performance for a high volume of small, random transfers.

RAID 5+0 Disadvantages

The disadvantages of using RAID level 5+0 are as follows:

  • Capacity expansion is an offline operation.

  • Performance is slower than RAID 0 or 1+0.

RAID 5+0 Summary

To summarize using RAID level 5+0:

  • Choose RAID 5+0 if you need a large logical drive size and cost, availability, and performance are equally important.

  • RAID 5+0 performs best for I/O-intensive, high read/write ratio applications such as transaction processing.

© 2002, - Hewlett-Packard Development Company, L.P.