"Redundant Array of Independent Devices"
by Mo Nourmohamadian
As its name declares, when the concept of RAID was conceived, it was intended for use
with, well, disks. RAID provided for making multiple smaller disks operate like a single
bigger, faster, and more reliable one.
That was in the 80s. Now, in the 90s, the RAID concept is being applied to
tape arrays as well. This has led to a rather obvious self-contradiction as far as the
name is concerned, since there are no disks at all, expensive or otherwise, in a
tape-based "RAID" system. The name "Tape RAID" has gained widespread
acceptance for this new kind of product, but that name has also contributed to some
misconceptions that are worth clearing up.
Most important, disk and tape have different characteristics, which have led them
to be used at different positions in the storage hierarchy. Disk provides
quick, random access to a limited amount of online data for multiple users simultaneously.
Tape provides very inexpensive offline storage for an unlimited amount of sequentially
recorded data which must be accessed by one user at a time.
Therefore, in spite of the name, tape RAID can never substitute for an array of small
disks nor for a single larger one. Instead it "substitutes" for a hypothetical
ultra fast tape with extremely large capacity.
Misconception number one about
implementing RAID with tape instead of disk is that the original concept is somehow
compromised. Not true. The RAID concept survives the transition; only the applications
change. In fact, there are half a dozen concepts in this thing we call RAID, and several
of them (but not all) can be applied effectively to tape.
With a single-bus controller it becomes more difficult to
realize throughput rates much greater than that of a single drive. Trying
to optimize around this leads to a Catch-22: increasing the block size lessens
parallelism, while decreasing it adds bus overhead, which limits performance.
Parallel connection to tape drives achieves maximum throughput. With the right kind
of controller the user could conceptually achieve a reasonable level of concurrency as
well, by providing multiple ranks of striped tape drives. Controller costs are higher, but
performance can be vastly superior.
Figure 1
Why RAID In The First Place?
Using multiple disks as if they constituted a single, larger, virtual one yields
improvements both in performance and in fault tolerance. Generally speaking, performance
is improved by making multiple reads and writes in parallel, and fault tolerance is
improved by mirroring, error correction, or the use of parity.
Striping
RAID performance is achieved through "striping" which divides user data into
consecutive blocks residing on multiple devices. The size of this block is referred to as
a "striping unit".
Host system read and write request sizes of more than one block create parallelism,
allowing more than one device to contribute to the data transfer speed. The larger the
request size, the more parallelism is maintained for a longer period of time, resulting in
throughputs several times that of a single drive. Decreasing the "striping unit"
so more parallelism is achieved reduces concurrency (servicing multiple small requests
simultaneously).
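The trade-off described above can be illustrated with a minimal sketch. The function
names (stripe, drives_touched) and the round-robin layout are assumptions made for
illustration, not taken from any particular product.

```python
def stripe(data: bytes, num_drives: int, unit: int) -> list[bytes]:
    """Distribute data round-robin across drives, `unit` bytes at a time."""
    drives = [bytearray() for _ in range(num_drives)]
    for offset in range(0, len(data), unit):
        # Striping unit number (offset // unit) decides which drive gets this chunk.
        drives[(offset // unit) % num_drives] += data[offset:offset + unit]
    return [bytes(d) for d in drives]


def drives_touched(request_len: int, num_drives: int, unit: int) -> int:
    """Number of drives that contribute to one request starting on a unit boundary."""
    units = -(-request_len // unit)  # ceiling division
    return min(num_drives, units)


data = bytes(range(16))
print(stripe(data, num_drives=4, unit=1))  # byte striping
print(stripe(data, num_drives=4, unit=4))  # block striping, 4-byte unit

# An 8-byte request keeps all four drives busy with a 1-byte unit,
# but only one drive with an 8-byte unit.
print(drives_touched(8, num_drives=4, unit=1))
print(drives_touched(8, num_drives=4, unit=8))
```

With the smaller unit every drive works on the same request (more parallelism); with
the larger unit the other drives are left free to service other requests (more
concurrency).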
Choosing The Optimum Stripe Unit
The higher the degree of parallelism, the greater the throughput; but the lower the
parallelism, the greater the concurrency. Disk RAID users are forced to compromise.
This actually causes no problems with tape. The concept of striping works just as well
with tape as with disk. Multiple tapes in an array can be read or written for a single
request simultaneously, and their data streams merged. The result is a throughput level
which can be several times that of a single drive.
And concurrency, measured by the number of simultaneous I/Os per second, does not
apply. Tape is a single-user medium. Concurrency doesn't count. The arrays can be set
up to maximize parallelism (and therefore throughput) by choosing the smallest striping
unit possible, the byte. That's what you'll see in all the newest, highest
performance tape RAID systems.
Fault-tolerance
In disk RAID, fault tolerance can be improved by mirroring (maintaining a duplicate),
parity, or the use of error checking codes (Hamming codes). Hamming codes are rarely, if
ever, implemented, but the other two methods are very popular.
Mirroring
Mirroring, by definition, provides 100% redundancy. It also takes up twice as many disk
spindles. In tape RAID, mirroring is about more than fault tolerance. Tapes are removable
media. Users may elect to make two or more backup copies of the same files, one to be kept
nearby, another to be shipped to a disaster recovery center, and perhaps more. Tape media
is inexpensive, and it takes no longer to record up to five copies than one, so why not?
Parity Methods
Two RAID levels (Levels 3 and 4) call for dedicating a separate drive for parity
information, stored as individual bytes in the case of Level 3 and as blocks in Level 4.
One RAID level (Level 5) calls for interspersing blocks of data and parity information
across all drives in a set. All three incorporate striping for data and parity recording.
All three forms could conceivably be implemented with tape RAID, although not to equal
effect.
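As a rough illustration of where parity lives in the three levels, the sketch below
returns the drive that holds parity for a given stripe. The function name, the choice
of the last drive as the dedicated parity drive, and the particular Level 5 rotation
order are all assumptions; real implementations differ in how they rotate parity.

```python
def parity_drive(stripe_index: int, num_drives: int, level: int) -> int:
    """Return the index of the drive holding parity for one stripe."""
    if level in (3, 4):
        # Dedicated parity drive (assumed here to be the last drive in the set).
        return num_drives - 1
    if level == 5:
        # Parity moves to a different drive on each successive stripe
        # (one possible rotation among several used in practice).
        return (num_drives - 1 - stripe_index) % num_drives
    raise ValueError("only Levels 3, 4 and 5 use parity in this sketch")


for level in (3, 4, 5):
    placement = [parity_drive(s, num_drives=4, level=level) for s in range(6)]
    print(f"Level {level}: parity drive per stripe = {placement}")
```

Levels 3 and 4 differ only in the striping unit (byte versus block), which this sketch
does not model; it shows only the dedicated versus interspersed placement.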
To begin, achieving higher throughput performance in tape RAID, as in disk RAID,
demands that you design in more parallelism and get more drives operating at one time.
That in turn demands a smaller "striping unit". Byte striping, with or without a
dedicated parity drive, simply works better than block striping with tape. Therefore Level
3 tape RAID systems will outperform Level 4 and Level 5.
In real life, the question is moot as no one has introduced a Level 4 tape RAID system.
Level 3 systems, on the other hand, are becoming increasingly popular.
Misconception number two about RAID in
general and tape RAID in particular is that the dedicated parity drive represents a single
point of failure, and therefore extra vulnerability. However, adding a dedicated parity
drive makes it possible to recover from the failure of any single drive including the
parity drive.
A bad or missing tape can be reconstructed by coupling the data on the remaining good
tapes with that of the parity drive. The data from a missing or faulty tape can be
regenerated for the host on the fly if the tape controller is fast enough. (In some
systems it is even possible to turn off one of the tape drives while the array is
operating, without any degradation in performance.)
And if the parity drive is the one that's bad, who cares? The real data still
exists on the other tapes and doesn't have to be regenerated at all.
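The recovery described above can be sketched with exclusive-OR parity, the usual
parity scheme in RAID. This is a minimal illustration under that assumption; the
helper name xor_blocks and the sample tape contents are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data tapes plus one parity tape (contents are illustrative only).
data_tapes = [b"ABCD", b"EFGH", b"IJKL"]
parity_tape = xor_blocks(data_tapes)

# Suppose the second data tape is bad or missing: XOR the survivors with parity.
survivors = [data_tapes[0], data_tapes[2], parity_tape]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_tapes[1])

# If the parity tape is the one that is lost, the data tapes are simply read as usual.
```

Because XOR is its own inverse, the same routine both generates the parity and
regenerates any single missing tape.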
What about Level 5, interspersing parity blocks and data blocks across all the drives
in a set? Again, it's one of those things that works better with disk than with tape.
And yet Level 5 tape RAID, block striping and all, is very common. Why? One reason is that
Level 5 systems can be less expensive than systems requiring parallel tape drive
connection. The hardware to support a Level 5 block striping system is also less expensive
in general than that for a Level 3 byte-striping system, and it was on the market first.
The difference is in the controllers. (See Figure 1.) SCSI
controllers which operate as extensions of the host SCSI bus don't require as much
circuitry as do controllers which provide a separate parallel bus for each drive. On the
other hand, the simpler controllers are limited to one-at-a-time reads and writes.
Although Level 5 tape arrays are theoretically not limited to single-bus controllers,
those currently on the market have been built that way. Level 3 controllers have all been
built with multiple busses.
Applications
Disk RAID systems are all used for high performance, secure, online storage. Tape RAID
systems have broader application.
Striping is used for fast disk backup and for high speed
data acquisition. Mirroring is used for
making duplicate backups (which is especially big in banking), and for simple copying
operations.
In addition to those modes, tape arrays can be used in two others:
cascading and pass-through.
Cascading refers to the use of an array as a single tape drive with very large
capacity. The first tape is written until full, then the second, and so on. Applications
would be in unattended backup.
In the pass-through mode, each tape can operate independently. The array operates as
Just a Bunch Of Tapes (and of course we need an acronym for that, so you'll hear
JBOT).
In both cascading and pass-through modes, little use is made of the tape array
controller, and the same effects can be achieved using multiple drives and the right
software (without the array controller).
However, many arrays can switch between multiple operating modes, providing their users
with additional benefits, including cascading and pass-through. Although this author is
not aware of Level 5 arrays which can operate in multiple modes, some combinations which
include one or more levels are common.
Levels 0, 1 and 3 are all concurrently implemented in systems from
Andataco, CoComp,
Dynatek Automation Systems, Excitron, LAND-5, Symbios/Metastor, Trimm U. K., and Virtual
Technology. Levels 0 and 1 and cascading are implemented together by Hi-Par Systems.
Cascading and Level 1 operation are implemented together in systems from Contemporary
Cybernetics and Transitional Technology. Level 5 hardware systems are available from Data
General and Peripheral Vision. Software implementations of Level 5 are available directly
from Netframe or from Cheyenne. There may be others, and, as always, the vendors
themselves are the best sources of information about their systems.
Pricing, Performance And Scalability
Current tape technology provides for uncompressed capacity of up to 20 GB per cartridge
(DLT 4000) and transfer rates in the range of 1.5 MB/s per drive. 50 GB tapes and 5 MB/s
transfers are just over the horizon. Multiplying those capacities and rates in a tape
array leads to some potentially mind boggling systems.
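To put rough numbers on that multiplication, the sketch below scales the per-drive
figures quoted above (20 GB, 1.5 MB/s) by the number of data drives. Linear scaling is
an idealized upper bound, and the function names, the four-drive example, and the
1 GB = 1000 MB conversion are assumptions for illustration.

```python
def array_figures(data_drives: int, gb_per_tape: float = 20.0,
                  mb_s_per_drive: float = 1.5) -> tuple[float, float]:
    """Ideal capacity (GB) and transfer rate (MB/s) of a striped array."""
    return data_drives * gb_per_tape, data_drives * mb_s_per_drive


def hours_to_transfer(gigabytes: float, rate_mb_s: float) -> float:
    """Hours needed to move `gigabytes` at `rate_mb_s`, taking 1 GB = 1000 MB."""
    return gigabytes * 1000 / rate_mb_s / 3600


capacity_gb, rate_mb_s = array_figures(data_drives=4)
print(f"4 data drives: {capacity_gb:.0f} GB at {rate_mb_s:.1f} MB/s (ideal)")
print(f"Array:        {hours_to_transfer(capacity_gb, rate_mb_s):.1f} hours for {capacity_gb:.0f} GB")
print(f"Single drive: {hours_to_transfer(capacity_gb, 1.5):.1f} hours for {capacity_gb:.0f} GB")
```

A parity drive, if present, adds fault tolerance but not capacity or throughput, so it
is left out of the drive count here.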
And there's more. Big misconception number three
is that tape arrays aren't scalable. Of course they are. Users can acquire a
two-drive system for as little as $5738 (Andataco's 4mm-based
Encore Plus 4002S) and move up to at least seven drives with other systems already
on the market. In addition, arrays can be expanded with stackers and tape libraries.
With stackers or libraries, the volume of offline data can be enormous. And with
parallel bus controllers already achieving transfers in the range of 2X to 4X that of a
single drive, those large volumes can be brought into memory very rapidly.