A Case for Redundant Arrays of Inexpensive Disks (RAID)
David A. Patterson, Garth Gibson, and Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID) Proceedings of the 1988 ACM SIGMOD Conference on Management of Data, Chicago IL, June 1988.
Reviews for this or other paper due Tuesday, 4/10.
Comments
Summary:
The paper presents a case for redundant arrays of inexpensive disks (RAID) as a better option than a single large expensive disk (SLED), with improvements in performance, reliability, power consumption, and scalability. Five disk organizations are presented, each offering different performance benefits.
Problem Addressed:
With the exponential growth in processor and memory speeds, more and more tasks were becoming I/O bound. Even with a large main memory for caching, many kinds of workloads required frequent disk access, and even a SLED was not keeping up.
Contributions:
RAID is widely used in all high-performance disk environments. It is now a given even in medium-priced computing systems.
The mean time to failure was increased drastically by adding redundancy. The mean time to repair (MTTR) is introduced as an important metric that can be factored in when choosing a RAID type.
The various RAID types handle different performance and reliability requirements.
The paper presents practicality issues which were addressed by the industry and academia over time.
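As a back-of-the-envelope illustration of how the MTTF and MTTR metrics interact, here is a rough sketch in the spirit of the paper's analysis (assuming independent, exponentially distributed failures; the example numbers are made up, not taken from the paper):

```python
# Rough sketch of single-failure-tolerant group reliability,
# assuming independent, exponentially distributed disk failures.
# A group survives a first failure as long as no second disk in the
# same group fails before the repair completes (within MTTR hours).

def group_mttf(mttf_disk_hours, disks_per_group, mttr_hours):
    """Approximate MTTF of one redundancy group that tolerates a
    single disk failure (e.g. one parity group)."""
    g = disks_per_group
    # Time to first failure in the group is roughly mttf_disk / g;
    # probability a second failure hits during the repair window is
    # roughly (g - 1) * mttr / mttf_disk.  Dividing gives:
    return (mttf_disk_hours ** 2) / (g * (g - 1) * mttr_hours)

# e.g. 30,000-hour disks, a 10-disk group, 1-hour repair
print(group_mttf(30_000, 10, 1))  # -> 10000000.0 hours
```

The point the analysis makes is that a short MTTR keeps the window for a second, fatal failure tiny, which is why redundancy plus fast repair can make an array of cheap disks last far longer than any single disk.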
Flaws:
The paper candidly raises its own practicality issues. It covers a lot of open questions about RAID, many of which have since been addressed.
Overall, the paper presented a new idea, and a very successful one.
Performance:
RAID has proven to be one of the most standard ways to get higher I/O speeds, reliability, and flexibility. One can construct a RAID array with performance requirements, cost, reliability, etc. factored in.
Posted by: Archit Gupta | April 10, 2007 10:57 AM
Summary
This paper discusses the reasons for using arrays of inexpensive disk to help I/O keep pace with processor and memory speed, while maintaining acceptable reliability. The paper discusses different levels of RAID with different performance/reliability tradeoffs.
Contributions
* Recognition of the differing technology trends in disks versus processors/memory and the problems that gap was going to create.
* Comparison of several different approaches and the performance + reliability tradeoffs. These are mirrored disks, Hamming codes for ECC, Single check disk per group, Independent reads/writes, and no single check disk.
Flaws
The failure analysis makes some assumptions that may be unfounded if disks were manufactured together or have common failures. It would be interesting to know how often this is the case and what the real failure rate for RAID setups is.
Performance and Relevance
The discussion of the performance was sound and is still relevant today. This technology is also still popular today.
Posted by: Anonymous | April 10, 2007 10:34 AM
SUMMARY
In "A Case for Redundant Arrays of Inexpensive Disks" Patterson et al. discuss the advantages of, and some issues with, replacing "single large expensive disks" with arrays of inexpensive disks. The necessity of the RAID approach and several possible implementations are discussed.
PROBLEM
Disk performance grows much slower than that of CPUs and memory. Therefore disk I/O increasingly becomes a bottleneck that threatens to undo the performance gains of other system components. The authors explore a way of creating an I/O system with better overall performance than its individual components (performance being capacity, transfer rate, reliability, etc.).
CONTRIBUTIONS
* Identifying balanced performance among system components as necessary for good overall performance
* Avoiding technological barriers to increased performance by cleverly using available components
* Replacing complex systems (SLEDs) with multiple simpler systems (RAID)
* Analysis of various RAID levels.
* Identification of open issues that may affect practicality of RAID
* Identifying appropriate usage scenarios (e.g. RAID 0 best for databases)
FLAWS
* The effect of caching in general on performance is not really clear. The good news is that the presented analysis is worst-case.
* It would be interesting to see the effects of on-disk caches. Modest disks these days have about 8MB of cache, so a small RAID 5 array would have at least 24MB of cache in aggregate. It would be interesting to see whether that setup would outperform a SLED with 24MB of cache and comparable seek and latency performance.
PERFORMANCE
Reliability, access time, latency, seek time, request completion time, capacity, overhead, cost, efficiency, scaling up, scaling out, overall performance of a system
Posted by: Vladimir Brik | April 10, 2007 09:31 AM
Summary
This paper explains the origins of RAID. The reader is presented with an overview of the storage landscape of the time, and then we plunge right into the details of what sorts of problems RAID is supposed to fix. After that, the authors give some details about the different RAID levels and finish up by providing some benchmarks that show RAID is addressing the stated problems.
Problem
The computer science crystal ball was showing that some trends in storage were beginning to solidify: there were big, expensive, (supposedly) reliable disks (SLEDs); there were smaller and, more importantly, cheaper disks; and the disk I/O bottleneck was becoming a major performance hit.
Contributions
It's hard to not write "RAID" here. Anyway, the fundamental contribution is using small, cheap disks to not only increase the overall reliability of the system, but also to work around the then-approaching disk I/O crisis.
The concept of having entire disks devoted to parity checking, and then using RAID 5 to distribute the parity information. Something's just fundamentally cool about that idea to me. It's also obviously better than implementing a naive disk mirroring system, as you can begin to recover some of your disks for actual storage.
Redundancy is key! Merely throwing disks into a rack isn't going to help your overall reliability. However, (cheap!) redundancy can. The authors certainly have the metrics to back up their assertions, which is another nice thing about this paper: nearly every decision seems to be backed up by benchmarks.
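For the curious, the parity idea that makes distributed check data work can be shown in a few lines. This is just an illustrative sketch, not the paper's implementation: parity is the bytewise XOR of the data disks, and XOR-ing the survivors back together regenerates any single missing disk.

```python
# Toy demonstration of single-disk reconstruction from parity.
# Each "disk" here is just a couple of bytes.
from functools import reduce

def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

data_disks = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]

# The parity "disk" is the XOR of all data disks.
parity = reduce(xor_blocks, data_disks)

# Simulate losing disk 1 and rebuilding it from the survivors + parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = reduce(xor_blocks, survivors)
assert rebuilt == data_disks[1]  # the lost disk is fully recovered
```

With N data disks you pay only one disk's worth of redundancy instead of doubling everything as mirroring does, which is exactly why parity "recovers some of your disks for actual storage."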
Flaws
The tables in the paper were pretty bad.
Performance / Relevance
My context is probably lacking here, but this paper seems as though it is a nearly complete solution. The authors introduced reliability and performance metrics, provided multiple RAID levels to afford maximum flexibility to sysadmins, and just generally seem to be storage-gods.
I suppose I like complete packages.
Posted by: Jon Beavers | April 10, 2007 08:57 AM
Summary:
This paper describes five types of Redundant Arrays of Inexpensive Disks (RAID), which increase reliability, throughput, and/or capacity at low cost by using an array of standard disks instead of a single large expensive disk (SLED).
Problem Addressed:
Disks were becoming a performance bottleneck: the performance and cost of CPUs and RAM kept getting better and better, but disk performance saw no dramatic improvement. The Single Large Expensive Disk (SLED) was a solution that increased size and reliability, but its cost and performance remained critical problems.
Contributions:
By using an array of cheap disks, deploying RAID is very cheap compared to a SLED, because users can just add a few more disks to the existing ones, and can also choose a desired level of reliability or performance by adjusting the number of disks.
The variety of RAID types gives users the flexibility to find the solution that best matches their requirements.
RAID adds a layer of indirection, the RAID controller, between the disks and the system, enabling users to switch their RAID type easily without replacing all of the disks.
The basic idea of structuring an array of disks is very extensible and open to other algorithms and solutions that might appear in the future.
Possible Improvements:
As a proposal of RAID, I think this paper is pretty straightforward and makes a convincing case for the benefits of RAID. But it might have been much better with some real-environment experiments and results instead of throwing out many open questions and theories. Either way, I think it is a good paper to start a discussion about the idea. Also, as mentioned in the paper, RAID looks as if it allows an unlimited chaining of disks, but in practice it has many physical limitations such as cabling, power consumption, and disk access management. Probing the performance limits of RAID might therefore also be interesting.
Performance:
RAID provides choices so that users can pick the solution that best matches their needs, and each level of RAID offers a larger benefit than a SLED in several respects, such as cost, performance, and/or reliability. The disk performance improvement available at low cost also encouraged end users to adopt RAID.
Posted by: Hidetoshi Tokuda | April 10, 2007 08:56 AM
Summary:
The paper discusses redundant arrays of inexpensive disks (RAID), which aim to achieve the performance, capacity, and reliability of single large expensive disks (SLEDs) without costing anywhere near as much.
Problem:
Over the years, CPU and RAM performance had been increasing much faster than disk performance. Though SLEDs improved the capacity and reliability of storage to a great extent, they had performance limitations. Small hard disks had comparable performance characteristics (within a factor of 2), though size was a problem. RAID was proposed to combine such small disks into a big storage system, while improving performance by taking advantage of parallelism in I/O to multiple disks.
Contributions:
- Creating huge storage using inexpensive disks, without requiring special hardware (most techniques used can be implemented in software)
- Different levels for tuning performance metrics - I/O bandwidth, cost, reliability, or a combination of all. The choice is left to users, depending on the applications they run.
- Use of hot-swappable redundant disks to transparently recover from disk failures (and improve reliability)
Flaws:
- Effect of data caching in memory is not explored much (like there may not be a need to read a sector again to check parity - it may already be in memory)
- The tables in the paper are hardly readable. I wonder why measurements were not done for simple metrics such as data read/write speeds (in Mb/s).
- The experiments performed in the paper seemed quite skewed (like selection of data block sizes for some RAID levels)
Relevance:
I think the idea was revolutionary at a time when 100MB of hard disk space cost $1100. RAID provided the ability to have large storage space while not compromising on (in fact, improving) performance or reliability.
Posted by: Base Paul | April 10, 2007 08:05 AM
Summary
The chapter compares five Redundant Arrays of Inexpensive Disks (RAID) setups on reliability, cost, and performance.
Problem
Processors are getting faster and memory is getting larger each year, but the best disks' access times are only improving at a sluggish rate. While more performant, arrays of inexpensive disks are not reliable enough.
Contributions
The chapter begins by motivating the need for fast reliable disk access by citing a number of growth rate models for processors, memory and disks. It is clear an I/O bound application will see little speedup from faster processors and larger memory. Another analysis shows that arrays of commodity disks help the performance problem, but are highly susceptible to failure.
Five RAID approaches are compared on the amount of disk overhead and efficiency for large and small reads. Simple mirroring gives good performance, but results in a very large amount of overhead. Hamming codes reduce the overhead, but performance for small reads and writes becomes especially poor because the whole disk group has to be accessed. Check disks (which hold the parity bits) give less overhead and comparable performance to Hamming codes. Check disks operating at the sector level give even better performance at the same overhead as a bit-level check disk. Finally by spreading the check data over all disks very good performance is achieved at low overhead.
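The small-write trick that makes the sector-level check disks competitive can be sketched as follows (an illustrative sketch, not code from the chapter): a one-sector write only needs to touch the old data and the old parity, not the whole group, because new_parity = old_parity XOR old_data XOR new_data.

```python
# Toy sketch of the read-modify-write parity update for a small write.
# Only the target sector and the parity sector are read, regardless of
# how many disks are in the group.

def updated_parity(old_parity, old_data, new_data):
    """New parity after overwriting one data sector, computed without
    reading the rest of the group."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Check it against recomputing parity from scratch over a 3-disk group.
d = [b"\xaa", b"\x55", b"\x0f"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*d))

new_d1 = b"\x33"  # overwrite the sector on disk 1
fast = updated_parity(parity, d[1], new_d1)
scratch = bytes(a ^ b ^ c for a, b, c in zip(d[0], new_d1, d[2]))
assert fast == scratch  # shortcut agrees with full recomputation
```

This is what turns a small write from a whole-group operation into four disk accesses (read old data, read old parity, write new data, write new parity), and distributing the parity across all disks then keeps those parity accesses from piling up on one check disk.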
Possible Improvements
The idea is well reasoned and justified, but I am not certain the need for the abstraction was made completely clear. By abstracting away the details of disk placement, higher levels no longer have control over how disks are organized. The RAID controller lives at the low level of disk access, so it cannot have a global view of disk usage in the system. The chapter uses the example of a database application: a database has good knowledge of its disk access patterns and would be much better able to lay out data on disk optimally. A system such as a database could incorporate the concepts of RAID (checksums and multiple disks), but could also place the data specially to facilitate later use.
Google's "Failure Trends in a Large Disk Drive Population" article raises some questions about the correlation between disk failures. Non-independent disk failures would greatly reduce the reliability of large disk arrays.
Performance
The main goal was to increase the ability of commodity disks to scale up to large numbers. In achieving this goal reliability and throughput were also increased.
Posted by: Kevin Springborn | April 9, 2007 07:37 PM
Summary
In this work arrays of inexpensive disks were proposed and evaluated to address the performance gap that was widening between CPU speed and disk access speed.
Problems Addressed
At the time of writing, CPU speed had been increasing at an exponential rate, and memory capacity had been increasing at a similar rate. However, while hard disk drive capacity had been increasing and prices dropping, disk performance had not been improving at a rate to match the CPU and memory. Hard drive I/O quickly became the system bottleneck and created a "crisis", according to the authors, that hindered further system advancement. As in other work, reliability was considered a primary concern, and any proposal had to provide equal or better reliability than the current solutions.
Contributions
Arrays of inexpensive disks were proposed as an alternative to large expensive disks for enterprise applications; however, due to the cheaper disks and the larger number of possible failures, reliability was seen to deteriorate to a dismal level. To address this, five different levels of redundancy were proposed, each with advantages and disadvantages. Mirrored disks, or RAID level one, proved most useful when the workload required less than fifty percent of the total disk capacity and fast reads were required. Striped data with distributed check information, termed RAID level five, proved most useful when more than fifty percent of the disk capacity was required, and it became the most popular level in most installations.
Flaw
Since the ideas presented in this work were quite new, several issues were not fully explored; however, many of these open issues were also named in the paper. I thought the authors did quite a good job evaluating their design within the constraints they had to work with.
Performance
The ability to scale a storage system to larger capacities while maintaining reliability was the primary focus of this work. This was accomplished by designing a mechanism for attaching many inexpensive disks together, with redundant check information included to maintain system reliability, often to an extent greater than that offered by a single large expensive disk.
Posted by: Nuri Eady | April 9, 2007 07:11 PM
Paper Review: A Case for Redundant Arrays of Inexpensive Disks (RAID) [Patterson et al.]
Summary:
This paper expounds upon the advantages of using cheap personal computer
disks in RAIDs and also discusses 5 array organization techniques (RAID
levels) with unique performance characteristics. There are tradeoffs
amongst these that depend on resources available and the desired storage
performance delivered to an application.
Problem:
The problem is that disk performance (both in capacity and access time)
is dramatically outpaced by both CPU and memory improvements over time,
thus it is or will be a bottleneck unless some innovation is employed.
Contributions:
RAIDs offer many things:
* RAID has a flexibility of configurations, enabling one to optimize for
cost, reliability, or performance.
* Since arrays of many disks will result in much more frequent failures
(of individual disks), RAID levels provide a number of options to
make a reliable storage system from these unreliable components.
MTTR (Mean Time To Repair) for the failed components determines the
new estimator of RAID system reliability.
* The RAID levels presented (1-5) run the gamut from low to high
  utilization (for data) and from low to high performance (depending on
  the size of I/O operations) and thus provide at least one good option
  for most applications.
Flaws:
* The analytical models in the paper assume that failures are
exponential and independent. The authors admit this, presumably for
the sake of simplicity and because of a lack of much experience (as
RAID was new), in spite of citing plausible counter examples.
* The more intensive workload considered, i.e. a transaction processing
  system, still involves relatively large sector reads/writes. To me,
  these seem too large to model some high-I/O applications that do very
  small (block-size) read-update-write transactions.
Performance Impact and Relevance:
RAIDs offer a wide variety of performance characteristics that impact the
system's resulting reliability, transaction latency, efficiency/cost, and
scaling up and/or out.
Posted by: Dave Plonka | April 8, 2007 08:58 PM