The Design and Implementation of a Log-Structured File System
Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Trans. on Computer Systems 10(1), February 1992, pp. 26-52.
Reviews due Tuesday, 3/27.
Comments
Summary:
The paper presents a log-structured file system (LFS) in which all modifications are written sequentially to disk in a log-like structure. LFS is shown to perform an order of magnitude better than the Unix Fast File System (FFS) for writes. The basic design rationale is that, as main memory sizes grow, most reads can be satisfied from the file cache in main memory.
Problem:
Disk access times have improved slowly, while CPU speeds increase rapidly (roughly following Moore's law). This makes applications disk-bound, and the Unix Fast File System uses only a small fraction of the available disk bandwidth because it spends most of its time seeking. Main memory is becoming cheap and abundant enough to make file caches larger, so reads can be satisfied from main memory more often. FFS was designed to optimize reads, so a design change may be necessary.
Contributions:
LFS is a new idea: it optimizes for writes on the assumption that reads can increasingly be satisfied from main memory. Read performance nevertheless remains comparable to FFS.
The write buffer in main memory further optimizes writes by batching them into bulk transfers that use most of the disk bandwidth.
The segment cleaning policies are well designed: the authors identify the desired bimodal distribution of segment utilization and devise a cost-benefit technique based on the age of the data (a sketch follows this list of contributions).
The observation that temporal locality matters guides the use of age when regrouping live blocks into segments during cleaning.
Roll-forward plays a role similar to fsck, and having a recovery mechanism is valuable. fsck must scan through a large amount of metadata, whereas roll-forward can simply work forward from the last checkpoint.
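The cost-benefit policy can be summarized with a short C sketch (an illustration of the paper's formula; the struct and function names are invented, not Sprite LFS code): for each candidate segment with live-data fraction u and data age "age", the cleaner prefers the segment with the highest (1 - u) * age / (1 + u).

    #include <stddef.h>

    struct segment_stat {
        double utilization;   /* fraction of live bytes, 0.0 .. 1.0 */
        double age;           /* age of the youngest live block in the segment */
    };

    /* Return the index of the segment with the best benefit-to-cost ratio. */
    size_t pick_segment_to_clean(const struct segment_stat *segs, size_t n)
    {
        size_t best = 0;
        double best_ratio = -1.0;
        for (size_t i = 0; i < n; i++) {
            double u = segs[i].utilization;
            double ratio = (1.0 - u) * segs[i].age / (1.0 + u);
            if (ratio > best_ratio) {
                best_ratio = ratio;
                best = i;
            }
        }
        return best;
    }

The ratio favors segments that free a lot of space cheaply (low u) or whose data is old and therefore likely to stay stable once rewritten.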
Flaws:
Serving reads from main memory is probably the common case, but the design may be suboptimal for the class of applications that process huge datasets.
As the disk approaches full capacity, the file system becomes inefficient: it must garbage-collect constantly, since the design requires clean segments to write into.
Snapshotting/versioning could have been added without huge overhead, yet the design does not exploit this opportunity.
The design choices for segment size, when to run the segment cleaner, and how many segments to clean at once, all of which are very important to the system's performance, are not addressed carefully.
Explicitly flushing to disk may be important for a certain class of applications, which takes away some of the benefit of this file system. In a multi-user environment, workloads can arise that make performance as bad as FFS, since the main memory available per user may not be enough for effective caching.
Performance:
The authors demonstrate an order-of-magnitude performance improvement over FFS for writes while maintaining comparable read rates. Crash recovery time is reduced by using checkpoints and roll-forward. Bulk writing is a performance optimization that trades away some reliability, since buffered data can be lost in a crash.
Posted by: Archit Gupta | March 27, 2007 11:01 AM
Summary
In this paper the authors present a log-structured file system which purportedly "uses disks an order of magnitude more efficiently than current file systems." The primary assumption underlying the proposal is that files are cached in main memory, that increasing memory sizes will make these caches even more effective, and that as a result "disk traffic will become dominated by writes." The fundamental proposal is that all modifications are written to disk sequentially in a log.
Contributions
* The observation that disk access will become write dominated due to increasing available memory.
* Mechanisms for keeping reads fast despite writing everything sequentially.
* The segment cleaning mechanism and the identification of long-lived and short-lived data.
* Other benefits of a log-structured file system are easy recovery from unexpected shutdowns and other errors through use of checkpoints.
Flaws
The necessity of the cleaning mechanism adds complexity and may make it impractical for other operating systems to efficiently "mount" a log-structured file system for write access.
Performance
The authors demonstrated the superiority of their system for write-dominated workloads and for crash recovery, as well as performance close to the SunOS file system for reads.
Posted by: Aaron Bryden | March 27, 2007 08:49 AM
Summary:
The paper describes Sprite LFS, a log-structured file system designed primarily to optimize write performance.
Problem:
Most file systems at the time concentrated on laying out files on disk so that retrieval is efficient (primarily exploiting logical locality). They were based on the assumption that disk accesses were mostly reads rather than writes. By the early 1990s, memory sizes had grown much larger and processors much faster, which opened up the possibility of caching more and more data in memory. This made disk traffic more write-dominated than read-dominated. The log-structured file system was designed for this scenario: to write data to disk quickly.
Contributions:
- Using memory as a buffer even for writes: the system accumulates data for many disk blocks and writes them all with a single large write, improving performance.
- Writing metadata into the log alongside the data (at the end of the current log) instead of at fixed locations on disk. This achieves better performance when many small files are created rapidly.
- An active defragmentation (cleaning) process with a cost-benefit policy: segments whose data is likely to stay unmodified are reclaimed and compacted first, even though they are more highly utilized.
- Simple and efficient crash recovery, though the chance of losing recently buffered data is higher.
Flaws:
It is not entirely clear from the paper:
- how segment cleaning is scheduled. This is an important component of the system, and it would be interesting to see how it interacts with other applications' disk accesses.
- what the performance is for modifications in the middle of an existing file. Since the inode and blocks of a given file can be scattered all over the disk, such updates could be significantly slower than in FFS.
- how read performance can remain as fast as FFS with data and metadata spread all over the disk.
Relevance/Performance:
I think the file system does a pretty good job at what it set out to do, i.e. improving write performance. But it almost ignores the read path (though claims are made about 'comparable or faster' performance). A middle ground might have been better: use write buffering and a log for temporary data (like a journal), but still perform more careful long-term storage allocation.
Posted by: Base Paul | March 27, 2007 08:13 AM
Summary:
The authors describe the design of LFS, a file system that in some sense treats the entire hard drive as a journal, and especially how they tuned their cleaner.
Problem:
As RAM becomes more plentiful and can be used for larger read caches, overall disk performance is expected to become increasingly tied to write performance. This paper is an attempt to design a file system optimized for writes.
Contributions:
* A recognition of the above problem, and the overall idea of treating the whole disk as a log
* A determination of how to do cleaning so that the cost would be minimized
* LFS, from the standpoint of reading, looks almost like UFS
* LFS in some sense automatically co-locates files
* LFS is able to recover from crashes with very little work
* And, because I'm a fan of versioning file systems, it would be almost trivial to extend LFS as it's presented to do versioning
Flaws:
I'm a bit skeptical of their point about reads becoming less important. Sure, read caches are growing, but so are disk sizes, and so are at least some of the files we work with (not all, of course; text files are still tiny). What they are really relying on is that the ratio of read-cache size to the amount of hot data is increasing. They refer to another paper to justify their claim, which may address this, but I think they should have summarized it better. There are surely also cases (databases, perhaps, though those are likely to bypass the file system entirely) where the assumption does not hold at all; in those cases, LFS provides weaker performance guarantees than FFS would. In some sense, LFS optimizes the best and common cases while worst-case performance could suffer; UFS, by contrast, tries to "minimize the badness" by making all accesses somewhat reasonable.
Also, there is the very obvious potential problem that the file system now has an active component, the segment cleaner. Other file systems may have defragmenters, but those are optional maintenance tools rather than something the file system's steady-state performance depends on.
Performance
They were mostly concerned with using all the bandwidth that was available, even for small writes, so throughput and utilization were their concerns, but almost entirely for writes.
Posted by: Evan Driscoll | March 27, 2007 01:02 AM
Summary
In this paper, the authors present a log-structured file system for efficient disk storage management. A log-structured file system writes all new information to the disk in a sequential structure. This dramatically improves write performance and speeds up crash recovery.
Problem Description
Due to the increase in main memory sizes, read requests can increasingly be satisfied from the file cache. As a result, write bandwidth is becoming the bottleneck for many workloads. To improve write bandwidth, the authors present a log-structured file system.
Summary of Contributions
The paper's main contribution is to improve write performance by buffering a sequence of file system changes in the file cache and then writing all the changes to disk sequentially in a single write operation. The main advantage of this approach is that it amortizes the cost of disk access time by combining multiple write requests into a single one. Thus, an application that frequently generates small write requests benefits greatly from this approach.
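As a sketch of this buffering idea (hypothetical constants and names, and omitting the inodes, inode map, and summary block that a real segment write also contains): dirty blocks accumulate in an in-memory segment buffer and go to disk in one large sequential write instead of many small ones.

    #include <string.h>
    #include <unistd.h>

    #define BLOCK_SIZE  4096
    #define SEG_BLOCKS  128            /* e.g. a 512 KB segment */

    static char   seg_buf[SEG_BLOCKS * BLOCK_SIZE];
    static size_t seg_used;            /* blocks buffered so far */

    /* Buffer one dirty block; flush the whole segment with a single
     * sequential write once the buffer is full. */
    void lfs_write_block(int disk_fd, const void *block)
    {
        memcpy(seg_buf + seg_used * BLOCK_SIZE, block, BLOCK_SIZE);
        if (++seg_used == SEG_BLOCKS) {
            ssize_t n = write(disk_fd, seg_buf, sizeof seg_buf);  /* one large I/O */
            (void)n;                   /* a real system would handle errors here */
            seg_used = 0;              /* start filling the next segment */
        }
    }

The single large write pays one seek and rotation cost for an entire segment's worth of data, which is where the bandwidth gain over many small synchronous writes comes from.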
A log-structured file system can also reduce crash recovery time. In traditional UNIX systems, crash recovery takes a long time because the system has to scan all the metadata structures on disk to determine where the last changes were made. In a log-structured system this is easy, because the most recent changes are at the end of the log, so the system can be recovered quickly after a crash.
The concept of roll-forward is also presented in this paper. If only checkpoints were used to recover after a crash, the changes made after the last checkpoint would be lost. To recover as much information as possible, the roll-forward mechanism scans the log past the checkpoint. As a result, most of the useful work done after the checkpoint is recovered and the resources spent on it are not wasted.
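A rough sketch of the checkpoint and roll-forward idea, with invented structure layouts (the real checkpoint region also records the locations of the inode map and segment usage table): two fixed checkpoint regions are written alternately, recovery starts from the newer one, and roll-forward then scans the segments written after it.

    #include <stdint.h>

    /* Hypothetical checkpoint region contents. */
    struct checkpoint {
        uint64_t timestamp;   /* when this checkpoint was written */
        uint64_t log_tail;    /* address of the last segment in the log */
    };

    /* Recovery step 1: choose the more recent of the two checkpoint regions
     * (writing them alternately guarantees at least one is intact). */
    const struct checkpoint *pick_checkpoint(const struct checkpoint *a,
                                             const struct checkpoint *b)
    {
        return (a->timestamp > b->timestamp) ? a : b;
    }

    /* Recovery step 2 (roll-forward, elided here): starting at log_tail, scan
     * any segments written after the checkpoint and re-apply the inode and
     * directory updates recorded in their summary blocks. */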
Flaws
The main assumption behind write buffering is that system crashes are quite infrequent. This assumption might not hold for large servers and clusters, where the probability of a crash is much higher. Thus, the advantage of write buffering might not materialize in these scenarios.
Segment cleaning is a very important component of LFS. It seems that segment cleaning might result in a significant increase in disk activity. The authors do not discuss anything related to this issue.
Performance
The paper's main goal is to improve the performance of disk writes and to reduce crash recovery time. Write buffering collects multiple write requests, which are then written with a single disk access. Crash recovery time is reduced because the changes made just before the crash are at the end of the log, so the entire disk does not need to be scanned to find them.
Posted by: Atif Hashmi | March 27, 2007 12:53 AM
Summary:
This paper proposes the Log-Structured File System (LFS), which makes full use of disk bandwidth by aggregating many small writes and writing them contiguously in a log that never seeks back, even to update existing file data. It increases the speed of both writes and system recovery.
Problem Addressed:
CPU speeds have been increasing rapidly, so disk I/O has become the bottleneck that limits application performance. Since RAM capacity has also been increasing, read calls can often be served from files cached in RAM, but write calls still require disk access. Because the existing UNIX file system issued a large number of small writes, each requiring a seek, the utilization of disk bandwidth was incredibly low.
Contributions:
Turning small synchronous random accesses into large asynchronous sequential accesses.
By storing both metadata and data in a single "log" format, LFS reduces the number of random disk accesses that require seeks, and the log structure allows batched writes so that many small writes can be aggregated into one I/O. This improves file system performance dramatically, since disk writes were the bottleneck.
Since LFS writes all data sequentially and needs large empty regions to write efficiently, it manages free space in units called segments, each a group of blocks. The cleaning (garbage collection) policy and mechanism classify segments into hot and cold and treat them differently, improving utilization and efficiency.
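One way to picture the hot/cold treatment is the age-sorting step the cleaner performs before rewriting live data; the sketch below is a hypothetical C illustration (types and names invented, not Sprite LFS code): live blocks salvaged from cleaned segments are sorted by modification time so that data of similar age, and in particular cold data, ends up grouped in the same output segments.

    #include <stdlib.h>
    #include <time.h>

    struct live_block {
        long   disk_addr;    /* current location of the live block */
        time_t modify_time;  /* age information kept with the segment summary */
    };

    /* Comparator: order blocks from oldest to newest modification time. */
    static int by_age(const void *a, const void *b)
    {
        const struct live_block *x = a, *y = b;
        return (x->modify_time > y->modify_time) - (x->modify_time < y->modify_time);
    }

    /* Group live blocks by age before writing them out to new segments. */
    void sort_live_blocks(struct live_block *blocks, size_t n)
    {
        qsort(blocks, n, sizeof blocks[0], by_age);
    }

Cold data then tends to sit in segments that stay nearly full and rarely need cleaning again, which is exactly the bimodal distribution the cleaner wants.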
Fast recovery using checkpoints and roll-forward, a natural consequence of the log structure, reduces the amount of data that must be examined at reboot and is another contribution of LFS.
Possible Improvements:
Since cleaning is the key to LFS performance, I felt it would be better if they had a simpler and yet stronger cleaning mechanism. Tracking disk activity at a finer granularity might help cleaning in some respects, so the way metadata such as the summary block is stored could also be revisited for further improvement. Also, since people often grow capacity by adding extra disks, it would be interesting to consider LFS across multiple disks.
Performance:
It is clear that LFS improves bandwidth utilization and dramatically changes write performance, which translates into better whole-system performance. The ease of recovery was also important at the time of this paper. There is further potential as well, such as exposing old, not-yet-cleaned data in the log to users, for example to recover earlier versions of files.
Posted by: Hidetoshi Tokuda | March 27, 2007 12:24 AM
Summary
CPU speed was increasing rapidly at the time this paper was written, along with the amount of memory used to cache disk I/O. Disk access speed, however, was not improving nearly as quickly and was increasingly becoming a bottleneck. This work presents a solution to these problems and introduces a log-based file system that significantly increases write performance.
Problems Addressed
File systems of the time tended to store information on disk in a way that resulted in many small accesses to read or write files. File access was also done synchronously, in that each access had to complete before the application continued. These problems were compounded by the fact that advances in disk technology came primarily in cost and capacity, with little improvement in access speed. Traditional file systems also did not recover easily from crashes: a full disk scan could take a long time. The authors address this issue as well in the log-based file system.
Contributions
To address the issues mentioned above, a file system based on a log structure was presented. All data was stored within a log on disk in the order it was written. This means data from the same file was not necessarily placed next to each other physically on disk, so the layout reflects temporal locality more than the logical locality of traditional file systems. Inodes were placed within the log, and an additional structure, the inode map, was added to locate them. Segments were the basic unit of allocation and made up the on-disk log. Only empty segments could be written to, so a cleaning policy was used to move the live data from several partially filled segments into fewer segments, freeing the originals. A summary block was allocated for each segment, detailing what data the segment contained and which files that data belonged to. Finally, provisions were made to checkpoint the file system at regular intervals so that after a crash the file system could be made consistent by simply reading the checkpoint. Two copies of the checkpoint were written to alternately, so that a valid checkpoint always existed even if a crash occurred during a checkpoint write.
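The per-segment bookkeeping described above can be pictured with a small C sketch (the struct layouts and field names are invented for illustration, not Sprite LFS's actual on-disk format): the summary block lets the cleaner ask whether each block is still live, and the inode map gives the current log address of every inode.

    #include <stdint.h>

    /* One entry per data block in a segment: which file and which block
     * of that file the data belongs to (hypothetical layout). */
    struct summary_entry {
        uint32_t inode_number;
        uint32_t block_offset;
    };

    /* The segment summary block written along with each segment. */
    struct segment_summary {
        uint32_t             nblocks;    /* number of entries that follow */
        struct summary_entry entries[];  /* one per data block in the segment */
    };

    /* The inode map: the current location of each inode in the log, plus a
     * version number that is bumped when the file is deleted or truncated,
     * so the cleaner can quickly tell that old blocks are no longer live. */
    struct inode_map_entry {
        uint64_t inode_disk_addr;
        uint32_t version;
    };

A block is live if the file's inode, found through the inode map, still points at this segment address for that block offset; otherwise the cleaner can discard it.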
Flaw
Overhead was evaluated primarily in terms of performance; however, it would be interesting to also see more analysis of the capacity overhead required to store the extra inode maps and segment summary blocks.
Performance
File system responsiveness, especially for writes, was the primary concern of the authors of this work and was improved by using a log in which very few seeks are required to write. Robustness was also a concern and was addressed through the use of the checkpoint data structure.
Posted by: Nuri Eady | March 26, 2007 10:43 PM
Summary
The log-structured file system is designed to make full use of disk bandwidth by reducing seek time: all writes are placed together in the current segment at the end of the log.
Problem
The UNIX file system only uses 5-10% of available disk bandwidth. The disk is becoming an increasing bottleneck in the system.
Contributions
The article begins with a few observations about contemporary disks and systems. Main memory is becoming more plentiful, and as more memory becomes available the majority of disk operations will be writes, because reads will be served from the cache. In current systems, write latency is dominated by seek time.
The log-structured file system removes most seek time by writing all data sequentially on disk. The traditional directory/inode structure is retained to keep reads fast, and inode maps and segment summaries allow data to be relocated on disk. To create the large contiguous free areas the log-structured file system needs, a segment cleaning mechanism is introduced.
The cleaning mechanism activates when the number of free segments falls below a set threshold and continues executing until another threshold is crossed. The authors experimentally identify two groups of segments, long-lived and short-lived; long-lived (cold) segments rarely change but provide significant compaction benefit when cleaned. A cost-benefit calculation is used to select which segments to clean. Periodic cleaning adds overhead to the file system.
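A minimal sketch of this threshold-driven trigger (the water marks, state, and helper functions are invented for illustration): cleaning starts when the number of clean segments drops below a low-water mark and runs until a high-water mark is reached, with each pass choosing its victim by the cost-benefit rule.

    #define CLEAN_LOW_WATER   20   /* start cleaning below this many clean segments */
    #define CLEAN_HIGH_WATER  60   /* stop once this many segments are clean */

    static int clean_segments = 15;           /* stand-in state for the sketch */

    static int count_clean_segments(void) { return clean_segments; }

    static void clean_one_segment(void)       /* would pick a victim by cost-benefit */
    {
        clean_segments++;                     /* stand-in for the real cleaning work */
    }

    void maybe_run_cleaner(void)
    {
        if (count_clean_segments() >= CLEAN_LOW_WATER)
            return;                           /* enough clean segments; do nothing */
        while (count_clean_segments() < CLEAN_HIGH_WATER)
            clean_one_segment();              /* clean until the high-water mark */
    }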
Possible Improvements
I felt it would have been nice for the paper to explore some other triggers for cleaning. My intuition is that when the system approaches the cleaning threshold there is already disk activity, and cleaning injects even more disk requests into the queue, hurting response time. I would expect an idle-time cleaning trigger to be much more latency friendly.
The write cost function does not seem to take into account updating inode pointers. Segments are small: a large file's data blocks will be spread over many segments, and many small files can share one segment. When relocating a segment, the inode reads and writes for many small files could become significant.
Boot time of this system is not considered. The initial population of the read cache must impose a significant slowdown on startup.
Performance
Disk throughput was the main concern of the article; increasing throughput also decreases latency in an I/O-bound system and improves disk efficiency.
Posted by: Kevin Springborn | March 26, 2007 09:07 PM
Paper Review: The Design and Implementation of a Log-Structured File System [Rosenblum & Ousterhout]
Summary:
In this paper, the authors describe a log-structured file system that performs file write operations as much as an order of magnitude faster than the Fast File System. Their cost-benefit policy for ongoing layout maintenance allows the log-structured file system to outperform FFS for nearly all write/read patterns.
Problem:
The problem being addressed is that, as processor and memory resources increase dramatically (improving read performance through caching), synchronous writes to fixed disks become the bottleneck in I/O performance, because access times are limited by the mechanical motion of the disk arm.
Contributions:
Because the authors chose to store the data in just a log structure, rather than writing to a small log/journaling area first and then into a "traditional" data storage area, their contributions mainly involve how to manage that space to preserve performance:
* Given that the performance of their log file system relies on the availability of large empty regions, the authors adopted an ongoing "segment cleaning" method guided by a "write cost" measure, after finding that the choice of which segments to clean, and how to order the data within those segments, were the critical performance factors (a small write-cost sketch follows this list).
* The authors state that the key to high performance at low cost is to force the disk into a bimodal segment distribution, in which most segments are nearly full and a few are empty or nearly empty, so that the cleaner can focus on the nearly empty segments and keep the write cost low.
* One non-intuitive insight they found was that locality, or ostensibly better grouping of data, could yield worse performance. To address this, they argue that hot and cold (volatile and quiescent, respectively) segments of the file system must be treated differently by the cleaning mechanism.
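A minimal sketch of the write-cost measure mentioned above (illustrative only, not the authors' code): if a segment chosen for cleaning has live-data fraction u, the cleaner reads the whole segment and rewrites the live fraction u in order to free 1 - u of new space, so the write cost works out to 2 / (1 - u), and 1.0 for an already empty segment.

    #include <stdio.h>

    /* Write cost for cleaning a segment whose fraction u of data is live. */
    double write_cost(double u)
    {
        return (u == 0.0) ? 1.0 : 2.0 / (1.0 - u);
    }

    int main(void)
    {
        /* Cleaning mostly empty segments is cheap; mostly full ones are not. */
        printf("u=0.2 -> %.1f, u=0.8 -> %.1f\n", write_cost(0.2), write_cost(0.8));
        return 0;
    }

This is why the bimodal distribution matters: it lets the cleaner almost always pick segments with a small u.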
Flaws:
The authors claim that their file system is no more complex, but complexity is difficult to measure. Introducing an independent active process (the segment cleaner) that is tasked with maintaining the structure for good performance adds considerable complexity, given that how best to schedule it was not well understood at the time of writing. The performance could also be less predictable than that of a file system whose data structures are maintained only during read and write operations.
Performance Impact and Relevance:
The performance impact of the log-based file system is primarily to increase utilization of disk bandwidth and to improve throughput, responsiveness, and time to completion, chiefly for file write operations.
Posted by: Dave Plonka | March 25, 2007 02:05 PM