
A Fast File System for UNIX

Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry. A Fast File System for UNIX. ACM Trans. on Computer Systems 2(3), August 1984, pp. 181-197.

Reviews due Tuesday, 11/4.

Comments

The Problem

A Fast File System for UNIX describes the efforts of researchers at Berkeley in 1984 to both improve the general performance of the UNIX filesystem and extend its feature set in useful ways. Most notably, new flexible allocation policies significantly improve disk transfer speeds while conserving unused disk space.

Contributions

Variable Block Size
The original UNIX file system forced users to read and write 512 B blocks to the disk (the typical sector size). The Fast File System grants users the ability to pick their own block size (with a 4096 B minimum) to match the characteristics of both their data and hardware. Even at the time of writing, 512 B blocks were probably too small a granularity to capture many file IOs.
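To make the transfer-count argument concrete, here is a small back-of-the-envelope sketch in C; the 1 MiB file size is just an example value, not a figure from the paper.

    #include <stdio.h>

    /* Number of disk transfers needed to read a file sequentially,
     * assuming one transfer per block and no read-ahead. */
    unsigned long transfers(unsigned long file_bytes, unsigned long block_bytes)
    {
        return (file_bytes + block_bytes - 1) / block_bytes;   /* round up */
    }

    int main(void)
    {
        unsigned long file = 1UL << 20;   /* a 1 MiB file, for example */
        printf("512 B blocks:  %lu transfers\n", transfers(file, 512));
        printf("4096 B blocks: %lu transfers\n", transfers(file, 4096));
        return 0;
    }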

Cylinder Groups
The old UNIX filesystem free list is gone in the Fast File System. Instead, consecutive cylinders on the disk are grouped, with a localized bitmap tracking which blocks are free in each group. For the sake of fault tolerance, this bookkeeping information is spread out over the physical disk: consecutive groups store their bookkeeping at different offsets within themselves.
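A minimal sketch of what per-cylinder-group free-block bookkeeping with a bitmap can look like; the names and sizes are illustrative, not the real FFS cylinder-group layout.

    #include <stdint.h>

    #define BLOCKS_PER_GROUP 4096             /* illustrative group size */

    struct cg_map {
        uint8_t free[BLOCKS_PER_GROUP / 8];   /* 1 bit per block: 1 = free */
    };

    int cg_is_free(const struct cg_map *m, unsigned b)
    {
        return (m->free[b / 8] >> (b % 8)) & 1;
    }

    void cg_mark_used(struct cg_map *m, unsigned b)
    {
        m->free[b / 8] &= (uint8_t)~(1u << (b % 8));
    }

    void cg_mark_free(struct cg_map *m, unsigned b)
    {
        m->free[b / 8] |= (uint8_t)(1u << (b % 8));
    }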

Block Fragments
Adopting larger block sizes can result in increased storage overhead, as many files use only a portion of a block. To compensate, the cylinder group bitmaps above track free space at a granularity smaller than the block size: the fragment size. Each block may contain up to 8 fragments, which reduces the amount of dead space between files.
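The space accounting is easy to see with a little arithmetic; the sketch below uses the 4096/1024 block/fragment combination discussed in the paper and a made-up file size.

    #include <stdio.h>

    int main(void)
    {
        unsigned long file  = 11000;   /* example file size in bytes */
        unsigned long bsize = 4096;    /* block size                 */
        unsigned long fsize = 1024;    /* fragment size              */

        unsigned long full_blocks = file / bsize;
        unsigned long tail        = file % bsize;
        unsigned long frags       = (tail + fsize - 1) / fsize;   /* round up */
        unsigned long used        = full_blocks * bsize + frags * fsize;

        printf("%lu full blocks + %lu fragments = %lu bytes allocated (%lu wasted)\n",
               full_blocks, frags, used, used - file);
        return 0;
    }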

File System Layout Improvements
Minimizing seek latency and storing blocks of the same file together generally improve filesystem performance. New blocks within the same file are typically allocated on the same cylinder, if free space is available. Other locality optimizations include storing all the inodes for a particular directory together in the same cylinder group (they are frequently requested together).

If a write overflows a cylinder group, a group with a significant amount of free space is picked to finish the write. Similarly, methods exist to carefully select where to place a new file or directory on disk.
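One of those placement methods, for new directories, can be sketched roughly as below; the structure and field names are invented for illustration, but the heuristic (prefer cylinder groups with above-average free inodes and few existing directories) follows the paper's description.

    struct cg_stats {
        unsigned free_inodes;
        unsigned ndirs;          /* directories already in this group */
    };

    /* Pick a cylinder group for a newly created directory. */
    int place_new_directory(const struct cg_stats *g, int ngroups)
    {
        unsigned long total = 0;
        for (int i = 0; i < ngroups; i++)
            total += g[i].free_inodes;
        unsigned avg = (unsigned)(total / (unsigned long)ngroups);

        int best = -1;
        for (int i = 0; i < ngroups; i++) {
            if (g[i].free_inodes < avg)          /* require above-average free inodes */
                continue;
            if (best < 0 || g[i].ndirs < g[best].ndirs)
                best = i;                        /* fewest existing directories       */
        }
        return best < 0 ? 0 : best;
    }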

New Semantics
The Fast File System supports shared and exclusive locking at the file granularity. It also adds support for symbolic links and quotas, features taken for granted today. The authors also discuss a safer way to rename a file stored in their filesystem.
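These interfaces survive in modern Unix systems, so a short usage sketch is easy to give (error handling mostly omitted for brevity; the file names are placeholders):

    #include <sys/file.h>   /* flock   */
    #include <fcntl.h>      /* open    */
    #include <stdio.h>      /* rename  */
    #include <unistd.h>     /* symlink, close */

    int main(void)
    {
        int fd = open("data.txt", O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return 1;

        flock(fd, LOCK_EX);                    /* advisory exclusive lock   */
        /* ... update the file ... */
        flock(fd, LOCK_UN);                    /* release it                */
        close(fd);

        rename("data.txt", "data-renamed");    /* atomic rename, no unlink/link window */
        symlink("data-renamed", "data-link");  /* symbolic link, may cross filesystems */
        return 0;
    }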

Flaws
One might ask: if we are locking at the file granularity, why not build support for this into the kernel itself, applying it to all file systems?

The performance data cited focused on speed of sequential reads and writes to presumably large files. What about seeks through a large file? The cost of opening files? Concurrent reads and writes? I feel like a lot went undocumented here.

Key Performance Tradeoff
Additional accounting was required to keep track of filesystem fragments, etc. It looks like this accounting is resulting in a significant CPU hit (see Table II).

In this paper the authors present FFS, a reimplementation of the Unix file system that provides higher throughput. They describe the problems of the old file system and then present the changes they’ve made to address them. Their modifications include bigger block sizes that can be divided into fragments, the use of cylinder groups, and file system parameterization.

The old file system didn’t provide high throughput. The combination of the small block size, limited read-ahead in the system, and many seeks severely limited the system's performance. The authors addressed these problems by changing the underlying implementation; as a result, users of the system did not face a software conversion.

One of the main contributions of the system is that it can efficiently manage large and small files. More specifically, by increasing the block size (4K minimum), it reduces the number of disk transfers needed to read a large file. To manage small files, it introduces the notion of fragments. Each block can be partitioned into fragments, so multiple small files can reside in one block. Moreover, FFS avoids the problems created by the free list and gives greater flexibility (it is possible to have file systems with different block sizes in the same system). Another contribution of the system is the division of disk partitions into cylinder groups which are used to minimize seek costs by increasing the locality of reference. Each cylinder group contains a replication of the superblock, so that the problems caused by a possible failure are minimized. Another interesting idea is the parameterization of the file system. FFS tries to exploit the characteristics of the system (processor speed, properties of the disk) in order to achieve greater performance. Finally, another contribution of the file system is the use of advisory locking (shared and exclusive locks).

One flaw of the system is the overhead created by the use of fragments. Data may be copied many times when expanding a file, and many checks need to be done in order to find whether there is enough space or whether we need to store data in another fragment/block. Moreover, an inode is allocated for each 2048 bytes, but the authors don’t explain why they have chosen this number. Although the authors introduce a new file locking mechanism, they don’t present a deadlock detection mechanism; as a result, deadlocks may occur in the system. Moreover, although the idea of parameterization is very interesting, it must be adapted in order to be used with new disk technologies (e.g., flash disks).

The main techniques used to improve performance are the use of a bigger block size, to minimize disk accesses when accessing large files, and the use of fragments to reduce the space needed when managing small files. Another technique is the use of system properties in order to allocate blocks in an optimal way. The tradeoff here is time vs. space. More space needs to be allocated because the block size is increased; the use of fragments to improve the utilization of space involves a significant overhead. Finally, more space is needed for bookkeeping (the superblock is replicated).

The "Fast File System for Unix" includes a set of optimizations trading processing power for better performance off the disk. One of them in particular, the use of a larger block size together with "fragments," is particularly effective in increasing bandwidth utilization for larger file reads and writes. Following this, a discussion on allocation policy and several other less-interesting functional enhancements (e.g. the addition of a "rename" call and file locking) leave us with a file system nearer to file systems of the current era than the previous.

I will claim wholeheartedly that, in the modern day, "space is free." That is, we can feel free to waste some disk space in favor of higher performance because disk space is less expensive than either additional processing power or additional effort by the programmer managing it. In 1984 when this paper was written, space was increasingly inexpensive but definitely not "free," and so an otherwise simple optimization (increase the block size) had to come with some provisions to reduce space waste. This provision, the mapping of unused "fragments" within a block, was where the authors made a different trade: at the expense of more CPU time, they created a file system with better space-use properties than would otherwise have been achieved with larger blocks. The increase of block size itself improves performance dramatically for larger files, as the disk can transfer four times as much data in a single read. The new file system also makes some attempt to string multiple disk blocks from the same file together on the disk as well as to compensate for rotational delay of access, improving locality of reference for sequential blocks in a file. At a higher level, the new file system also balances the writing of a file across several cylinders, preventing "crowding" on a single cylinder.
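The rotational-delay compensation mentioned here boils down to simple arithmetic; the spindle speed, geometry, and per-transfer overhead below are illustrative numbers, not measurements from the paper.

    #include <stdio.h>

    int main(void)
    {
        double rpm               = 3600.0;   /* example spindle speed             */
        double sectors_per_track = 32.0;     /* example geometry                  */
        double cpu_overhead_ms   = 4.0;      /* example per-transfer service time */

        double ms_per_rev    = 60000.0 / rpm;                 /* ~16.7 ms          */
        double ms_per_sector = ms_per_rev / sectors_per_track;
        int    skip = (int)(cpu_overhead_ms / ms_per_sector + 0.999);  /* round up */

        /* Place the next block of the file this many sectors ahead so it arrives
         * under the head just as the CPU finishes handling the previous one. */
        printf("skip %d sectors between logically consecutive blocks\n", skip);
        return 0;
    }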

There are a couple of less-nice things about the authors' approach. To begin with, the authors are not particularly concerned about the high CPU utilization of their scheme, which surprises me (as speed was not free then either). They mention that the old file system is particularly bad for programs which do relatively little processing on relatively large data sets. Their measurements support the benefit offered to this class of programs, but with such a high CPU utilization, I wonder what the effect is on the opposing class of programs (relatively much computation producing smaller data sets). Specifically, as any time spent calculating cylinder offsets and the like is "wasted time" for such a program, I wonder how much time actually is lost. In line with this excess CPU usage, their computation-heavy low-level optimizations appear to prevent them from implementing higher-level optimizations such as chaining kernel buffers or pre-allocating space for quickly-growing files. It seems well-understood by the authors (upon review of the "rename" enhancement) that communication has a significant cost, and their ability to mitigate this cost is limited severely by their over-use of the CPU for file access.

From a historical perspective, it may be that processor speed had begun to take off in dramatic fashion by 1984, thus engendering the authors' willingness to use additional CPU cycles gratuitously. Such attitudes had certainly taken hold by the mid-90s when I began following computers; a documented classification of these sentiments would be valuable to a modern reader's understanding of this paper. On a technical note, the authors here made another trade, deciding to move disk access scheduling from the disk controller (specialized to this task, presumably fast) to the CPU (generalized, probably not as fast). Doing this caused a significant rise in CPU usage and a waste of a running processor on the disk, as disk accesses under the new system always arrive in scheduled order. I wonder if there were a better way to make use of the known properties of a disk controller to remove some stress from the CPU, allowing the controller to do its own optimizations on seek time. Lastly, the end-cost of the "fragment" idea is unclear from these experiments. If the management of fragments comes at low cost for 4096-byte blocks, it seems that one should be able to continue to increase the block size for even further benefit. At some point, the user would cross some threshold where the management of fragments would regress to the problem of managing blocks on a typical workload, but there may be speed to be gained in the meantime, especially for the class of programs which interests the authors.

A Fast File System for UNIX

Summary

The authors describe the techniques used in their reimplementation of the
UNIX filesystem to obtain better performance and additional features. The authors
also present evaluation results.

Description of the problem being solved

The authors are trying to remove inefficiencies in the traditional UNIX
filesystem so that it provides higher throughput, adapts to a wide range of
devices, and offers better locality of reference, along with other features
like advisory locks, long file name support, and provisions for resource usage control.

Contributions of the paper

1) More reliable than the traditional UNIX filesystem: they achieve this by
making the new filesystem replicate the most critical information, like the
superblock, and by distributing unrelated data over the disk's capacity.

2) More throughput and better performance : The new file system uses a bigger
block size to utilize the bandwidth better and at the same time fragments each
block into smaller chunks to avoid wastage of space for small files.
Empirically, the newer file system performs better overall.

3) Adapts to a wide range of peripherals: the new file system uses
parameterization to capture properties of the system, like processor speed
and disk rotation speed, and adapts its policies to take advantage of these
physical parameters to enhance performance.

4) Layered policies: Since having the global policy layer know all information
is costly, at the global level only heuristics are used and the local policy
layer does the actual job of allocating inodes and data blocks.

Flaws in the paper

1) Inode allocation with one inode per 2048 bytes: 2 KB seems to be too
small an average file size to justify that many inodes.

Techniques used to achieve performance

0) Optimizing for the common case: access patterns that are not dominated by
small files. This is why the new file system performed better empirically.
1) Utilizing spatial and temporal locality of reference in data access.
2) Layering to separate responsibilities used in global and local policies.

Tradeoff made

Optimizing for larger files at the cost of a little overhead for small-file
access patterns: fragmentation, and the copying that occurs as a file grows
in units of fragments, is overhead for small-file access patterns; however,
this works out well in the general case and for large-file access patterns
empirically.

Another part of OS where this technique could be applied

1) Optimizing for the common case could be applied in many places. For example,
caches are a hardware optimization for speeding up the common case.

2) Spatial and temporal locality of reference is also taken advantage of in
cache replacement policies to better utilize the cache.

3) Layering to manage complexity is also used in many places. One example is
the TCP/IP stack.

Summary
The paper describes a reimplementation of the Unix file system that is primarily aimed at increasing the throughput of the old file system. By increasing block size, by allocating blocks that exploit locality of reference and by grouping data that will be sequentially accessed, the throughput is improved.

Problem attempted
The objective of the paper is to substantially increase the throughput of the Unix file system by avoiding inefficiencies that are inherent in the allocation policies of the old file system.

Contributions

1. The new file system implements super-block replication to enhance protection of critical data. The way replicated data is stored is such that if a single platter fails no harm is done to the super-block data.

2. Expanding the block size of the file system has been one of the primary means to increase throughput. In order to avoid the space wastage that arises out of a large block size, the system supports fragmentation, where each block of size 4 KB has four fragments of size 1 KB. It is up to the user program to write full blocks at a time to minimize fragment reallocation.

3. By parametrizing processor capabilities, storage characteristics of disk and other such properties of underlying hardware, the new filesystem tailors itself to the disk on which it is placed. This is a great feature as storage characteristics of devices can be vastly different and optimizing w.r.t a specific device will yield substantial performance benefits

4. The new filesystem tries to cluster the inodes of the files in a single directory by placing them in a single cylinder group. It attempts to place the data blocks of a file in the same cylinder group. At the same time it handles the conflicting demand to avoid filling a cylinder group fully: it places only the first 48 KB of a file in the same cylinder group before redirecting allocation elsewhere.

Flaws

1. The files that are created when a file system is full will lack locality in the disk. The paper claims that once enough space is available, the data can be moved to achieve locality. But this entails a lot of copying and if this operation is tolerable, then one could claim that periodical reorganization of the disk to restore locality in the old file system is fine too.

2. The estimate of the number of inodes to be allocated to each cylinder group seems pessimistic. Allocating one inode for each 2 KB of space on a cylinder group will consume a significant fraction of the disk space. But if disk space is not a constraint, this is not a serious problem.

Trade-offs

1. The paper makes a trade-off between the space wastage that occurs by using a larger block size and the efficiency that comes when block size is large. It increases the block size and patches the space wastage problem by implementing fragments.

2. Another trade-off is between allocating data blocks of a file in the same cylinder group and not filling up a cylinder group completely. It puts a threshold on the amount of data from a single file that can be allocated on a single cylinder group.

Techniques used

1. Parametrization is a technique widely used in many areas in CS. For example, compilers are optimized for a specific architecture.

2. Exploiting locality of reference - The memory hierarchy is a classic example of exploiting locality of reference.

SUMMARY
This paper describes the Fast File System, a modification of the basic UFS of the early 1980s with the primary goal of improving performance.

PROBLEMS
UFS at the time was plagued by a number of issues, the most significant being poor bandwidth utilization and poor portability across different drive characteristics (spindle speed, track size, etc.).

CONTRIBUTIONS / TECHNIQUES
Addressed the primary issues related to throughput: block size and locality. Increased the block size, and at the same time created a mechanism for splitting blocks among multiple files in order to address what would otherwise have been a new problem of small files wasting lots of space.

Attempts to increase locality by keeping inodes, data blocks and accounting data in the same cylinder group.

Allocates space for directory entries in 512 byte chunks, and supports names up to 256 characters long.
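A rough sketch of such a variable-length directory entry; the field names only approximate the 4.2BSD struct direct and should be read as illustrative.

    #include <stdint.h>

    #define MAXNAMLEN 255

    /* Entries are packed into 512-byte chunks; each records how much space
     * it occupies (reclen) so that the space of a deleted neighbor can be
     * folded into the preceding entry. */
    struct dir_entry {
        uint32_t ino;                  /* inode number, 0 if unused        */
        uint16_t reclen;               /* total length of this record      */
        uint16_t namlen;               /* length of the name that follows  */
        char     name[MAXNAMLEN + 1];  /* NUL-terminated, variable length  */
    };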

Adds file locking to the file system, introduces symbolic links that can link files across physical disks and adds quotas.

FLAWS
FFS is designed specifically for rotational hard disk drives – something that has persisted in most file system designs until recently. The changes in FFS may not show benefits for flash or RAM disks, file systems on RAID arrays, or other types of storage media.
Further, the performance analysis of the paper is poorly done. While the paper compares FFS to the old UFS, which used 1024-byte block sizes, the authors never made a direct comparison using their changes at the 1024-byte block size; they compared instead at 4k and 8k.
File locking in the FS is nice, but they don't address deadlocks – not that I can say they could within the narrow confines of a FS API.

OTHER APPLICATIONS
Methods for increasing locality are important for increasing cache coherency in a number of applications in which one deals with large sets of dynamic data. Adaptive block sizes are used in networking, where data is sent in varying packet sizes.

A FAST FILE SYSTEM FOR UNIX

SUMMARY
This paper introduces a modification in the Unix File System. These modifications include bigger block sizes that can be divided into equal size fragments, the introduction of cylinder groups and new allocation policies. These modifications do not require changes in the user interface. Later some modifications are proposed that improve the file system user interface: long file names, file locking, symbolic links and quotas.


PROBLEM
This research tries to improve the throughput and reliability of the then-current Unix File System. These modifications need to be introduced without changing the user interface. A second problem is studied later: what are the most necessary changes to the file system user interface?

CONTRIBUTIONS
They increase the block size in order to improve throughput, but to avoid high percentages of wasted space in disk they propose the possibility of allocating fixed fragments of the block. Bigger block sizes increase the transferred data per disk transaction. Having a bigger block size makes it possible to have bigger files with only two levels of indirection.

The cylinder groups are a way of dividing the disk. For each group there is a redundant copy of the superblock and there is a bit map of the free blocks. Having redundant copies of the superblock increases reliability. The bit map replaces the old free blocks list.

The size of the block and the fragments within the block are parameters that can be chosen when the file system is created, giving extra flexibility to the system. Some other parameters related to the processor and disk characteristics are used to improve the allocation policies and thus the file system performance.

The allocation policies try to increase locality and reduce seek time. Files in the same directory have their inodes placed in the same cylinder group. The data blocks of a file are placed at rotationally optimal positions.


FLAWS
In order for these layout policies to work properly the disk needs to have some free space; for performance, a 10% free-space reserve is chosen. Is this the optimal number, and how does it vary with the size of the disk? Why is a 4096/1024 structure preferred over others like 8192/512 or 4096/512? There is no explanation for these choices. They propose using file locking to synchronize multiple accesses to the same file, but they do not explain how deadlock is avoided or how it is resolved when it happens.


PERFORMANCE
The increase in performance is obtained by increasing the block size to 4096 bytes, which yields more data transferred per disk access. Having cylinder groups with a free-block map reduces seek times. Allocation policies that increase locality and reduce seek time also contribute to the bandwidth increase. A tradeoff is to leave some unused space on the disk in order to be able to use the previously mentioned allocation policies: space is traded for throughput. Exploiting locality and locking are widely used concepts in operating systems.

Summary:
This paper presents FFS (A Fast File System for UNIX), which improves upon the traditional UNIX file system with a significant gain in file system throughput, from 3-4% of the disk bandwidth to up to 47% (roughly a tenfold increase). The paper observes subtle yet significant shortcomings of the traditional UFS, like small block sizes and rigid allocation policies that are more or less independent of disk and processor characteristics. Further, the paper proposes long-needed functional enhancements to the interface of UFS.

Problem:
The original Unix File System only utilized 3-4% of available disk bandwidth. This was mainly due to small block sizes, limited read-ahead, randomization of disk block placement and many disk seeks which limited the file system throughput. FFS presents a reimplementation, which provides substantially higher throughputs.

Contributions:
One of the major contributions of FFS is the flexible allocation policies, which allow better locality of reference and can be adapted to mass storage and processor characteristics. This is achieved by global and local allocation policies.
Secondly, clustering of related data (e.g., the inodes of files in a directory) helps reduce the disk seeks needed for frequent accesses, thereby improving file system throughput.
Thirdly, the use of larger block sizes together with fragments not only allows faster access to large files but also reduces the amount of wasted space for small files.
Last but not least, FFS also proposes functional enhancements to the filesystem API, e.g., advisory file locks, long file names, and extension of the name space across file systems.

Flaws:
I believe this paper was the first step that led to moulding the file system storage stack to the characteristics of conventional hard disks (e.g., long seek times, rotational latencies, etc.). This hard-wiring has persisted for more than two decades. Only recently have people started thinking of decoupling these assumptions from the design of a file system, due to the advent of novel disk and memory technologies (e.g., flash disks). This shows how a virtue can turn into a vice in the long run.

Techniques Used:
One of the major techniques used by FFS is exploiting locality of reference by clustering related data. A second important technique is a clean-slate architecture for filesystems which allows their parameterized configuration to match the characteristics of the underlying disk technology and the processor (separation of policy and mechanism). Adaptive block sizes for different file sizes and advisory file locks are other tricks which add more meat to FFS.

Tradeoffs:
Batching requests in data buffers and ordering them to minimize disk seeks trades off the throughput of writing large files against that of small files. Larger block sizes for large files and fragments for small files show a tradeoff between space (wastage) and time.

Alternative Uses:
Clustering of related data is a well known technique used for caching (locality of reference). Further, batching is also used for write-back caches. Adaptive block sizes are also used in networking protocols for different buffer sizes with variable packet sizes (the Maximum Transfer Unit for Ethernet).

Summary:
The paper describes the Fast File System, which is a reimplementation/enhancement of the basic Unix File System. FFS provides better throughput by using a flexible allocation policy.

Problem:
The existing Unix File System had the following problems:
1. The organization of the Unix file system segregates inode info from data. This segregation causes long seek times.
2. The small block size combined with a lack of locality severely limits throughput. It was using only about 4% of the disk bandwidth.
3. The old FS ignores the parameters of the underlying hardware. Because of this, the old FS is very rigid and unable to adapt to different disk characteristics like rotation speed and the number of blocks per track.

Work Summary/Contribution:
1. To improve throughput, the disk is divided into cylinder groups. FFS tries to allocate the inodes of files that belong to the same directory in the same cylinder group, and tries to place a file's data blocks in that cylinder group as well.
2. The block size is also increased to 4096 bytes (or it can be set higher), so more data can be accessed in one disk access. To reduce wasted space for small files, fragments of a data block are introduced.
3. A copy of the superblock is maintained in each cylinder group. This information is kept in a spiral kind of structure, which makes sure that the failure of one platter, track, or cylinder doesn’t cause the loss of all redundant copies of the superblock.
4. The free list is replaced with a bit map. This helps in determining free blocks quickly.
5. FFS implements useful enhancements like long file names, a rename system call, per-user quotas, symbolic links, and file locking.
6. FFS uses a two-level layout policy. The global layout policy determines the placement of new directories and files in cylinder groups, and the local layout policy of each cylinder group determines the placement of data blocks within it (see the sketch after this list).
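A toy illustration of that two-level split; the structures and names are invented for the sketch, not taken from the FFS sources.

    struct cgroup {
        unsigned nfree;              /* free blocks in this group          */
        unsigned char map[512];      /* free-block bitmap, 1 bit per block */
    };

    /* Global layer: choose a cylinder group (here, simply the one with
     * the most free space). */
    int global_pick_group(const struct cgroup *groups, int ngroups)
    {
        int best = 0;
        for (int i = 1; i < ngroups; i++)
            if (groups[i].nfree > groups[best].nfree)
                best = i;
        return best;
    }

    /* Local layer: find a free block inside the chosen group, scanning
     * forward from a preferred position. */
    int local_pick_block(const struct cgroup *g, unsigned preferred)
    {
        unsigned nblocks = sizeof(g->map) * 8;
        for (unsigned i = 0; i < nblocks; i++) {
            unsigned b = (preferred + i) % nblocks;
            if ((g->map[b / 8] >> (b % 8)) & 1)
                return (int)b;
        }
        return -1;                   /* group unexpectedly full */
    }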

Flaws:
1. Still, only about 50% of the disk bandwidth is utilized.
2. Reading data from the FS requires a data copy from the disk buffer in kernel address space to the data buffer in user space. This copying consumes almost 40% of the time of an I/O operation.
3. File locking doesn’t detect deadlocks, and there is no deadlock prevention either.

Tradeoff:
1. Space to keep file system indexing information is traded off against space to keep track of free blocks. Due to the big block size, less space is required for indexing information, while the bit map requires more space to track free blocks.
2. Space for free-block tracking is traded off against time: it is now fast to determine whether a block is free, but more space is needed.

Another part of the OS where the technique can be applied:
1. A bitmap-based structure to keep track of available blocks can be used in the virtual memory system to determine which pages are free.
2. Quota limits (soft and hard limits) on the file system can be applied to other resources like sockets, processor time, and memory usage.

PS: As this paper is an enhancement of an existing FS, I felt that a detailed problem description was appropriate.

Summary: The paper describes the Berkeley Fast File System for UNIX. Specific aspects discussed in the paper include how the designers address the problems of improving throughput, reducing space wastage and accommodating diversity in hardware. The paper also discusses some enhancements to the traditional filesystem, viz., file locks, long filenames, symbolic links, quotas, and a rename system call.

Problem addressed There are two major problems being addressed through this paper. The first is that of disk throughput. It was observed that the traditional filesystem exploits only a meagre 2% of the disk throughput. FFS is designed to improve upon this figure by means of larger block sizes and a design focused on locality of storage of data as well as inodes. The second problem is that of hardware diversity. This includes systems with and without I/O channels, differing disk technologies, etc. The authors strive to build a filesystem which can be fine-tuned to the particular hardware available.

Contributions
- the first contribution is the use of large block sizes (8 / 16 sectors). By reading a block which is more than one sector in size, the seek overhead is reduced to 1/N per sector (N = number of sectors in the block)
- the idea of using two sizes, block and fragments, for storing data helps reduce space wastage arising from larger block sizes.
- Division of the disk partition into cylinder groups is a proactive strategy that ensures locality of storage even for writes that happen in the future. The mechanism reduces the necessity for a defragmenting tool.
- linearly offsetting cylinder-group bookkeeping information, so that it does not all accumulate on the same platter, helps make the filesystem more tolerant to failures. Superblock replication is another cool idea which improves recoverability.
- parameterization helps fine-tune the filesystem to perform better given the hardware available. At the same time, some of these parameters can be modified at later times to allow migrating the filesystem/disk.
- splitting the layout policy into global and local layout policies makes each of them simpler to implement as they can be implemented with partial knowledge.

Flaws
- The static allocation of a huge number of inodes per cylinder group seems wasteful. Although the idea is to improve locality of inodes, it might have been less wasteful in terms of space to allocate some inodes statically, and thereafter allocate on demand. (I believe with an ext3 file system, this static allocation eats up 6.5 GB off a 250 GB partition)
- The discussion in section 5.1 on long filenames says directories are allocated in 512 byte units called chunks. I find this inconsistent because directories are also a type of file, and hence consistency dictates that they should also be allocated in blocks rather than sectors.
- Given the variable length filenames, deletion of directory entries potentially causes fragmentation. Returning the space so freed up to a previous entry in the same chunk doesn't fully mitigate the problem.

Techniques used
1. The first technique used is economies of scale. By bulking the read/write into a group of sectors called blocks, rather than writing/reading individual sectors, the seek latency (constant overhead of processing) is distributed among a larger set of entities.
2. The second technique is the use of configurable parameters, which forms a sort of separation of mechanism and policy.
Tradeoffs
The main tradeoff with economies of scale achieved by increasing the block size is one of disk throughput vs. processor and bookkeeping overhead. Tradeoff with using configurable parameters is one of performance vs. portability.
Other places where the same technique can be used.
- Economies of scale is used with prefetching, and with preallocation of blocks when a file is noted to be growing rapidly.
- use of configurable parameters can benefit any type of scheduling requirement like processor, I/O, network etc.


In this paper the authors present a new, faster filesystem for UNIX. They report a tenfold increase in bandwidth utilization, better space efficiency and numerous interface enhancements.

I identify the following five contributions: 1. A scheme for efficient storage of both large and small files, which avoids freelist fragmentation and maximizes utilization. This is achieved by partitioning blocks into fragments, and allowing multiple small files to reside in one block. 2. The idea of cylinder groups, whose careful selection can tolerate several different types of mechanical failures. 3. The ability of the filesystem to adapt to the hardware it is running on by measuring system-specific properties such as blocks per track and disk spin rate. 4. The new placement strategy which seeks to minimize seek latency by exploiting locality. 5. The application interface enhancements, which have now become an integral part of all UNIX filesystems.

The performance evaluation section seems to be an afterthought. I believe that this paper is a pedagogical example of how NOT to create experiments to evaluate your system: The authors only measure the throughput of sequential reads and writes when it is clear that their system is much better at other workloads. For example, they argue that each instance of the old file system was getting progressively slower over time due to fragmentation. While they claim to solve this problem, they give neither theoretical nor experimental evidence of how good their solution is. A more careful evaluation would demonstrate which design decisions improved performance and which did not help. Finally, it now seems strange that they allowed automatic parameterization but did not separate the block layout mechanism from the policy. This would have allowed the same file system to remain efficient for devices with different physical characteristics (e.g., tape, solid-state disks).

An interesting design tradeoff is the file system's ability to manipulate fragments. This comes at a significant bookkeeping cost, but the authors claim that empirical evidence shows it achieves much better space utilization. However, 25 years later, disk capacity has improved a thousand times while disk speed has not changed much. A re-evaluation of this block organization strategy today would probably reach the conclusion that manipulating fragments trades processing power and precious disk bandwidth for cheap disk space. In retrospect, a better evaluation might have been able to show which ideas are timeless and which decisions were motivated by technology trends of the 80's and no longer apply.

Summary

The paper Fast File Systems describes changes to the original UNIX file system to improve the throughput over time as well as add new enhancements including quotas and symbolic links.

Problem trying to Solve

The original UNIX file system was designed to be simple, and did not take hard disk characteristics into account, even though designing for disk rotation and head movement can yield great performance benefits. The reason this became an issue is that many applications required higher throughput rates, such as VLSI design, image processing, and file mapping into virtual memory. In addition, as more changes occurred within a file system, its throughput worsened.

Contributions

-Incorporating many functional enhancements into the UNIX file system: Arbitrary length file names, improved file locking, symbolic links which are valuable tools for linking across file system boundaries, rename system call for safer renaming, disk quotas for controlling user space.

-Improved reliability with disk errors due to the superblock being duplicated across the disk on different cylinders.

-Implementing file system features based on common hard disk characteristics. This includes rotational layout tables for fast indexing, cylinder groups for locality and performance, fragmenting of blocks for better utilization, and a fast allocator algorithm (quadratic hashing) for finding good locations for new blocks.
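The quadratic-hash fallback can be sketched like this; the group count and free-block table are toy data, and the real allocator checks far more than a free-block count.

    #define NGROUPS 16
    static unsigned cg_free[NGROUPS];   /* free blocks per cylinder group (toy data) */

    /* Find a cylinder group with space, starting from the preferred one:
     * try it directly, then probe at quadratically increasing offsets,
     * then fall back to an exhaustive scan. */
    int pick_cylinder_group(int preferred)
    {
        if (cg_free[preferred] > 0)
            return preferred;
        for (int i = 1; i < NGROUPS; i++) {
            int candidate = (preferred + i * i) % NGROUPS;   /* quadratic step */
            if (cg_free[candidate] > 0)
                return candidate;
        }
        for (int g = 0; g < NGROUPS; g++)                    /* exhaustive search */
            if (cg_free[g] > 0)
                return g;
        return -1;                                           /* filesystem full */
    }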

Flaws

The Fast file system is tailored specifically for rotational hard disks. These enhancements would not have as much benefit with solid state storage, RAM file systems, or other non-cylinder based systems.

The performance comparison takes the old file system at a 1024-byte block size, while the new one is always compared at either 4096- or 8192-byte block sizes. In order to understand the performance loss due to the added structures, a comparison with the new file system also at 1024 bytes would show how much of the benefit comes from the locality and grouping alone.

Techniques

The main technique used in this paper is the idea that a rotational hard disk has performance limitations that can be reduced by creating specialized organization and design. This more complex system helps improve the throughput by considering the common cases and using algorithms to find the best location for data. The trade-off is improved performance for increased complexity. The original UNIX file system was very generalized, while the fast file system is very tightly coupled with a rotational hard disk design. This technique could be used with messaging such as network data. In order to improve the performance of a low throughput connection with large data packets, complex compression algorithms can be designed.

Summary: The paper presents a reimplementation of Unix FS that significantly increases the FS throughput by favoring disk-locality, by using larger blocks divided into fragments, and by parameterizing system characteristics. They also add new functionality still in use today.

Problem: The old FS transferred only 512 bytes per disk transaction, and often logically sequential data was not on the same cylinder. The segregation of inodes from data guaranteed a long seek between accessing an inode and its corresponding data. The resulting FS throughput was less than 5% of the theoretical maximum. Simply increasing the block size was not feasible, due to locality deterioration over time and fragmentation. In addition, the old FS lacked useful functionality that currently exists on Unix systems.

Contributions: The paper provides an insightful analysis of the performance factors of a FS. The FS is reorganized to favor locality, in particular so that inodes are no longer required to be separated from their data. They recognized that there was a conflict between the dual nature of a block as the unit of disk transfer (desired to be large) and as the unit of addressability/fragmentation (desired to be small). They solved this by proposing a flexible structure with large blocks divided into fragments which can belong to different files. Finally, they recognize that systems have various HW characteristics and introduce a set of parameters that allow FS customization. All these are done in a manner compatible with the old system interface.
On the functional side, they introduce longer file names, file locking, symbolic links, and disk usage accounting.
Finally, the new FS uses redundancy and dispersion of metadata for increased reliability.

(more) Techniques: The main principle is accounting for HW realities, and giving up the old rigid "one size/format fits all" structure. Characteristics such as the "rotational layout table" are stored in the superblock, and some can even be changed while the system is active. Access locality is greatly improved by the closeness of cylinders in a cylinder group. Separation of mechanism from policy is also employed by offering flexibility in inode placement; this is used to keep inodes close to their data, with distinct policies for files and directories.

Tradeoffs: There is a tradeoff between CPU usage and disk throughput. More space is needed for bookkeeping. They consider it OK to seek for large files. Suboptimal but fast global/local allocation policies are used, as well as hashing for free-space management.
The coexistence of file systems with various block sizes prompted an extension of the interface to expose the optimal read/write size.
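That hint survives today as the st_blksize field reported by stat(2); a minimal usage sketch:

    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        const char *path = argc > 1 ? argv[1] : ".";

        if (stat(path, &st) != 0) {
            perror("stat");
            return 1;
        }
        /* Size user I/O buffers to the filesystem's preferred transfer size. */
        printf("%s: preferred I/O size is %ld bytes\n", path, (long)st.st_blksize);
        return 0;
    }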

Weaknesses: I consider the 5-10% free space requirement a hard-to-avoid weakness. The OS reasons about disk access times, but the HW has its own buffering. It would be interesting to know how the average file size and the percentage of free space affect the performance tables. CPU utilization can get quite high. Apparently the kernel buffers are still a limiting factor.

Summary:
This paper introduces the reimplementation of the old UNIX file system, aiming at better throughput and reliability. The authors start by identifying the flaws of the old file system. Then they talk in detail about their new approaches, discussing the impact that they will have in practice. Finally, taking advantage of the fact that the old file system will have to be dumped and rebuilt anyway, they also introduce five new functional enhancements: file locking, quotas, rename, long file names and symbolic links.
The goal the paper was trying to deal with:
The old UNIX file system has a lot of flaws: (1) Low throughput and long seek times; (2) Corruption of the single superblock destroys the file system, and the data read size per transaction is not optimal; (3) Although the free list was initially ordered for optimal access, it quickly became scrambled as files were created and removed; (4) Transfer rates deteriorate because of the randomization of data block placement. The new file system was mainly aimed at solving these problems.
Contributions
1. The new file system divides a disk partition into one or more areas called cylinder groups. Each cylinder group has bookkeeping information: a redundant copy of the superblock, space for inodes, and a bit map of free blocks. The redundant information spirals down into the pack so that data on any single track, cylinder or platter can be lost without losing all copies of the superblock.
2. The Fast File System optimizes reads: data is laid out such that larger blocks can be transferred in a single disk transaction, greatly increasing file system throughput.
3. File system parameterization: by parameterizing the processor capabilities and mass storage characteristics, blocks can be allocated in an optimal, configuration-dependent way.
4. Important functional enhancements are made by changing internal data structures and adding system calls, with backward compatibility. The addition of symbolic links, which I consider the most important one among them, makes the FS a lot more flexible.
Flaws:
Regarding advisory file locks, "Almost no deadlock detection is attempted." No means of timeout or custom deadlock detection are discussed. What's more, the locking interface is blocking. So user applications have no way to tell whether a deadlock has happened, nor any way to get out of one. Another flaw is that the measurements are derived from a single installation.
The techniques used to achieve performance:
The performance results that they include in the paper are quite impressive, but also expected. The use of a bigger block size, in addition to the fragmentation of blocks, increases I/O bandwidth utilization without wasting valuable disk space. Also, the ability to change most of the file system parameters (especially the "hardware"-oriented ones) on-line, without having to dump and rebuild the whole thing, greatly increases the flexibility and the portability of the system. They also introduce five new functional enhancements: file locking, quotas, rename, long file names and symbolic links.
Tradeoffs:
(1) The main tradeoff in this paper is that having large block sizes improves throughput but wastes space, since a lot of files are small. Using the fragmentation technique improves the utilization of space; however, it involves a significant overhead.
(2) In the Fast File System (FFS), the locations of files and metadata are fixed, and when data is modified it is always overwritten in its original location. With this approach, if geometrically dispersed data are modified within the same time span, seek and rotational delays substantially inflate disk I/O time, resulting in underutilization of disk I/O bandwidth. In order to fully utilize disk bandwidth for write requests, all modified data could instead be collected in memory chunks and then written to disk together.
Another part of the OS where the technique could be applied:
(1) The technique of file locking used in this paper can also be used by other systems, like Microsoft Windows. (2) Symbolic links: Windows Vista supports symbolic links for both files and directories with the command line utility mklink. Unlike junction points, a symbolic link can also point to a file or remote SMB network path. Additionally, the NTFS symbolic link implementation provides full support for cross-filesystem links. (3) Long file names: long filenames on FAT were implemented in Windows NT 3.5 in 1994.

Introduction:
This paper introduces the main ideas behind the Berkeley Fast File System, an improvement over the original Unix File System. The authors describe a way to utilize the locality of different pieces of data to increase throughput. They also describe how the new file system is more reliable, and also how some features missing in the UFS were implemented in FFS.

What were they trying to solve:
The traditional Unix File System read and wrote data in small chunks of 512 bytes, which meant that even though disks were capable of much higher bandwidth, the actual disk throughput was very low. The block allocation to files also didn't follow any pattern, so over time, various blocks of a file would be spread across the disk and severely increase the seek delay. There was also no strategy for handling disk failures, as there was only a single copy of the superblock. It was also oblivious to the underlying hardware.

Contributions:
Most of the throughput issues were caused by having too small a block size. FFS increased this to 4K or 8K, thus increasing the throughput considerably. But this large block size comes at a cost, since a large amount of space is wasted when the system has to allocate a whole block and the application uses only a small fraction of it. FFS solves this by dividing up the block into chunks called fragments of 512/1024 bytes. The file system takes care of allocating fragments or blocks as the need arises.

FFS divides the disk into cylinder groups, which is associated with one or more cylinders on the disk. Each group keeps its own set of inodes, data blocks, accounting information and a redundant copy of the superblock for reliability.

FFS takes advantage of locality in block accesses to minimize seek costs. For example, inodes of files under a directory are stored in the same cylinder group.

FFS optimizes directory storage by allocating space for directory entries in 512 byte chunks. They also support names up to 256 characters long.

Advisory locking is provided. Open files can be locked in shared or exclusive modes.

Symbolic links are introduced for linking files across file systems. A separate rename system call is provided to ensure consistent renaming of files and directories. Disk quotas are also a newly introduced feature.

Various parameters like processor speed and disk geometry are not hardcoded in the implementation, but are provided by configuration options set during filesystem creation.
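For flavor, a sketch of the kind of tunable layout parameters such a configuration might record; the field names are loosely modeled on the 4.2BSD superblock and should be treated as approximations, not the actual on-disk format.

    #include <stdint.h>

    struct fs_params {
        uint32_t bsize;      /* block size in bytes                     */
        uint32_t fsize;      /* fragment size in bytes                  */
        uint32_t rps;        /* disk revolutions per second             */
        uint32_t nsect;      /* sectors per track                       */
        uint32_t ntrak;      /* tracks per cylinder                     */
        uint32_t cpg;        /* cylinders per cylinder group            */
        uint32_t rotdelay;   /* ms to allow between dependent transfers */
        uint32_t minfree;    /* percentage of space kept in reserve     */
    };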

Flaws:
The number of inodes in a cylinder group is a static number, which is not a good idea when the number of files is too high (you run out of inodes) or when file sizes are very large (inodes occupy space which cannot be used by data).

The locking implementation seems like it was tacked on. On one hand, it is not mandatory so a non-cooperating process can simply not acquire any locks. The deadlock detection is also lacking, since it can only do single-process deadlock detection.

The rationale behind directory entries stored in chunks is not fully explained. The organization can lead to a lot of holes when files under a directory are deleted frequently.

Techniques used to improve performance
One technique used here is a hierarchical organization of data. Here cylinder groups, blocks, and fragments form a three-level hierarchy, which achieves the right compromise between speed of access and wasted space.
Separation of mechanism/policy is evident from the configurable parameters which the file system uses to optimize block allocation.
Tradeoffs: The main tradeoff here has a time-vs-space flavor to it. Having large block sizes makes accesses faster but wastes space. Trying to utilize space well involves a significant overhead.
Another part of the OS where the technique could be applied:
We have already seen hierarchical organization of virtual memory in Pilot, in the form of spaces.

Summary
In this paper, the authors present a re-implementation of the Unix file system with higher throughput rates without wasting too much space for this improvement. They increase throughput by increasing the block size and locality but introduce fragments to prevent space wastage for small files.

Problem
The Unix file system at that time (called the old file system) had various flaws, such as a very small block size (512 bytes), which increased disk read latency even if the blocks were sequential. Also, since block allocation didn't take special measures to ensure locality, the blocks of the same file, or the inodes of files in the same directory, were scattered around. This resulted in long seeks for the most common operations and reduced the effective disk bandwidth.

Contributions
- They increased the block size from 512 bytes to 4096 bytes (still a common choice today). This increased the throughput for reading huge files, especially in the absence of an IO controller where each read was followed by an interrupt.
- Increasing the block size increases the wasted space for small files or files that just cross a block boundary. To handle such cases, they fragment the last block of a file into 512/1024-byte fragments and use only the required number of fragments.
- They replicated the superblock across cylinder groups to avoid losing the entire file system if the superblock gets corrupted. These copies were also spread across various platters and cylinders.
- The flexibility to configure the various parameters of the file system allows it to make well-informed decisions, such as the minimum distance between two contiguous blocks. If those decisions were made implicitly, migrating disks would cause a drop in performance.
- The separation of the global and local layout policies reduces the complexity of both layers and provides some kind of modularity in the design (similar to THE).

Flaws
- Each cylinder group has a fixed number of inodes(one per 2048 bytes) which ensures locality of inodes and enough inodes to handle common file sizes. But this also wastes a lot of space which cant be reclaimed nor can the number of inodes be re-configured.
- Writes in this system (especially unbuffered ones) tend to be slow: there are many checks for crossing a block or fragment boundary, and when a write does cross one, finding the next block can be non-trivial and may involve extra copying of data as fragments are expanded.
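Here is the rough overhead estimate referred to above; the 128-byte on-disk inode size is an assumption made for illustration, not a figure quoted from the paper:

    /* Sketch: fraction of the disk consumed by statically preallocating
     * one inode for every 2048 bytes of data.  The 128-byte on-disk
     * inode size is an assumption for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const double inode_size = 128.0;       /* assumed on-disk inode size */
        const double data_per_inode = 2048.0;  /* the paper's chosen ratio   */

        double overhead_pct = inode_size / data_per_inode * 100.0;
        printf("inode tables claim %.2f%% of the disk up front,\n", overhead_pct);
        printf("whether or not the inodes are ever used\n");
        return 0;
    }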

Techniques
- It uses economy of scale to increase throughput: larger blocks amortize the per-transfer overhead of reads and writes. It also layers the global and local policies for block allocation.
- There is a trade-off between space (for bookkeeping and the free-space reserve) and increased read throughput; write throughput also suffers a bit.
- The configurable parameters and the global/local policy split could also be used by the scheduling subsystem (especially in microkernel systems).

Summary
This paper describes a reimplementation of the file system for UNIX that provides substantially higher throughput rates by using a flexible allocation policy, better locality of data, and adaptability to peripheral and processor characteristics. Features like advisory locks on files, extension of the name space across file systems, long file names, and administrative control of resource usage have also been added.

Problem trying to solve
The original 512-byte UNIX file system delivered only about 20 kilobytes per second per disk arm, roughly 2% of the available disk bandwidth. The segregation of inode information from the data incurred a long seek from a file's inode to its data. The combination of small block size, limited read-ahead in the system, and many seeks severely limited file system throughput. Another problem was that although the free list was initially ordered for optimal access, it quickly became scrambled as files were created and destroyed; this badly degraded the transfer rate because allocations required more seeks. Corruption of the superblock could make the entire file system unusable. Finally, the file system ignored the characteristics of the underlying hardware, allocating blocks in a non-optimal way.

Contributions
• The data is laid out in larger 4096-byte blocks, greatly increasing file system throughput. To minimize wasted space for smaller files, a block can be further divided into fragments, which are allocated to a file as needed. The new file system also spends less space indexing large files because each block pointer covers more data.
• The new file system divides the disk partition into cylinder groups. Within a group, it tries to allocate a file's blocks so that they are rotationally well positioned.
• The superblock is replicated into each cylinder group to protect against catastrophic loss.
• The global allocation policies provide good data locality: they spread unrelated data across different cylinder groups while keeping related data within the same cylinder group, which reduces seek time tremendously. The global policy then calls local policies to allocate blocks within a cylinder group (a simplified sketch of this two-level scheme appears after this list).
• The paper also adds functional enhancements: long file names, advisory file locks, symbolic links that can cross file systems and even machines, atomic renaming of files, and quota restrictions on users.
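Below is a much-simplified sketch of the two-level allocation idea mentioned above; the data structures and the "most free blocks" heuristic are stand-ins for illustration, not a faithful copy of the paper's policies:

    /* Sketch of the two-level idea: a global policy picks a cylinder
     * group, then a local policy picks a free block inside that group.
     * The structures and heuristics are simplifications for illustration. */
    #include <stdio.h>

    #define NGROUPS          4
    #define BLOCKS_PER_GROUP 8

    struct cg {                                   /* one cylinder group */
        int free_blocks;
        unsigned char free_map[BLOCKS_PER_GROUP]; /* 1 = block is free  */
    };

    /* Global policy: prefer the cylinder group with the most free blocks. */
    static int pick_group(struct cg groups[], int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (groups[i].free_blocks > groups[best].free_blocks)
                best = i;
        return best;
    }

    /* Local policy: take the first free block within the chosen group. */
    static int pick_block(struct cg *g)
    {
        for (int i = 0; i < BLOCKS_PER_GROUP; i++)
            if (g->free_map[i]) {
                g->free_map[i] = 0;
                g->free_blocks--;
                return i;
            }
        return -1;                                /* group is full */
    }

    int main(void)
    {
        struct cg groups[NGROUPS] = {
            { 2, {0,0,1,0,0,1,0,0} },
            { 5, {1,1,0,1,1,0,1,0} },
            { 1, {0,0,0,0,0,0,1,0} },
            { 3, {1,0,1,0,0,0,0,1} },
        };

        int g = pick_group(groups, NGROUPS);
        int b = pick_block(&groups[g]);
        printf("allocated block %d in cylinder group %d\n", b, g);
        return 0;
    }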


Flaws
• The paper does offer an experiment comparing read/write rates for both file systems, but it should probably also have included experimental results on block fragmentation and wasted space, if any. Additionally, the CPU figures reported in the experiments (up to 95%) are never discussed.
• FFS chooses to preallocate one inode for every 2048 bytes of data. There appears to be no justification given for that ratio.
• They do not provide any kind of deadlock detection for advisory file locks, which could lead to simple deadlocks.

Tradeoffs made:
The tradeoff made is added complexity in exchange for better file system performance. The new file system is complex, involving several allocation techniques and many more data structures than the simpler traditional Unix file system. Because of the extra work required of the kernel, a write operation takes longer than a read on a file. This tradeoff is made to achieve higher throughput.

The idea of improving throughput by using a bigger block size can be applied to a network interface. A process posts a buffer to the network driver, into which responses are copied and then handed back to the process; a bigger buffer allows several responses to be delivered at once rather than one by one.
The quota mechanism used to restrict file system usage can be applied to other resources as well, e.g. CPU time and memory.
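As a concrete illustration of that analogy, the standard setrlimit() interface applies the same per-process "quota" idea to CPU time and memory; the limit values below are arbitrary examples:

    /* Sketch: the "quota" idea applied to CPU time and memory with the
     * standard setrlimit() interface.  Limit values are arbitrary examples. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit cpu = { 60, 120 };                  /* 60 s soft, 120 s hard   */
        struct rlimit mem = { 64UL << 20, 128UL << 20 };  /* 64 MB soft, 128 MB hard */

        if (setrlimit(RLIMIT_CPU, &cpu) != 0)
            perror("setrlimit(RLIMIT_CPU)");
        if (setrlimit(RLIMIT_DATA, &mem) != 0)
            perror("setrlimit(RLIMIT_DATA)");

        /* From here on the kernel enforces the limits on this process,
         * much as the file system quota enforces per-user disk usage. */
        return 0;
    }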

Summary: The authors of this paper present a new file system for the UNIX platform with the intent of creating a file system structure that can utilize far more than the roughly 2% of the disk's available bandwidth achieved by its predecessor. By increasing disk throughput they also increase overall system performance, and they introduce many new mechanisms that make the system more practical for reading and writing files on multi-user systems.
Problem to Solve: The main problem the authors are trying to solve is that the existing file system had a low throughput rate. As faster hardware was developed and virtual address spaces came into wider use, the file system began to limit performance in ways it previously had not, because data needed to be paged in and out more quickly. Users were also seeing long seek times between reads of blocks, because the free list often ended up randomly ordered, so a file's blocks would land on different disk tracks.
Contributions: First, they change the block size to 4096 bytes to achieve better read performance. They realize that with this change small files can waste a lot of space, so they add a mechanism for splitting a block into 'fragments' to hold small files and file tails. These two mechanisms combined increase read and write speed without a large penalty in wasted space. Secondly, they add locking for files: after weighing hard (mandatory) locks against advisory locks, they provide advisory shared and exclusive locks. These locks help protect files while they are open or in use by another program; a program attempting to open a locked file may have to wait, but the mechanism provides much-needed control over concurrent updates. Thirdly, they introduce symbolic links to files. Although something similar had been introduced in Multics, they wanted to be able to reference files located on different physical file systems. Next, they introduce per-user quotas. For the first time, system administrators had a system-enforced way to control how much space individual users consumed, which also helped ensure that space could be set aside for system programs and utilities without fear of it being used up by users. Finally, they introduce cylinder groups to replace the free list. This keeps free space from being scattered all over the disk and instead gives the allocator free blocks that lie close together, so they are quick to read and write.
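As a small sketch of the symbolic-link interface described above (both paths here are made-up examples), a link can name a target that lives on a different mounted file system:

    /* Sketch: creating and inspecting a symbolic link whose target may
     * live on a different mounted file system.  Both paths are made up. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        /* /tmp/data becomes a name for a file on another file system. */
        if (symlink("/mnt/other/real_data", "/tmp/data") != 0)
            perror("symlink");

        /* readlink() reports the target path without following it. */
        char target[256];
        ssize_t n = readlink("/tmp/data", target, sizeof(target) - 1);
        if (n >= 0) {
            target[n] = '\0';
            printf("/tmp/data -> %s\n", target);
        }
        return 0;
    }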
Flaws: One flaw of this paper is in the performance evaluation. The authors do a good job explaining the experiments but fail to say whether the rates presented in tables 2a and 2b are average, best-case, or worst-case numbers. They explain that they get the system into a stable state before measuring, but not whether the measurement is repeated or taken just once. Another thing to note is that they include CPU-utilization percentages in the tables but do not explain how they are measured or what importance they have for the results.
What tradeoff is made: One tradeoff is that when they introduce file locking, they do not provide a solid mechanism for preventing deadlocks; so although you gain the valuable ability to lock a file, the system can end up with processes waiting on each other indefinitely. Another tradeoff is that, even after all of these changes, the file system is still only able to use roughly half of the disk bandwidth; still, that is a large improvement over the previous system.

Summary
In “A Fast File System for UNIX,” Marshall K. McKusick et al. describe the implementation of a better-performing file system for UNIX, noting why the previous file system's performance suffered and how they addressed those issues.

Problem
The “old UNIX file system” (which was itself already an improvement over the original implementation) was designed in a way that kept it from achieving maximum performance. In theory, a disk can transfer data as fast as that data passes under the read head. In practice this ideal was not achieved for a variety of reasons, the most prominent being that data was stored in blocks which did not necessarily lie consecutively on the disk. As a file was accessed, the blocks belonging to it had to be located before a transfer could occur. The overhead of finding data was so great that measurements showed the file system providing as little as three percent of the maximum disk bandwidth in some situations.

Contributions
• Devising a method by which data is stored in larger blocks, allowing faster transfer of large files because a lower proportion of time is spent seeking, while at the same time allowing data from separate files to share a single block, preventing wasted space.
• Employing modifiable file system parameters so the layout can be optimized for the hardware, for example by placing blocks either consecutively or a certain number of positions apart, to match the expected delay between blocks that will logically be fetched together.
• Taking into account the possibility of failure and attempting to minimize the problem by, for example, replicating the superblock in various parts of the disk.

Flaws
• Using parameters tuned to particular hard drives may fail to take into account more radical hard-drive improvements or different forms of storage. For example, the solid state drives available now have no rotating platters at all.
• Preallocating inodes, one for every 2048 bytes. Though this avoids the cost of allocating them later, fixing the number at file system creation time does not work well if files are exceedingly small (so more inodes are needed) or exceedingly large (so the preallocated inodes waste space).

Performance Techniques
Among the performance tradeoffs the fast file system makes, one is the choice of larger blocks for storing data, so that files can be transferred more efficiently, at the cost of more space potentially being wasted in partially filled blocks. By requiring fewer blocks to be read or written, less time is spent locating them. The same general idea can be (and is) applied to TCP window sizes: for sufficiently fast network connections, the receive buffer is enlarged so that more data can be sent before the receiver must acknowledge it and allow the sender to continue.
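To make the TCP analogy concrete, the sketch below simply enlarges a socket's receive buffer via the standard setsockopt() call; the 256 KB figure is an arbitrary example:

    /* Sketch: enlarging a TCP socket's receive buffer so more data can
     * be outstanding before the receiver must acknowledge it.  The
     * 256 KB size is an arbitrary example. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        int bufsize = 256 * 1024;   /* a larger "block size" for the network */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) != 0)
            perror("setsockopt(SO_RCVBUF)");

        /* connect(), read(), etc. would follow here in a real client. */
        close(fd);
        return 0;
    }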
