
A Fast File System for UNIX

Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry. A Fast File System for UNIX. ACM Trans. on Computer Systems 2(3), August 1984, pp. 181-197.

Reviews due Tuesday, 11/4.

Comments

The Problem

A Fast File System for UNIX describes the efforts of researchers at Berkeley in 1984 to both improve the general performance of the UNIX filesystem and extend its feature set in useful ways. Most notably, new flexible allocation policies significantly improve disk transfer speeds while conserving unused disk space.

Contributions

Variable Block Size
The original UNIX file system forced users to read and write 512 B blocks to the disk (the typical sector size). The Fast File System grants users the ability to pick their own block size (with a 4096 B minimum) to match the characteristics of both their data and hardware. Even at the time of writing, 512 B blocks were probably too small a granularity to capture many file IOs.
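To make the transfer-count argument concrete, here is a small back-of-the-envelope sketch in C; the 1 MiB file size is just an example value, not a figure from the paper.

    #include <stdio.h>

    /* Number of disk transfers needed to read a file sequentially,
     * assuming one transfer per block and no read-ahead. */
    unsigned long transfers(unsigned long file_bytes, unsigned long block_bytes)
    {
        return (file_bytes + block_bytes - 1) / block_bytes;   /* round up */
    }

    int main(void)
    {
        unsigned long file = 1UL << 20;   /* a 1 MiB file, for example */
        printf("512 B blocks:  %lu transfers\n", transfers(file, 512));
        printf("4096 B blocks: %lu transfers\n", transfers(file, 4096));
        return 0;
    }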

Cylinder Groups
The old UNIX filesystem free list is gone in the Fast File System. Instead, consecutive cylinders on the disk are grouped, with a localized bitmap tracking which blocks are free in each group. For the sake of fault tolerance, this bookkeeping information is spread out over the physical disk: consecutive groups store their bookkeeping at different offsets within themselves.
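A minimal sketch of what per-cylinder-group free-block bookkeeping with a bitmap can look like; the names and sizes are illustrative, not the real FFS cylinder-group layout.

    #include <stdint.h>

    #define BLOCKS_PER_GROUP 4096             /* illustrative group size */

    struct cg_map {
        uint8_t free[BLOCKS_PER_GROUP / 8];   /* 1 bit per block: 1 = free */
    };

    int cg_is_free(const struct cg_map *m, unsigned b)
    {
        return (m->free[b / 8] >> (b % 8)) & 1;
    }

    void cg_mark_used(struct cg_map *m, unsigned b)
    {
        m->free[b / 8] &= (uint8_t)~(1u << (b % 8));
    }

    void cg_mark_free(struct cg_map *m, unsigned b)
    {
        m->free[b / 8] |= (uint8_t)(1u << (b % 8));
    }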

Block Fragments
Adopting larger block sizes can result in increased storage overhead, as many files use only a portion of a block. To compensate, the cylinder group bitmaps above track free space at a granularity smaller than the block size: the fragment size. Each block may contain up to 8 fragments, which reduces the amount of dead space between files.
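The space accounting is easy to see with a little arithmetic; the sketch below uses the 4096/1024 block/fragment combination discussed in the paper and a made-up file size.

    #include <stdio.h>

    int main(void)
    {
        unsigned long file  = 11000;   /* example file size in bytes */
        unsigned long bsize = 4096;    /* block size                 */
        unsigned long fsize = 1024;    /* fragment size              */

        unsigned long full_blocks = file / bsize;
        unsigned long tail        = file % bsize;
        unsigned long frags       = (tail + fsize - 1) / fsize;   /* round up */
        unsigned long used        = full_blocks * bsize + frags * fsize;

        printf("%lu full blocks + %lu fragments = %lu bytes allocated (%lu wasted)\n",
               full_blocks, frags, used, used - file);
        return 0;
    }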

File System Layout Improvements
Minimizing seek latency and storing blocks of the same file together generally improve filesystem performance. New blocks within the same file are typically allocated on the same cylinder, if free space is available. Other locality optimizations include storing all the inodes for a particular directory together in the same cylinder group (they are frequently requested together).

If a write overflows a cylinder group, a group with a significant amount of free space is picked to finish the write. Similarly, methods exist to carefully select where to place a new file or directory on disk.
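One of those placement methods, for new directories, can be sketched roughly as below; the structure and field names are invented for illustration, but the heuristic (prefer cylinder groups with above-average free inodes and few existing directories) follows the paper's description.

    struct cg_stats {
        unsigned free_inodes;
        unsigned ndirs;          /* directories already in this group */
    };

    /* Pick a cylinder group for a newly created directory. */
    int place_new_directory(const struct cg_stats *g, int ngroups)
    {
        unsigned long total = 0;
        for (int i = 0; i < ngroups; i++)
            total += g[i].free_inodes;
        unsigned avg = (unsigned)(total / (unsigned long)ngroups);

        int best = -1;
        for (int i = 0; i < ngroups; i++) {
            if (g[i].free_inodes < avg)          /* require above-average free inodes */
                continue;
            if (best < 0 || g[i].ndirs < g[best].ndirs)
                best = i;                        /* fewest existing directories       */
        }
        return best < 0 ? 0 : best;
    }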

New Semantics
The Fast File System supports shared and exclusive locking at the file granularity. It also adds support for symbolic links and quotas, features taken for granted today. The authors also discuss a safer way to rename a file stored in their filesystem.
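These interfaces survive in modern Unix systems, so a short usage sketch is easy to give (error handling mostly omitted for brevity; the file names are placeholders):

    #include <sys/file.h>   /* flock   */
    #include <fcntl.h>      /* open    */
    #include <stdio.h>      /* rename  */
    #include <unistd.h>     /* symlink, close */

    int main(void)
    {
        int fd = open("data.txt", O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return 1;

        flock(fd, LOCK_EX);                    /* advisory exclusive lock   */
        /* ... update the file ... */
        flock(fd, LOCK_UN);                    /* release it                */
        close(fd);

        rename("data.txt", "data-renamed");    /* atomic rename, no unlink/link window */
        symlink("data-renamed", "data-link");  /* symbolic link, may cross filesystems */
        return 0;
    }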

Flaws
One might ask: if we are locking at the file granularity, why not build support for this into the kernel itself, applying it to all file systems?

The performance data cited focused on speed of sequential reads and writes to presumably large files. What about seeks through a large file? The cost of opening files? Concurrent reads and writes? I feel like a lot went undocumented here.

Key Performance Tradeoff
Additional accounting was required to keep track of filesystem fragments, etc. It looks like this accounting is resulting in a significant CPU hit (see Table II).

In this paper the authors present FFS, a reimplementation of the Unix file system that provides higher throughput. They describe the problems of the old file system and then present the changes they’ve made to address them. Their modifications include bigger block sizes that can be divided into fragments, the use of cylinder groups, and file system parameterization.

The old file system didn’t provide high throughput. The combination of the small block size, limited read-ahead in the system, and many seeks severely limited the system's performance. The authors addressed these problems by changing the underlying implementation; as a result, users of the system did not face a software conversion.

One of the main contributions of the system is that it can efficiently manage large and small files. More specifically, by increasing the block size (4K minimum), it reduces the number of disk transfers needed to read a large file. To manage small files, it introduces the notion of fragments. Each block can be partitioned into fragments, so multiple small files can reside in one block. Moreover, FFS avoids the problems created by the free list and gives greater flexibility (it is possible to have file systems with different block sizes in the same system). Another contribution of the system is the division of disk partitions into cylinder groups which are used to minimize seek costs by increasing the locality of reference. Each cylinder group contains a replication of the superblock, so that the problems caused by a possible failure are minimized. Another interesting idea is the parameterization of the file system. FFS tries to exploit the characteristics of the system (processor speed, properties of the disk) in order to achieve greater performance. Finally, another contribution of the file system is the use of advisory locking (shared and exclusive locks).

One flaw of the system is the overhead created by the use of fragments. Data may be copied many times when expanding a file, and many checks need to be done in order to find whether there is enough space or whether we need to store data in another fragment/block. Moreover, an inode is allocated for each 2048 bytes, but the authors don’t explain why they have chosen this number. Although the authors introduce a new file locking mechanism, they don’t present a deadlock detection mechanism; as a result, deadlocks may occur in the system. Moreover, although the idea of parameterization is very interesting, it must be adapted in order to be used with new disk technologies (e.g., flash disks).

The main techniques used to improve performance are the use of a bigger block size, to minimize disk accesses when accessing large files, and the use of fragments to reduce the space needed when managing small files. Another technique is the use of system properties in order to allocate blocks in an optimal way. The tradeoff here is time vs. space. More space needs to be allocated because the block size is increased; the use of fragments to improve the utilization of space involves a significant overhead. Finally, more space is needed for bookkeeping (the superblock is replicated).

The "Fast File System for Unix" includes a set of optimizations trading processing power for better performance off the disk. One of them in particular, the use of a larger block size together with "fragments," is particularly effective in increasing bandwidth utilization for larger file reads and writes. Following this, a discussion on allocation policy and several other less-interesting functional enhancements (e.g. the addition of a "rename" call and file locking) leave us with a file system nearer to file systems of the current era than the previous.

I will claim wholeheartedly that, in the modern day, "space is free." That is, we can feel free to waste some disk space in favor of higher performance because disk space is less expensive than either additional processing power or additional effort by the programmer managing it. In 1984 when this paper was written, space was increasingly inexpensive but definitely not "free," and so an otherwise simple optimization (increase the block size) had to come with some provisions to reduce space waste. This provision, the mapping of unused "fragments" within a block, was where the authors made a different trade: at the expense of more CPU time, they created a file system with better space-use properties than would otherwise have been achieved with larger blocks. The increase of block size itself improves performance dramatically for larger files, as the disk can transfer four times as much data in a single read. The new file system also makes some attempt to string multiple disk blocks from the same file together on the disk as well as to compensate for rotational delay of access, improving locality of reference for sequential blocks in a file. At a higher level, the new file system also balances the writing of a file across several cylinders, preventing "crowding" on a single cylinder.
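The rotational-delay compensation mentioned here boils down to simple arithmetic; the spindle speed, geometry, and per-transfer overhead below are illustrative numbers, not measurements from the paper.

    #include <stdio.h>

    int main(void)
    {
        double rpm               = 3600.0;   /* example spindle speed             */
        double sectors_per_track = 32.0;     /* example geometry                  */
        double cpu_overhead_ms   = 4.0;      /* example per-transfer service time */

        double ms_per_rev    = 60000.0 / rpm;                 /* ~16.7 ms          */
        double ms_per_sector = ms_per_rev / sectors_per_track;
        int    skip = (int)(cpu_overhead_ms / ms_per_sector + 0.999);  /* round up */

        /* Place the next block of the file this many sectors ahead so it arrives
         * under the head just as the CPU finishes handling the previous one. */
        printf("skip %d sectors between logically consecutive blocks\n", skip);
        return 0;
    }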

There are a couple of less-nice things about the authors' approach. To begin with, the authors are not particularly concerned about the high CPU utilization of their scheme, which surprises me (as speed was not free then either). They mention that the old file system is particularly bad for programs which do relatively little processing on relatively large data sets. Their measurements support the benefit offered to this class of programs, but with such a high CPU utilization, I wonder what the effect is on the opposing class of programs (relatively much computation producing smaller data sets). Specifically, as any time spent calculating cylinder offsets and the like is "wasted time" for such a program, I wonder how much time actually is lost. In line with this excess CPU usage, their computation-heavy low-level optimizations appear to prevent them from implementing higher-level optimizations such as chaining kernel buffers or pre-allocating space for quickly-growing files. It seems well-understood by the authors (upon review of the "rename" enhancement) that communication has a significant cost, and their ability to mitigate this cost is limited severely by their over-use of the CPU for file access.

From a historical perspective, it may be that processor speed had begun to take off in dramatic fashion by 1984, thus engendering the authors' willingness to use additional CPU cycles gratuitously. Such attitudes had certainly taken hold by the mid-90s when I began following computers; a documented classification of these sentiments would be valuable to a modern reader's understanding of this paper. On a technical note, the authors here made another trade, deciding to move disk access scheduling from the disk controller (specialized to this task, presumably fast) to the CPU (generalized, probably not as fast). Doing this caused a significant rise in CPU usage and a waste of a running processor on the disk, as disk accesses under the new system always arrive in scheduled order. I wonder if there were a better way to make use of the known properties of a disk controller to remove some stress from the CPU, allowing the controller to do its own optimizations on seek time. Lastly, the end-cost of the "fragment" idea is unclear from these experiments. If the management of fragments comes at low cost for 4096-byte blocks, it seems that one should be able to continue to increase the block size for even further benefit. At some point, the user would cross some threshold where the management of fragments would regress to the problem of managing blocks on a typical workload, but there may be speed to be gained in the meantime, especially for the class of programs which interests the authors.

A Fast File System for UNIX

Summary

The authors describe the techniques used in their reimplementation of the
UNIX filesystem to obtain better performance and additional features. The authors
also present evaluation results.

Description of the problem being solved

The authors are trying to remove inefficiencies in the traditional UNIX
filesystem so that it provides higher throughput, adapts to a wide range of
devices, and offers better locality of reference, along with other features
like advisory locks, long file name support, and provisions for resource usage control.

Contributions of the paper

1) More reliable than the traditional UNIX filesystem: they achieve this by
making the new filesystem replicate the most critical information, like the
superblock, and by distributing unrelated data over the disk's capacity.

2) More throughput and better performance : The new file system uses a bigger
block size to utilize the bandwidth better and at the same time fragments each
block into smaller chunks to avoid wastage of space for small files.
Empirically, the newer file system performs better overall.

3) Adapts to a wide range of peripherals: the new file system uses
parameterization to capture properties of the system, like processor speed
and disk rotation speed, and adapts its policies to take advantage of these
physical parameters to enhance performance.

4) Layered policies: Since having the global policy layer know all information
is costly, at the global level only heuristics are used and the local policy
layer does the actual job of allocating inodes and data blocks.

Flaws in the paper

1) Inode allocation with one inode per 2048 bytes: 2 KB seems to be too
small an average file size to justify that many inodes.

Techniques used to achieve performance

0) Optimizing for the common case: access patterns that are not dominated by
small files. This is why the new file system performed better empirically.
1) Utilizing spatial and temporal locality of reference in data access.
2) Layering to separate responsibilities used in global and local policies.

Tradeoff made

Optimizing for larger files at the cost of a little overhead for small-file
access patterns: fragmentation, and the copying that occurs as a file grows
in units of fragments, is overhead for small-file access patterns; however,
this works out well in the general case and for large-file access patterns
empirically.

Another part of OS where this technique could be applied

1) Optimizing for the common case could be applied in many places. For example,
caches are a hardware optimization for speeding up the common case.

2) Spatial and temporal locality of reference is also taken advantage of in
cache replacement policies to better utilize the cache.

3) Layering to manage complexity is also used in many places. One example is
the TCP/IP stack.

Summary
The paper describes a reimplementation of the Unix file system that is primarily aimed at increasing the throughput of the old file system. By increasing block size, by allocating blocks that exploit locality of reference and by grouping data that will be sequentially accessed, the throughput is improved.

Problem attempted
The objective of the paper is to substantially increase the throughput of the Unix file system by avoiding inefficiencies that are inherent in the allocation policies of the old file system.

Contributions

1. The new file system implements super-block replication to enhance protection of critical data. The way replicated data is stored is such that if a single platter fails no harm is done to the super-block data.

2. Expanding the block size of the file system has been one of the primary means to increase throughput. In order to avoid the space wastage that arises out of a large block size, the system supports fragmentation, where each block of size 4 KB has four fragments of size 1 KB. It is up to the user program to write full blocks at a time to minimize fragment reallocation.

3. By parametrizing processor capabilities, storage characteristics of disk and other such properties of underlying hardware, the new filesystem tailors itself to the disk on which it is placed. This is a great feature as storage characteristics of devices can be vastly different and optimizing w.r.t a specific device will yield substantial performance benefits

4. The new filesystem tries to cluster the inodes of the files in a single directory by placing them in a single cylinder group. It attempts to place the data blocks of a file in the same cylinder group. At the same time it handles the conflicting demand to avoid filling a cylinder group fully: it places only the first 48 KB of a file in the same cylinder group before redirecting allocation elsewhere.

Flaws

1. The files that are created when a file system is full will lack locality in the disk. The paper claims that once enough space is available, the data can be moved to achieve locality. But this entails a lot of copying and if this operation is tolerable, then one could claim that periodical reorganization of the disk to restore locality in the old file system is fine too.

2. The estimate of the number of inodes to be allocated to each cylinder group seems pessimistic. Allocating one inode for each 2 KB of space on a cylinder group will consume a significant fraction of the disk space. But if disk space is not a constraint, this is not a serious problem.

Trade-offs

1. The paper makes a trade-off between the space wastage that occurs by using a larger block size and the efficiency that comes when block size is large. It increases the block size and patches the space wastage problem by implementing fragments.

2. Another trade-off is between allocating data blocks of a file in the same cylinder group and not filling up a cylinder group completely. It puts a threshold on the amount of data from a single file that can be allocated on a single cylinder group.

Techniques used

1. Parametrization is a technique widely used in many areas in CS. For example, compilers are optimized for a specific architecture.

2. Exploiting locality of reference - The memory hierarchy is a classic example of exploiting locality of reference.

SUMMARY
This paper describes the Fast File System, a modification of the basic UFS of the early 1980s with the primary goal of improving performance.

PROBLEMS
UFS at the time was plagued by a number of issues, the most significant being poor bandwidth utilization and poor portability across different drive characteristics (spindle speed, track size, etc.).

CONTRIBUTIONS / TECHNIQUES
Addressed the primary issues related to throughput: block size and locality. Increased the block size, and at the same time created a mechanism for splitting blocks among multiple files in order to address what would otherwise have been a new problem of small files wasting lots of space.

Attempts to increase locality by keeping inodes, data blocks and accounting data in the same cylinder group.

Allocates space for directory entries in 512 byte chunks, and supports names up to 256 characters long.
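A rough sketch of such a variable-length directory entry; the field names only approximate the 4.2BSD struct direct and should be read as illustrative.

    #include <stdint.h>

    #define MAXNAMLEN 255

    /* Entries are packed into 512-byte chunks; each records how much space
     * it occupies (reclen) so that the space of a deleted neighbor can be
     * folded into the preceding entry. */
    struct dir_entry {
        uint32_t ino;                  /* inode number, 0 if unused        */
        uint16_t reclen;               /* total length of this record      */
        uint16_t namlen;               /* length of the name that follows  */
        char     name[MAXNAMLEN + 1];  /* NUL-terminated, variable length  */
    };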

Adds file locking to the file system, introduces symbolic links that can link files across physical disks and adds quotas.

FLAWS
FFS is designed specifically for rotational hard disk drives – something that has persisted in most file system designs until recently. The changes in FFS may not show benefits for flash or RAM disks, file systems on RAID arrays, or other types of storage media.
Further, the performance analysis of the paper is poorly done. While the paper compares FFS to the old UFS, which used 1024-byte block sizes, the authors never made a direct comparison using their changes at the 1024-byte block size; they compared instead at 4k and 8k.
File locking in the FS is nice, but they don't address deadlocks – not that I can say they could within the narrow confines of a FS API.

OTHER APPLICATIONS
Methods for increasing locality are important for increasing cache coherency in a number of applications in which one deals with large sets of dynamic data. Adaptive block sizes are used in networking, where data is sent in varying packet sizes.

A FAST FILE SYSTEM FOR UNIX

SUMMARY
This paper introduces a modification in the Unix File System. These modifications include bigger block sizes that can be divided into equal size fragments, the introduction of cylinder groups and new allocation policies. These modifications do not require changes in the user interface. Later some modifications are proposed that improve the file system user interface: long file names, file locking, symbolic links and quotas.


PROBLEM
This research tries to improve the throughput and reliability of the then-current Unix File System. These modifications need to be introduced without changing the user interface. A second problem is studied later: what are the most necessary changes to the file system user interface?

CONTRIBUTIONS
They increase the block size in order to improve throughput, but to avoid high percentages of wasted space in disk they propose the possibility of allocating fixed fragments of the block. Bigger block sizes increase the transferred data per disk transaction. Having a bigger block size makes it possible to have bigger files with only two levels of indirection.

The cylinder groups are a way of dividing the disk. For each group there is a redundant copy of the superblock and there is a bit map of the free blocks. Having redundant copies of the superblock increases reliability. The bit map replaces the old free blocks list.

The size of the block and the fragments within the block are parameters that can be chosen when the file system is created, giving extra flexibility to the system. Some other parameters related to the processor and disk characteristics are used to improve the allocation policies and thus the file system performance.

The allocation policies try to increase locality and reduce seek time. Files in the same directory have their inodes placed in the same cylinder group. The data blocks of a file are placed at rotationally optimal positions.


FLAWS
In order for these layout policies to work properly the disk needs to have some free space; for performance, a 10% free-space reserve is chosen. Is this the optimal number, and how does it vary with the size of the disk? Why is a 4096/1024 structure preferred over others like 8192/512 or 4096/512? There is no explanation for these choices. They propose using file locking to synchronize multiple accesses to the same file, but they do not explain how deadlock is avoided or how it is resolved when it happens.


PERFORMANCE
The increase in performance is obtained by increasing the block size to 4096 bytes, which yields more data transferred per disk access. Having cylinder groups with a free-block map reduces seek times. Allocation policies that increase locality and reduce seek time also contribute to the bandwidth increase. A tradeoff is to leave some unused space on the disk in order to be able to use the previously mentioned allocation policies: space is traded for throughput. Exploiting locality and locking are widely used concepts in operating systems.

Summary:
This paper presents FFS (A Fast File System for UNIX), which improves upon the traditional UNIX file system with a significant gain in file system throughput, from 3-4% of the disk bandwidth to up to 47% (roughly a tenfold increase). The paper observes subtle yet significant shortcomings of the traditional UFS, like small block sizes and rigid allocation policies that are more or less independent of disk and processor characteristics. Further, the paper proposes long-needed functional enhancements to the interface of UFS.

Problem:
The original Unix File System only utilized 3-4% of available disk bandwidth. This was mainly due to small block sizes, limited read-ahead, randomization of disk block placement and many disk seeks which limited the file system throughput. FFS presents a reimplementation, which provides substantially higher throughputs.

Contributions:
One of the major contributions of FFS is the flexible allocation policies, which allow better locality of reference and can be adapted to mass storage and processor characteristics. This is achieved by global and local allocation policies.
Secondly, clustering of related data (e.g., the inodes of files in a directory) helps reduce the disk seeks needed for frequent accesses, thereby improving file system throughput.
Thirdly, the use of larger block sizes together with fragments not only allows faster access to large files but also reduces the amount of wasted space for small files.
Last but not least, FFS also proposes functional enhancements to the filesystem API, e.g., advisory file locks, long file names, and extension of the name space across file systems.

Flaws:
I believe this paper was the first step that led to moulding the file system storage stack to the characteristics of conventional hard disks (e.g., long seek times, rotational latencies, etc.). This hard-wiring has persisted for more than two decades. Only recently have people started thinking of decoupling these assumptions from the design of a file system, due to the advent of novel disk and memory technologies (e.g., flash disks). This shows how a virtue can turn into a vice in the long run.

Techniques Used:
One of the major techniques used by FFS is exploiting locality of reference by clustering related data. A second important technique is a clean-slate architecture for filesystems which allows their parameterized configuration to match the characteristics of the underlying disk technology and the processor (separation of policy and mechanism). Adaptive block sizes for different file sizes and advisory file locks are other tricks which add more meat to FFS.

Tradeoffs:
Batching requests in data buffers and ordering them to minimize disk seeks trades off the throughput of writing large files against that of small files. Larger block sizes for large files and fragments for small files show a tradeoff between space (wastage) and time.

Alternative Uses:
Clustering of related data is a well known technique used for caching (locality of reference). Further, batching is also used for write-back caches. Adaptive block sizes are also used in networking protocols for different buffer sizes with variable packet sizes (the Maximum Transfer Unit for Ethernet).

Summary:
The paper describes the Fast File System, which is a reimplementation/enhancement of the basic Unix File System. FFS provides better throughput by using a flexible allocation policy.

Problem:
The existing Unix File System had the following problems:
1. The organization of the Unix file system segregates inode info from data. This segregation causes long seek times.
2. The small block size combined with a lack of locality severely limits throughput. It was using only about 4% of the disk bandwidth.
3. The old FS ignores the parameters of the underlying hardware. Because of this, the old FS is very rigid and unable to adapt to different disk characteristics like rotation speed and the number of blocks per track.

Work Summary/Contribution:
1. To improve throughput, the disk is divided into cylinder groups. FFS tries to allocate the inodes of files that belong to the same directory in the same cylinder group, and tries to place a file's data blocks in that cylinder group as well.
2. The block size is also increased to 4096 bytes (or it can be set higher), so more data can be accessed in one disk access. To reduce wasted space for small files, fragments of a data block are introduced.
3. A copy of the superblock is maintained in each cylinder group. This information is kept in a spiral kind of structure, which makes sure that the failure of one platter, track, or cylinder doesn’t cause the loss of all redundant copies of the superblock.
4. The free list is replaced with a bit map. This helps in determining free blocks quickly.
5. FFS implements useful enhancements like long file names, a rename system call, per-user quotas, symbolic links, and file locking.
6. FFS uses a two-level layout policy. The global layout policy determines the placement of new directories and files in cylinder groups, and the local layout policy of each cylinder group determines the placement of data blocks within it (see the sketch after this list).
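A toy illustration of that two-level split; the structures and names are invented for the sketch, not taken from the FFS sources.

    struct cgroup {
        unsigned nfree;              /* free blocks in this group          */
        unsigned char map[512];      /* free-block bitmap, 1 bit per block */
    };

    /* Global layer: choose a cylinder group (here, simply the one with
     * the most free space). */
    int global_pick_group(const struct cgroup *groups, int ngroups)
    {
        int best = 0;
        for (int i = 1; i < ngroups; i++)
            if (groups[i].nfree > groups[best].nfree)
                best = i;
        return best;
    }

    /* Local layer: find a free block inside the chosen group, scanning
     * forward from a preferred position. */
    int local_pick_block(const struct cgroup *g, unsigned preferred)
    {
        unsigned nblocks = sizeof(g->map) * 8;
        for (unsigned i = 0; i < nblocks; i++) {
            unsigned b = (preferred + i) % nblocks;
            if ((g->map[b / 8] >> (b % 8)) & 1)
                return (int)b;
        }
        return -1;                   /* group unexpectedly full */
    }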

Flaws:
1. Still, only about 50% of the disk bandwidth is utilized.
2. Reading data from the FS requires a data copy from the disk buffer in kernel address space to the data buffer in user space. This copying consumes almost 40% of the time of an I/O operation.
3. File locking doesn’t detect deadlocks, and there is no deadlock prevention either.

Tradeoff:
1. Space to keep file system indexing information is traded off against space to keep track of free blocks. Due to the big block size, less space is required for indexing information, while the bit map requires more space to track free blocks.
2. Space for free-block tracking is traded off against time: it is now fast to determine whether a block is free, but more space is needed.

Another part of the OS where the technique can be applied:
1. A bitmap-based structure to keep track of available blocks can be used in the virtual memory system to determine which pages are free.
2. Quota limits (soft and hard limits) on the file system can be applied to other resources like sockets, processor time, and memory usage.

PS: As this paper is an enhancement of an existing FS, I felt that a detailed problem description was appropriate.

Summary: The paper describes the Berkeley Fast File System for UNIX. Specific aspects discussed in the paper include how the designers address the problems of improving throughput, reducing space wastage and accommodating diversity in hardware. The paper also discusses some enhancements to the traditional filesystem, viz., file locks, long filenames, symbolic links, quotas, and a rename system call.

Problem addressed There are two major problems being addressed through this paper. The first is that of disk throughput. It was observed that the traditional filesystem exploits only a meagre 2% of the disk throughput. FFS is designed to improve upon this figure by means of larger block sizes and a design focused on locality of storage of data as well as inodes. The second problem is that of hardware diversity. This includes systems with and without I/O channels, differing disk technologies, etc. The authors strive to build a filesystem which can be fine-tuned to the particular hardware available.

Contributions
- the first contribution is the use of large block sizes (8 / 16 sectors). By reading a block which is more than one sector in size, the seek overhead is reduced to 1/N per sector (N = number of sectors in the block)
- the idea of using two sizes, block and fragments, for storing data helps reduce space wastage arising from larger block sizes.
- Division of the disk partition into cylinder groups is a proactive strategy that ensures locality of storage even for writes that happen in the future. The mechanism reduces the necessity for a defragmenting tool.
- linearly offsetting cylinder-group bookkeeping information, so that it does not all accumulate on the same platter, helps make the filesystem more tolerant to failures. Superblock replication is another cool idea which improves recoverability.
- parameterization helps fine-tune the filesystem to perform better given the hardware available. At the same time, some of these parameters can be modified at later times to allow migrating the filesystem/disk.
- splitting the layout policy into global and local layout policies makes each of them simpler to implement as they can be implemented with partial knowledge.

Flaws
- The static allocation of a huge number of inodes per cylinder group seems wasteful. Although the idea is to improve locality of inodes, it might have been less wasteful in terms of space to allocate some inodes statically, and thereafter allocate on demand. (I believe with an ext3 file system, this static allocation eats up 6.5 GB off a 250 GB partition)
- The discussion in section 5.1 on long filenames says directories are allocated in 512 byte units called chunks. I find this inconsistent because directories are also a type of file, and hence consistency dictates that they should also be allocated in blocks rather than sectors.
- Given the variable length filenames, deletion of directory entries potentially causes fragmentation. Returning the space so freed up to a previous entry in the same chunk doesn't fully mitigate the problem.

Techniques used
1. The first technique used is economies of scale. By bulking the read/write into a group of sectors called blocks, rather than writing/reading individual sectors, the seek latency (constant overhead of processing) is distributed among a larger set of entities.
2. The second technique is the use of configurable parameters, which forms a sort of separation of mechanism and policy.
Tradeoffs
The main tradeoff with economies of scale achieved by increasing the block size is one of disk throughput vs. processor and bookkeeping overhead. Tradeoff with using configurable parameters is one of performance vs. portability.
Other places where the same technique can be used.
- Economies of scale is used with prefetching, and with preallocation of blocks when a file is noted to be growing rapidly.
- use of configurable parameters can benefit any type of scheduling requirement like processor, I/O, network etc.


In this paper the authors present a new, faster filesystem for UNIX. They report a tenfold increase in bandwidth utilization, better space efficiency and numerous interface enhancements.

I identify the following five contributions: 1. A scheme for efficient storage of both large and small files, which avoids freelist fragmentation and maximizes utilization. This is achieved by partitioning blocks into fragments, and allowing multiple small files to reside in one block. 2. The idea of cylinder groups, whose careful selection can tolerate several different types of mechanical failures. 3. The ability of the filesystem to adapt to the hardware it is running on by measuring system-specific properties such as blocks per track and disk spin rate. 4. The new placement strategy which seeks to minimize seek latency by exploiting locality. 5. The application interface enhancements, which have now become an integral part of all UNIX filesystems.

The performance evaluation section seems to be an afterthought. I believe that this paper is a pedagogical example of how NOT to create experiments to evaluate your system: The authors only measure the throughput of sequential reads and writes when it is clear that their system is much better at other workloads. For example, they argue that each instance of the old file system was getting progressively slower over time due to fragmentation. While they claim to solve this problem, they give neither theoretical nor experimental evidence of how good their solution is. A more careful evaluation would demonstrate which design decisions improved performance and which did not help. Finally, it now seems strange that they allowed automatic parameterization but did not separate the block layout mechanism from the policy. This would have allowed the same file system to remain efficient for devices with different physical characteristics (e.g., tape, solid-state disks).

An interesting design tradeoff is the file system's ability to manipulate fragments. This comes at a significant bookkeeping cost, but the authors claim that empirical evidence shows it achieves much better space utilization. However, 25 years later, disk capacity has improved a thousand times while disk speed has not changed much. A re-evaluation of this block organization strategy today would probably reach the conclusion that manipulating fragments trades processing power and precious disk bandwidth for cheap disk space. In retrospect, a better evaluation might have been able to show which ideas are timeless and which decisions were motivated by technology trends of the 80's and no longer apply.

Summary

The paper Fast File Systems describes changes to the original UNIX file system to improve the throughput over time as well as add new enhancements including quotas and symbolic links.

Problem trying to Solve

The original UNIX file system was designed to be simple, and did not take hard disk characteristics into account, even though designing for disk rotation and head movement can yield great performance benefits. The reason this became an issue is that many applications required higher throughput rates, such as VLSI design, image processing, and file mapping into virtual memory. In addition, as more changes occurred within a file system, its throughput worsened.

Contributions

-Incorporating many functional enhancements into the UNIX file system: Arbitrary length file names, improved file locking, symbolic links which are valuable tools for linking across file system boundaries, rename system call for safer renaming, disk quotas for controlling user space.

-Improved reliability with disk errors due to the superblock being duplicated across the disk on different cylinders.

-Implementing file system features based on common hard disk characteristics. This includes rotational layout tables for fast indexing, cylinder groups for locality and performance, fragmenting of blocks for better utilization, and a fast allocator algorithm (quadratic hashing) for finding good locations for new blocks.
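The quadratic-hash fallback can be sketched like this; the group count and free-block table are toy data, and the real allocator checks far more than a free-block count.

    #define NGROUPS 16
    static unsigned cg_free[NGROUPS];   /* free blocks per cylinder group (toy data) */

    /* Find a cylinder group with space, starting from the preferred one:
     * try it directly, then probe at quadratically increasing offsets,
     * then fall back to an exhaustive scan. */
    int pick_cylinder_group(int preferred)
    {
        if (cg_free[preferred] > 0)
            return preferred;
        for (int i = 1; i < NGROUPS; i++) {
            int candidate = (preferred + i * i) % NGROUPS;   /* quadratic step */
            if (cg_free[candidate] > 0)
                return candidate;
        }
        for (int g = 0; g < NGROUPS; g++)                    /* exhaustive search */
            if (cg_free[g] > 0)
                return g;
        return -1;                                           /* filesystem full */
    }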

Flaws

The Fast file system is tailored specifically for rotational hard disks. These enhancements would not have as much benefit with solid state storage, RAM file systems, or other non-cylinder based systems.

The performance comparison takes the old file system at a 1024-byte block size, while the new one is always compared at either 4096- or 8192-byte block sizes. In order to understand the performance loss due to the added structures, a comparison with the new file system also at 1024 bytes would show how much of the benefit comes from the locality and grouping alone.

Techniques

The main technique used in this paper is the idea that a rotational hard disk has performance limitations that can be reduced by creating specialized organization and design. This more complex system helps improve the throughput by considering the common cases and using algorithms to find the best location for data. The trade-off is improved performance for increased complexity. The original UNIX file system was very generalized, while the fast file system is very tightly coupled with a rotational hard disk design. This technique could be used with messaging such as network data. In order to improve the performance of a low throughput connection with large data packets, complex compression algorithms can be designed.

Summary: The paper presents a reimplementation of Unix FS that significantly increases the FS throughput by favoring disk-locality, by using larger blocks divided into fragments, and by parameterizing system characteristics. They also add new functionality still in use today.

Problem: The old FS transferred only 512 bytes per disk transaction, and often logically sequential data was not on the same cylinder. The segregation of inodes from data guaranteed a long seek between accessing an inode and its corresponding data. The resulting FS throughput was less than 5% of the theoretical maximum. Simply increasing the block size was not feasible, due to locality deterioration over time and fragmentation. In addition, the old FS lacked useful functionality that currently exists on Unix systems.

Contributions: The paper provides an insightful analysis of the performance factors of a FS. The FS is reorganized to favor locality, in particular so that inodes are no longer required to be separated from their data. They recognized that there was a conflict between the dual nature of a block as the unit of disk transfer (desired to be large) and as the unit of addressability/fragmentation (desired to be small). They solved this by proposing a flexible structure with large blocks divided into fragments which can belong to different files. Finally, they recognize that systems have various HW characteristics and introduce a set of parameters that allow FS customization. All these are done in a manner compatible with the old system interface.
On the functional side, they introduce longer file names, file locking, symbolic links, and disk usage accounting.
Finally, the new FS uses redundancy and dispersion of metadata for increased reliability.

(more) Techniques: The main principle is accounting for HW realities, and giving up the old rigid "one size/format fits all" structure. Characteristics such as the "rotational layout table" are stored in the superblock, and some can even be changed while the system is active. Access locality is greatly improved by the closeness of cylinders in a cylinder group. Separation of mechanism from policy is also employed by offering flexibility in inode placement; this is used to keep inodes close to their data, with distinct policies for files and directories.

Tradeoffs: There is a tradeoff between CPU usage and disk throughput. More space is needed for bookkeeping. They consider it OK to seek for large files. Suboptimal but fast global/local allocation policies are used, as well as hashing for free-space management.
The coexistence of file systems with various block sizes prompted an extension of the interface to expose the optimal read/write size.
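That hint survives today as the st_blksize field reported by stat(2); a minimal usage sketch:

    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        const char *path = argc > 1 ? argv[1] : ".";

        if (stat(path, &st) != 0) {
            perror("stat");
            return 1;
        }
        /* Size user I/O buffers to the filesystem's preferred transfer size. */
        printf("%s: preferred I/O size is %ld bytes\n", path, (long)st.st_blksize);
        return 0;
    }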

Weaknesses: I consider the 5-10% free space requirement a hard-to-avoid weakness. The OS reasons about disk access times, but the HW has its own buffering. It would be interesting to know how the average file size and the percentage of free space affect the performance tables. CPU utilization can get quite high. Apparently the kernel buffers are still a limiting factor.

Summary:
This paper introduces the reimplementation of the old UNIX file system, aiming at better throughput and reliability. The authors start by identifying the flaws of the old file system. Then they talk in detail about their new approaches, discussing the impact that they will have in practice. Finally, taking advantage of the fact that the old file system will have to be dumped and rebuilt anyway, they also introduce five new functional enhancements: file locking, quotas, rename, long file names and symbolic links.
The goal the paper was trying to deal with:
The old UNIX file system has a lot of flaws: (1) Low throughput and long seek times; (2) Corruption of the single superblock destroys the file system, and the data read size per transaction is not optimal; (3) Although the free list was initially ordered for optimal access, it quickly became scrambled as files were created and removed; (4) Transfer rates deteriorate because of the randomization of data block placement. The new file system was mainly aimed at solving these problems.
Contributions
1. The new file system divides a disk partition into one or more areas called cylinder groups. Each cylinder group has bookkeeping information: a redundant copy of the superblock, space for inodes, and a bit map of free blocks. The redundant information spirals down into the pack so that data on any single track, cylinder or platter can be lost without losing all copies of the superblock.
2. The Fast File System optimizes reads: data is laid out such that larger blocks can be transferred in a single disk transaction, greatly increasing file system throughput.
3. File system parameterization: by parameterizing the processor capabilities and mass storage characteristics, blocks can be allocated in an optimal, configuration-dependent way.
4. Important functional enhancements are made by changing internal data structures and adding system calls, with backward compatibility. The addition of symbolic links, which I consider the most important one among them, makes the FS a lot more flexible.
Flaws:
Regarding advisory file locks, "Almost no deadlock detection is attempted." No means of timeout or custom deadlock detection are discussed. What's more, the locking interface is blocking. So user applications have no way to tell whether a deadlock has happened, nor any way to get out of one. Another flaw is that the measurements are derived from a single installation.
The techniques used to achieve performance:
The performance results that they include in the paper are quite impressive, but also expected. The use of a bigger block size, in addition to the fragmentation of blocks, increases I/O bandwidth utilization without wasting valuable disk space. Also, the ability to change most of the file system parameters (especially the "hardware"-oriented ones) on-line, without having to dump and rebuild the whole thing, greatly increases the flexibility and the portability of the system. They also introduce five new functional enhancements: file locking, quotas, rename, long file names and symbolic links.
Tradeoffs:
(1) The main tradeoff in this paper is that having large block sizes improves throughput but wastes space, since a lot of files are small. Using the fragmentation technique improves the utilization of space; however, it involves a significant overhead.
(2) In the Fast File System (FFS), the locations of files and metadata are fixed, and when data is modified it is always overwritten in its original location. With this approach, if geometrically dispersed data are modified within the same time span, seek and rotational delays substantially inflate disk I/O time, resulting in underutilization of disk I/O bandwidth. In order to fully utilize disk bandwidth for write requests, all modified data could instead be collected in memory chunks and then written to disk together.
Another part of the OS where the technique could be applied:
(1) The technique of file locking used in this paper can also be used by other systems, like Microsoft Windows. (2) Symbolic links: Windows Vista supports symbolic links for both files and directories with the command line utility mklink. Unlike junction points, a symbolic link can also point to a file or remote SMB network path. Additionally, the NTFS symbolic link implementation provides full support for cross-filesystem links. (3) Long file names: long filenames on FAT were implemented in Windows NT 3.5 in 1994.

Introduction:
This paper introduces the main ideas behind the Berkeley Fast File System, an improvement over the original Unix File System. The authors describe a way to utilize the locality of different pieces of data to increase throughput. They also describe how the new file system is more reliable, and also how some features missing in the UFS were implemented in FFS.

What were they trying to solve:
The traditional Unix File System read and wrote data in small chunks of 512 bytes, which meant that even though disks were capable of much higher bandwidth, the actual disk throughput was very low. The block allocation to files also didn't follow any pattern, so over time, various blocks of a file would be spread across the disk and severely increase the seek delay. There was also no strategy for handling disk failures, as there was only a single copy of the superblock. It was also oblivious to the underlying hardware.

Contributions:
Most of the throughput issues were caused by having too small a block size. FFS increased this to 4K or 8K, thus increasing the throughput considerably. But this large block size comes at a cost, since a large amount of space is wasted when the system has to allocate a whole block and the application uses only a small fraction of it. FFS solves this by dividing up the block into chunks called fragments of 512/1024 bytes. The file system takes care of allocating fragments or blocks as the need arises.

FFS divides the disk into cylinder groups, which is associated with one or more cylinders on the disk. Each group keeps its own set of inodes, data blocks, accounting information and a redundant copy of the superblock for reliability.

FFS takes advantage of locality in block accesses to minimize seek costs. For example, inodes of files under a directory are stored in the same cylinder group.

FFS optimizes directory storage by allocating space for directory entries in 512 byte chunks. They also support names up to 256 characters long.

Advisory locking is provided. Open files can be locked in shared or exclusive modes.

Symbolic links are introduced for linking files across file systems. A separate rename system call is provided to ensure consistent renaming of files and directories. Disk quotas are also a newly introduced feature.

Various parameters like processor speed and disk geometry are not hardcoded in the implementation, but are provided by configuration options set during filesystem creation.
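For flavor, a sketch of the kind of tunable layout parameters such a configuration might record; the field names are loosely modeled on the 4.2BSD superblock and should be treated as approximations, not the actual on-disk format.

    #include <stdint.h>

    struct fs_params {
        uint32_t bsize;      /* block size in bytes                     */
        uint32_t fsize;      /* fragment size in bytes                  */
        uint32_t rps;        /* disk revolutions per second             */
        uint32_t nsect;      /* sectors per track                       */
        uint32_t ntrak;      /* tracks per cylinder                     */
        uint32_t cpg;        /* cylinders per cylinder group            */
        uint32_t rotdelay;   /* ms to allow between dependent transfers */
        uint32_t minfree;    /* percentage of space kept in reserve     */
    };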

Flaws:
The number of inodes in a cylinder group is a static number, which is not a good idea when the number of files is too high (you run out of inodes) or when file sizes are very large (inodes occupy space which cannot be used by data).

The locking implementation seems like it was tacked on. On one hand, it is not mandatory so a non-cooperating process can simply not acquire any locks. The deadlock detection is also lacking, since it can only do single-process deadlock detection.

The rationale behind directory entries stored in chunks is not fully explained. The organization can lead to a lot of holes when files under a directory are deleted frequently.

Techniques used to improve performance
One technique used here is a hierarchical organization of data. Here cylinder groups, blocks, and fragments form a three-level hierarchy, which achieves the right compromise between speed of access and wasted space.
Separation of mechanism/policy is evident from the configurable parameters which the file system uses to optimize block allocation.
Tradeoffs: The main tradeoff here has a time-vs-space flavor to it. Having large block sizes makes accesses faster but wastes space. Trying to utilize space well involves a significant overhead.
Another part of the OS where the technique could be applied:
We have already seen hierarchical organization of virtual memory in Pilot, in the form of spaces.

Summary
In this paper, the authors present a re-implementation of the Unix file system with higher throughput rates without wasting too much space for this improvement. They increase throughput by increasing the block size and locality but introduce fragments to prevent space wastage for small files.

Problem
The Unix file system at that time (called the old file system) had various flaws, such as a very small block size (512 bytes), which increased disk read latency even if the blocks were sequential. Also, since block allocation didn't take special measures to ensure locality, the blocks of the same file, or the inodes of files in the same directory, were scattered around. This resulted in long seeks for the most common operations and reduced the effective disk bandwidth.

Contributions
- They increased the block size from 512 bytes to 4096 bytes (still a common choice today). This increased the throughput for reading huge files, especially in the absence of an IO controller where each read was followed by an interrupt.
- Increasing the block size increases the wasted space for small files or files that just cross a block boundary. To handle such cases, they fragment the last block of a file into 512/1024-byte fragments and use only the required number of fragments.
- They replicated the superblock across cylinder groups to avoid losing the entire file system if the superblock gets corrupted. These copies were also spread across various platters and cylinders.
- The flexibility to configure the various parameters of the file system allows it to make well-informed decisions, such as the minimum distance between two contiguous blocks. If those decisions were made implicitly, migrating disks would cause a drop in performance.
- The separation of the global and local layout policies reduces the complexity of both layers and provides some kind of modularity in the design (similar to THE).

Flaws
- Each cylinder group has a fixed number of inodes(one per 2048 bytes) which ensures locality of inodes and enough inodes to handle common file sizes. But this also wastes a lot of space which cant be reclaimed nor can the number of inodes be re-configured.
- Writes in this system (especially unbuffered ones) tend to be slow: there are many checks for crossing a block or fragment boundary, and when a write does cross one, finding the next block can be non-trivial and may involve extra copying of data as fragments are expanded.
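Here is the rough overhead estimate referred to above; the 128-byte on-disk inode size is an assumption made for illustration, not a figure quoted from the paper:

    /* Sketch: fraction of the disk consumed by statically preallocating
     * one inode for every 2048 bytes of data.  The 128-byte on-disk
     * inode size is an assumption for illustration. */
    #include <stdio.h>

    int main(void)
    {
        const double inode_size = 128.0;       /* assumed on-disk inode size */
        const double data_per_inode = 2048.0;  /* the paper's chosen ratio   */

        double overhead_pct = inode_size / data_per_inode * 100.0;
        printf("inode tables claim %.2f%% of the disk up front,\n", overhead_pct);
        printf("whether or not the inodes are ever used\n");
        return 0;
    }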

Techniques
- It uses economy of scale to increase throughput: larger blocks amortize the per-transfer overhead of reads and writes. It also layers the global and local policies for block allocation.
- There is a trade-off between space (for bookkeeping and the free-space reserve) and increased read throughput; write throughput also suffers a bit.
- The configurable parameters and the global/local policy split could also be used by the scheduling subsystem (especially in microkernel systems).

Summary
This paper describes a reimplementation of the file system for UNIX that provides substantially higher throughput rates by using a flexible allocation policy, better locality of data, and adaptability to peripheral and processor characteristics. Features like advisory locks on files, extension of the name space across file systems, long file names, and administrative control of resource usage have also been added.

Problem trying to solve
The original 512-byte UNIX file system delivered only about 20 kilobytes per second per disk arm, roughly 2% of the available disk bandwidth. The segregation of inode information from the data incurred a long seek from a file's inode to its data. The combination of small block size, limited read-ahead in the system, and many seeks severely limited file system throughput. Another problem was that although the free list was initially ordered for optimal access, it quickly became scrambled as files were created and destroyed; this badly degraded the transfer rate because allocations required more seeks. Corruption of the superblock could make the entire file system unusable. Finally, the file system ignored the characteristics of the underlying hardware, allocating blocks in a non-optimal way.

Contributions
• The data is laid out in larger 4096-byte blocks, greatly increasing file system throughput. To minimize wasted space for smaller files, a block can be further divided into fragments, which are allocated to a file as needed. The new file system also spends less space indexing large files because each block pointer covers more data.
• The new file system divides the disk partition into cylinder groups. Within a group, it tries to allocate a file's blocks so that they are rotationally well positioned.
• The superblock is replicated into each cylinder group to protect against catastrophic loss.
• The global allocation policies provide good data locality: they spread unrelated data across different cylinder groups while keeping related data within the same cylinder group, which reduces seek time tremendously. The global policy then calls local policies to allocate blocks within a cylinder group (a simplified sketch of this two-level scheme appears after this list).
• The paper also adds functional enhancements: long file names, advisory file locks, symbolic links that can cross file systems and even machines, atomic renaming of files, and quota restrictions on users.
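Below is a much-simplified sketch of the two-level allocation idea mentioned above; the data structures and the "most free blocks" heuristic are stand-ins for illustration, not a faithful copy of the paper's policies:

    /* Sketch of the two-level idea: a global policy picks a cylinder
     * group, then a local policy picks a free block inside that group.
     * The structures and heuristics are simplifications for illustration. */
    #include <stdio.h>

    #define NGROUPS          4
    #define BLOCKS_PER_GROUP 8

    struct cg {                                   /* one cylinder group */
        int free_blocks;
        unsigned char free_map[BLOCKS_PER_GROUP]; /* 1 = block is free  */
    };

    /* Global policy: prefer the cylinder group with the most free blocks. */
    static int pick_group(struct cg groups[], int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (groups[i].free_blocks > groups[best].free_blocks)
                best = i;
        return best;
    }

    /* Local policy: take the first free block within the chosen group. */
    static int pick_block(struct cg *g)
    {
        for (int i = 0; i < BLOCKS_PER_GROUP; i++)
            if (g->free_map[i]) {
                g->free_map[i] = 0;
                g->free_blocks--;
                return i;
            }
        return -1;                                /* group is full */
    }

    int main(void)
    {
        struct cg groups[NGROUPS] = {
            { 2, {0,0,1,0,0,1,0,0} },
            { 5, {1,1,0,1,1,0,1,0} },
            { 1, {0,0,0,0,0,0,1,0} },
            { 3, {1,0,1,0,0,0,0,1} },
        };

        int g = pick_group(groups, NGROUPS);
        int b = pick_block(&groups[g]);
        printf("allocated block %d in cylinder group %d\n", b, g);
        return 0;
    }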


Flaws
• The paper does offer an experiment comparing read/write rates for both file systems, but it should probably also have included experimental results on block fragmentation and wasted space, if any. Additionally, the CPU figures reported in the experiments (up to 95%) are never discussed.
• FFS chooses to preallocate one inode for every 2048 bytes of data. There appears to be no justification given for that ratio.
• They do not provide any kind of deadlock detection for advisory file locks, which could lead to simple deadlocks.

Tradeoffs made:
The tradeoff made is added complexity in exchange for better file system performance. The new file system is complex, involving several allocation techniques and many more data structures than the simpler traditional Unix file system. Because of the extra work required of the kernel, a write operation takes longer than a read on a file. This tradeoff is made to achieve higher throughput.

The idea of improving throughput by using a bigger block size can be applied to a network interface. A process posts a buffer to the network driver, into which responses are copied and then handed back to the process; a bigger buffer allows several responses to be delivered at once rather than one by one.
The quota mechanism used to restrict file system usage can be applied to other resources as well, e.g. CPU time and memory.
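As a concrete illustration of that analogy, the standard setrlimit() interface applies the same per-process "quota" idea to CPU time and memory; the limit values below are arbitrary examples:

    /* Sketch: the "quota" idea applied to CPU time and memory with the
     * standard setrlimit() interface.  Limit values are arbitrary examples. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit cpu = { 60, 120 };                  /* 60 s soft, 120 s hard   */
        struct rlimit mem = { 64UL << 20, 128UL << 20 };  /* 64 MB soft, 128 MB hard */

        if (setrlimit(RLIMIT_CPU, &cpu) != 0)
            perror("setrlimit(RLIMIT_CPU)");
        if (setrlimit(RLIMIT_DATA, &mem) != 0)
            perror("setrlimit(RLIMIT_DATA)");

        /* From here on the kernel enforces the limits on this process,
         * much as the file system quota enforces per-user disk usage. */
        return 0;
    }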

Summary: The authors of this paper present a new file system for the UNIX platform with the intent of creating a file system structure that can utilize far more than the roughly 2% of the disk's available bandwidth achieved by its predecessor. By increasing disk throughput they also increase overall system performance, and they introduce many new mechanisms that make the system more practical for reading and writing files on multi-user systems.
Problem to Solve: The main problem the authors are trying to solve is that the existing file system had a low throughput rate. As faster hardware was developed and virtual address spaces came into wider use, the file system began to limit performance in ways it previously had not, because data needed to be paged in and out more quickly. Users were also seeing long seek times between reads of blocks, because the free list often ended up randomly ordered, so a file's blocks would land on different disk tracks.
Contributions: First, they change the block size to 4096 bytes to achieve better read performance. They realize that with this change small files can waste a lot of space, so they add a mechanism for splitting a block into 'fragments' to hold small files and file tails. These two mechanisms combined increase read and write speed without a large penalty in wasted space. Secondly, they add locking for files: after weighing hard (mandatory) locks against advisory locks, they provide advisory shared and exclusive locks. These locks help protect files while they are open or in use by another program; a program attempting to open a locked file may have to wait, but the mechanism provides much-needed control over concurrent updates. Thirdly, they introduce symbolic links to files. Although something similar had been introduced in Multics, they wanted to be able to reference files located on different physical file systems. Next, they introduce per-user quotas. For the first time, system administrators had a system-enforced way to control how much space individual users consumed, which also helped ensure that space could be set aside for system programs and utilities without fear of it being used up by users. Finally, they introduce cylinder groups to replace the free list. This keeps free space from being scattered all over the disk and instead gives the allocator free blocks that lie close together, so they are quick to read and write.
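As a small sketch of the symbolic-link interface described above (both paths here are made-up examples), a link can name a target that lives on a different mounted file system:

    /* Sketch: creating and inspecting a symbolic link whose target may
     * live on a different mounted file system.  Both paths are made up. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        /* /tmp/data becomes a name for a file on another file system. */
        if (symlink("/mnt/other/real_data", "/tmp/data") != 0)
            perror("symlink");

        /* readlink() reports the target path without following it. */
        char target[256];
        ssize_t n = readlink("/tmp/data", target, sizeof(target) - 1);
        if (n >= 0) {
            target[n] = '\0';
            printf("/tmp/data -> %s\n", target);
        }
        return 0;
    }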
Flaws: One flaw of this paper is in the performance evaluation. The authors do a good job explaining the experiments but fail to say whether the rates presented in tables 2a and 2b are average, best-case, or worst-case numbers. They explain that they get the system into a stable state before measuring, but not whether the measurement is repeated or taken just once. Another thing to note is that they include CPU-utilization percentages in the tables but do not explain how they are measured or what importance they have for the results.
What tradeoff is made: One tradeoff is that when they introduce file locking, they do not provide a solid mechanism for preventing deadlocks; so although you gain the valuable ability to lock a file, the system can end up with processes waiting on each other indefinitely. Another tradeoff is that, even after all of these changes, the file system is still only able to use roughly half of the disk bandwidth; still, that is a large improvement over the previous system.

Summary
In “A Fast File System for UNIX,” Marshall K. McKusick et al. describe the implementation of a better-performing file system for UNIX, noting why the previous file system's performance suffered and how they addressed those issues.

Problem
The “old UNIX file system” (which was itself already an improvement over the original implementation) was designed in a way that kept it from achieving maximum performance. In theory, a disk can transfer data as fast as that data passes under the read head. In practice this ideal was not achieved for a variety of reasons, the most prominent being that data was stored in blocks which did not necessarily lie consecutively on the disk. As a file was accessed, the blocks belonging to it had to be located before a transfer could occur. The overhead of finding data was so great that measurements showed the file system providing as little as three percent of the maximum disk bandwidth in some situations.

Contributions
• Devising a method by which data is stored in larger blocks, allowing faster transfer of large files because a lower proportion of time is spent seeking, while at the same time allowing data from separate files to share a single block, preventing wasted space.
• Employing modifiable file system parameters so the layout can be optimized for the hardware, for example by placing blocks either consecutively or a certain number of positions apart, to match the expected delay between blocks that will logically be fetched together.
• Taking into account the possibility of failure and attempting to minimize the problem by, for example, replicating the superblock in various parts of the disk.

Flaws
• Using parameters tuned to particular hard drives may fail to take into account more radical hard-drive improvements or different forms of storage. For example, the solid state drives available now have no rotating platters at all.
• Preallocating inodes, one for every 2048 bytes. Though this avoids the cost of allocating them later, fixing the number at file system creation time does not work well if files are exceedingly small (so more inodes are needed) or exceedingly large (so the preallocated inodes waste space).

Performance Techniques
Among the performance tradeoffs the fast file system makes, one is the choice of larger blocks for storing data, so that files can be transferred more efficiently, at the cost of more space potentially being wasted in partially filled blocks. By requiring fewer blocks to be read or written, less time is spent locating them. The same general idea can be (and is) applied to TCP window sizes: for sufficiently fast network connections, the receive buffer is enlarged so that more data can be sent before the receiver must acknowledge it and allow the sender to continue.
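To make the TCP analogy concrete, the sketch below simply enlarges a socket's receive buffer via the standard setsockopt() call; the 256 KB figure is an arbitrary example:

    /* Sketch: enlarging a TCP socket's receive buffer so more data can
     * be outstanding before the receiver must acknowledge it.  The
     * 256 KB size is an arbitrary example. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        int bufsize = 256 * 1024;   /* a larger "block size" for the network */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) != 0)
            perror("setsockopt(SO_RCVBUF)");

        /* connect(), read(), etc. would follow here in a real client. */
        close(fd);
        return 0;
    }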
