Scale and Performance in a Distributed File System
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. Scale and Performance in a Distributed File System. ACM Trans. on Computer Systems 6(1), February 1988, pp. 51-81.
Reviews due Thursday, 11/13.
Comments
The Problem
John Howard, et al. describe a variety of scalability problems in AFS. They make some improvements and compare performance to Sun's NFS.
The Solution
They adopt a new philosophy: whenever it can be done securely and efficiently, make the clients do the work.
First and foremost, caching of whole files and directories on the clients is used to reduce message traffic. File operations other than open() and close() never involve the servers; reads and writes go to the locally cached copy, which also reduces software complexity.
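To make this concrete, here is a minimal C sketch of what an open/close path under whole-file caching might look like on a client; the helper names, cache layout, and RPC stubs are invented for illustration and are not the real Venus interface.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int fetch_whole_file(const char *path, const char *cache_path) {
        printf("fetch %s from server into %s\n", path, cache_path);
        return 0;                                /* pretend the transfer succeeded */
    }

    static int store_whole_file(const char *cache_path, const char *path) {
        printf("store %s back to server as %s\n", cache_path, path);
        return 0;
    }

    static int cache_is_valid(const char *path) {
        (void)path;
        return 0;                                /* pretend we must refetch */
    }

    /* Open: make sure a complete local copy exists, then hand back a descriptor
     * for that copy; every read and write after this is purely local. */
    int afslike_open(const char *path, char *cache_path, size_t n) {
        size_t off = (size_t)snprintf(cache_path, n, "/tmp/cache_");
        for (const char *p = path; *p && off + 1 < n; p++)
            cache_path[off++] = (*p == '/') ? '#' : *p;  /* flatten the Vice path */
        cache_path[off] = '\0';
        if (!cache_is_valid(path) && fetch_whole_file(path, cache_path) < 0)
            return -1;
        return open(cache_path, O_RDWR | O_CREAT, 0600);
    }

    /* Close: only now does the server see the new contents, and only if dirty. */
    int afslike_close(int fd, const char *cache_path, const char *path, int dirty) {
        close(fd);
        return dirty ? store_whole_file(cache_path, path) : 0;
    }

    int main(void) {
        char cache_path[256];
        int fd = afslike_open("/vice/usr/readme", cache_path, sizeof cache_path);
        if (fd >= 0)
            afslike_close(fd, cache_path, "/vice/usr/readme", 1 /* dirty */);
        return 0;
    }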
Another improvement to reduce server-side CPU usage: clients translate a file's pathname into an Fid themselves, so the server can quickly look up the location of the stored data without parsing pathnames.
A number of server-side improvements were made. Most notably, threads were spawned instead of processes in an effort to reduce context-switching, memory, and task creation overhead.
Trade-offs
Caching an entire file on the client side does generally help reduce the amount of message traffic involved in file IOs; after opening a file, a client does not have to send messages over and over as it reads and/or writes to it.
That said, keeping the caches up to date increases the complexity of the system. Now the server has to maintain state about each client, so that when a file is updated by one client all other clients can be notified of the change. Also, if client X is waiting for changes made to a file by client Y, X must wait for Y to finish all its writes and close the file before it can make decisions based on Y's output. This makes it harder to pipeline data processing through a shared file across the network.
Other Thoughts
File caches persist between reboots, which requires clients to revalidate their cached data with the servers afterward. This could potentially be a costly operation if many of the cached files were changed during the downtime.
I would have enjoyed more discussion about what occurs when two clients both fetch copies of the same file and then proceed to edit it at the same time. Is a lost update the sensible default behavior?
Posted by: James Jolly | November 13, 2008 09:12 AM
Scale and Performance in a Distributed File System
Summary
The authors perform measurements on a prototype Andrew File System (AFS) to identify scaling and administration bottlenecks. They conclude that their design approach is correct and that only the implementation needs improvement, so they enhance the prototype's implementation to address these bottlenecks. Finally, they perform measurements to validate the enhancements.
Description of the problem being solved
The authors are trying to address scaling and administration issues in
the prototype implementation of the AFS.
Contributions of the paper
a) The basic approach of caching whole files is very different from other implementations like NFS but also effective. It is innovative because they realized that most file accesses read the whole file, citing Ousterhout et al.'s measurements on this.
b) Changing from a polling-based approach to an event-notification approach (callbacks) for cache validation is a contribution. Callbacks reduce the load on the server compared to periodic cache validation requests; a sketch of the server-side bookkeeping appears after this list.
c) Using file identifiers to directly locate the vnode, and caching pathname-to-fid translations, performs much better than doing a full path translation every time.
d) Using volumes as the basic unit for quotas and replication for maintenance
is a nice idea.
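Here is the promised sketch of the server-side callback bookkeeping for item (b), in C: a fetch registers a promise for the (file, client) pair, and a store breaks the promises of every other holder. The structure, limits, and file names are invented for illustration, not taken from the paper.

    #include <stdio.h>
    #include <string.h>

    #define MAX_CLIENTS 8

    struct callback_entry {
        char file[64];
        int  holders[MAX_CLIENTS];   /* client ids holding a callback promise */
        int  nholders;
    };

    static struct callback_entry table[16];  /* small fixed table for the sketch */
    static int nentries;

    static struct callback_entry *lookup(const char *file) {
        for (int i = 0; i < nentries; i++)
            if (strcmp(table[i].file, file) == 0)
                return &table[i];
        strncpy(table[nentries].file, file, sizeof table[nentries].file - 1);
        return &table[nentries++];
    }

    /* Client fetches a file: the server promises to notify it before the file changes. */
    void on_fetch(const char *file, int client) {
        struct callback_entry *e = lookup(file);
        if (e->nholders < MAX_CLIENTS)
            e->holders[e->nholders++] = client;
    }

    /* Client stores a file: every other holder's cached copy is now stale. */
    void on_store(const char *file, int writer) {
        struct callback_entry *e = lookup(file);
        int keep = 0;
        for (int i = 0; i < e->nholders; i++) {
            if (e->holders[i] == writer)
                e->holders[keep++] = writer;      /* the writer's own copy stays valid */
            else
                printf("break callback: tell client %d that %s changed\n",
                       e->holders[i], file);
        }
        e->nholders = keep;
    }

    int main(void) {
        on_fetch("/vice/proj/notes", 1);
        on_fetch("/vice/proj/notes", 2);
        on_store("/vice/proj/notes", 2);   /* prints a break for client 1 only */
        return 0;
    }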
Flaws in the paper
a) Caching entire files leads to poor performance during reads that do not
access the whole file.
b) Diskless clients are not supported. Also, files larger than local client
storage can never be accessed because of whole file caching restriction.
c) Strict emulation of 4.2BSD concurrent read / write semantics across
workstations is impossible since only open & close system calls are
intercepted and read / write system calls are not intercepted.
Techniques used to achieve performance
1) Event based notification instead of polling to improve performance -
Callbacks.
2) Locality of file accesses (files tend to be read in their entirety and reused) is the basis for the whole-file caching approach.
3) Indirection to solve problems: they have used volumes as a level of indirection to allow administrative movement of data and to distribute load amongst servers.
Tradeoff made
1) Space vs. time:
a) More state in the server (callbacks) to improve scalability: generally, more server state means less scalability, but here the performance gains improve scalability more than the centralized state degrades it, so the tradeoff pays off.
b) Using caches to improve performance.
Another part of OS where this technique could be applied
1) All file systems use locality of reference. 2) Event-based notification instead of polling is the basis of interrupts and is also found in many distributed systems, e.g. message queues.
Posted by: Leo Prasath Arulraj | November 13, 2008 08:20 AM
Summary:
The authors present AFS in this paper, with scalability and performance as their primary concerns. Their implementation and performance analysis is based on a prototype as well as a working distributed file system at CMU, designed (in 1988) to scale to more than 5000 workstations. Important novel features of the paper relate to cache management, name translation, and volumes.
Problem:
This paper presents AFS, a distributed file system. The authors particularly focus on scalability and performance problems of a distributed FS. AFS differs significantly from its prototype in terms of cache management, name translation and low-level storage representation. These changes help AFS in achieving better performance and scalability.
Contributions:
AFS (in 1988) presented a new style of distributed FS, unlike the popular NFS. One major contribution of this paper is in the direction of cache management. Validating local caches only on file open and close is based on the observation that most files in UNIX are read in their entirety. Hence it substantially reduces the network traffic for reads and writes, thereby helping AFS to scale well. Secondly, name translation is transparent to the server because it only knows of fids and the client (Venus) does all pathname translation. This frees the server from time-consuming namei calls. Thirdly, aggregation of files into volumes, together with a volume location database, helps move user files seamlessly from one server to another. All of these contributions help AFS achieve better performance and scalability compared to its prototype implementation.
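As a rough illustration of the third point, a volume location database can be as simple as a replicated table mapping volume numbers to servers; the entries below are made up, and the real database is of course more elaborate.

    #include <stdio.h>

    struct vldb_entry {
        unsigned    volume;      /* volume number embedded in every fid */
        const char *server;      /* server currently holding that volume */
    };

    /* Every server keeps a copy of this slowly changing table, so any server can
     * redirect a request by volume number alone; moving a user's volume only
     * requires updating this mapping, not any client configuration. */
    static struct vldb_entry vldb[] = {
        { 100, "vice1.cs.cmu.edu" },
        { 101, "vice2.cs.cmu.edu" },
        { 102, "vice2.cs.cmu.edu" },
    };

    const char *locate_volume(unsigned volume) {
        for (unsigned i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
            if (vldb[i].volume == volume)
                return vldb[i].server;
        return NULL;             /* unknown volume */
    }

    int main(void) {
        unsigned vol = 101;      /* volume field taken from a fid */
        printf("volume %u is served by %s\n", vol, locate_volume(vol));
        return 0;
    }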
Flaws:
In AFS, cache validation occurs only on file open and close. However, it is certainly possible for multiple clients to be accessing the same file at a time; how concurrent changes to the file are reconciled is not clear.
Secondly, Venus is a user-level process, and there is a high overhead (both in terms of the number of copies and the number of context switches) associated with intercepting all file open/close calls and forwarding them to Venus. A kernel-level implementation, like NFS's, could do much better.
Techniques used:
Optimizing for the common case - most files are read in entirety. This observation forms the basis of cache management technique in AFS. Locality of file references by typical users makes caching attractive by reducing the high latency network traffic.
Tradeoff:
Tradeoff is made between lazy cache validations and reduced network traffic. By constraining the cache validations to file open and close only, network traffic is substantially reduced but at the same time it raises issues for concurrent file access.
Alternative uses:
Today AFS is widely used, even in our department too (unfortunately ;)). Issues related to scalability raised in this paper are highly pertinent for any distributed system (even in the context of P2P file transfer systems like BitTorrent today). Caching based on heuristics is also getting popular today.
Posted by: Mohit Saxena | November 13, 2008 02:48 AM
In this paper, the authors present the Andrew file system, a distributed file system developed in CMU. More specifically, they discuss some strategies to improve its scalability. Firstly, they present the problems that the system had and then the improvements they’ve made.
The existing distributed file systems (including the prototype AFS) couldn't scale very well when the number of nodes was more than 1000. The authors identified the reasons why AFS didn't perform well at this scale and proposed some modifications that would improve its scalability.
AFS used caches to store files on the workstation so that they can be accessed locally. The authors preserved this mechanism, as it reduces the number of cross-network calls, but decided to cache not only the files but also the contents of directories and the symbolic links. Moreover, they modified the cache management of AFS by assuming that the caches are valid unless otherwise notified. More specifically, when a workstation caches a file or directory the server promises to notify it before allowing a modification by another workstation (callback). This mechanism reduces the number of cache validation requests received by servers and as a result it can improve performance. Moreover, they reintroduced the notion of two-level names. Each file or directory is identified by a fid. By using the fid, the actual accessing of file data can be done efficiently. Another contribution of the system is the concept of volumes. Each volume is a collection of files which forms a partial subtree of the name space. Volumes are very beneficial as they help balance disk usage across servers.
In my opinion one flaw of the system is that it doesn't have a general concurrency-control mechanism. For example, multiple workstations can modify a file concurrently; to ensure serializability, application programs have to cooperate and perform the necessary synchronization. Is this sufficient? I believe that the system should provide a mechanism to ensure this itself. Another flaw of the system is that when a volume is moved, a copy of it is created and shipped to the new site. In my opinion this approach has two disadvantages: firstly, we consume space in order to store the copy of the volume, which may consist of several large files, and secondly, shipping a large amount of data is not very efficient. Finally, the authors don't specify what the system does when a file is too large to fit in the cache.
The main techniques used are the caching of files so they can be accessed locally, the use of callbacks to avoid many validation checks, the creation of volumes to balance disk space and facilitate the management of files, the use of LWPs to reduce context switching between processes, the creation of fids to access data more easily and efficiently, and replication of read-only files. There is a tradeoff between space and performance (e.g., maintaining callback state, creating a clone of a volume). Moreover, this system is more complex to maintain than the prototype AFS.
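Since the LWP-based server structure comes up here, the following C sketch shows the idea of a fixed pool of threads inside one process serving all requests, rather than one process per client. Modern pthreads stand in for the LWP package and the request format is invented; compile with -pthread.

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 5
    #define QSIZE    32

    static int queue[QSIZE];
    static int head, tail;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

    /* Each worker repeatedly takes the next request; no process or thread is
     * created per request, and a worker only exits on the -1 sentinel. */
    static void *worker(void *arg) {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail)
                pthread_cond_wait(&nonempty, &lock);
            int req = queue[head++];
            pthread_mutex_unlock(&lock);
            if (req < 0)
                return NULL;
            printf("worker %ld serving request %d\n", id, req);
        }
    }

    static void enqueue(int req) {
        pthread_mutex_lock(&lock);
        queue[tail++] = req;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    int main(void) {
        pthread_t tid[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)      /* the pool is created once, up front */
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int r = 1; r <= 8; r++)             /* incoming client requests */
            enqueue(r);
        for (int i = 0; i < NWORKERS; i++)       /* one shutdown sentinel per worker */
            enqueue(-1);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }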
Posted by: Avrilia Floratou | November 13, 2008 01:03 AM
Summary
The paper describes the design decisions adopted by the Andrew File System that enable it to avoid significant degradation in performance even in the presence of thousands of users. Caching full files on local workstations, employing lightweight processes in servers, and maintaining stub directories that enable all servers to map a file to its corresponding server are among the prominent techniques employed in AFS to achieve scalability.
Problem attempted
The authors observe that large scale in a distributed system degrades performance and makes administration very complicated. One of the primary goals in designing AFS is to make it scale gracefully and cope with these two major concerns.
Contributions
1) Each server maintains a directory hierarchy for the files stored on it, plus stub directories representing the portions of the tree stored on other servers. This enables each server to identify the server containing any given file.
2) A local process (Venus) caches files from Vice (the server cluster) so that reading and writing of files can be performed on this cached copy. Thus, apart from opening and closing the file, the server need not be contacted. This feature also helps in the event of a server crash or network outage, where the workstation can still manage with limited server access for quite some time.
3) By making the clients perform the logical equivalent of the very frequently used namei operation (mapping pathnames to Fids), AFS relieves the servers of a significant burden (a sketch of this client-side resolution appears after this list).
4) In order to minimize the context-switching costs between different processes in a server, lightweight processes are used to serve clients. Thus there is a single process serving all clients on a server, with a fixed number of LWPs, each bound to a client only for the duration of a single operation.
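Here is the sketch referred to in point 3: the client-side namei equivalent walks the pathname one component at a time through locally cached directories, each of which maps a name to a fid. The directory contents and fid values below are invented for illustration.

    #include <stdio.h>
    #include <string.h>

    struct dirent_fid { const char *dir_fid, *name, *fid; };

    /* Cached directory entries: (directory fid, component name) -> fid. */
    static struct dirent_fid cache[] = {
        { "root",   "usr",   "7.21.1" },
        { "7.21.1", "alice", "7.44.3" },
        { "7.44.3", "notes", "7.90.2" },
    };

    static const char *lookup(const char *dir_fid, const char *name) {
        for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
            if (!strcmp(cache[i].dir_fid, dir_fid) && !strcmp(cache[i].name, name))
                return cache[i].fid;
        return NULL;                  /* would fall back to fetching the directory */
    }

    /* Resolve "usr/alice/notes" one component at a time, entirely on the client. */
    int main(void) {
        char path[] = "usr/alice/notes";
        const char *cur = "root";
        for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
            cur = lookup(cur, comp);
            if (!cur) { puts("component not cached"); return 1; }
            printf("%s -> fid %s\n", comp, cur);
        }
        printf("final fid sent to the server: %s\n", cur);
        return 0;
    }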
Flaws
1) Since call-backs are discarded after a time-out or network failure, immediately after the server resumes, it will receive a huge number of cache-validation requests.
2) When very large files are opened, the user has to wait till the whole file is transmitted over the network.
3) Over a period of time, a single client will have a large number of directories and files with callbacks on them. This increases the burden on the server of maintaining and breaking callbacks for changes to any one of the numerous files/directories.
Tradeoffs
1) There is a tradeoff between the expensive initial access of a file and much cheaper later accesses. The requirement that a cache copy of a file be created first makes the first access expensive. But at the same time, it makes the later accesses much cheaper.
Techniques used
1) Locality - Locality of file access patterns is exploited by caching entire files in local workstations
2) Make the common case fast - The paper observes that cache validation requests are among the most frequently requested services from server and hence introduces callbacks to reduce network traffic and loads on servers. Similarly, by making the clients perform the equivalent of namei operation, the speed of this very frequent operation has been increased significantly.
Posted by: Balasubramanian Sivan | November 13, 2008 12:55 AM
Summary: The paper draws insight from benchmarking a prototype and describes the changes required for massive scalability in the distributed Andrew File System. Use of threads instead of processes, callback based cache consistency, and moving the computation from server to client resulted in a system that compares favorably to Sun's NFS, while use of volumes simplifies administration.
Problem: The authors' goal was to scale AFS up to 10000 nodes. To understand the issues, they built a quick prototype and analyzed it. This unveiled a number of problems in both performance and administration:
- Servers were CPU bound and quickly saturated at low loads. This was mostly caused by name mapping and by the overhead of using a dedicated process per client.
- Loads were unbalanced.
- Placing file location information in stub directories hindered administration.
- Authentication and cached files' status messages accounted for a disproportionate share of server interaction.
Contributions: It is unclear how many of the concepts were already present in AFS. They offer a great analysis of the scalability issues and propose the following improvements to AFS:
- A better file cache consistency with well defined semantics, and with lower overhead.
- caching of entire files, with Venus used only for open/close.
- move the name resolution (computation) from the server to the client, which produces a 96-bit Fid (volume, vnode, uniquifier) and passes it to the server (sketched after this list).
- use of preallocated threads instead of processes in client-server interactions. Today it is common knowledge that servers spawning a process per request are a bad idea; were they the first to figure it out?
- use of volumes helps both naming and administration.
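The 96-bit Fid mentioned above is easy to picture as a struct of three 32-bit fields; the numeric values in this sketch are made up.

    #include <stdint.h>
    #include <stdio.h>

    struct fid {
        uint32_t volume;       /* which volume the file lives in */
        uint32_t vnode;        /* index of the file's vnode inside that volume */
        uint32_t uniquifier;   /* bumped when a vnode slot is reused */
    };

    int main(void) {
        struct fid f = { .volume = 101, .vnode = 5823, .uniquifier = 2 };
        /* The client hands the server this fixed-length triple instead of a
         * pathname, so the server never performs a namei-style lookup. */
        printf("fid = %u.%u.%u (%zu bytes)\n",
               (unsigned)f.volume, (unsigned)f.vnode, (unsigned)f.uniquifier,
               sizeof f);
        return 0;
    }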
Performance techniques: (in addition to the ones already mentioned above): In general, they tried to exploit locality: cache of directories and full files. Servers use a flat name space. They use read only replication for load balancing. Location information is kept in a slowly changing DB. Use of scalable RPC. Use of batching: for callbacks, and whole file transfer.
Tradeoffs: They had to add new system calls to pass inode information to user level. Transferring and caching the whole file is a bet on certain access patterns.
This paper is very well written, and they already do a great job of self critique. To what they admit, I would add:
Weaknesses: The use of a small and fixed number (5) of LWPs on the server side may be inappropriate today. It is unclear how the server solves the problem of maintaining excessive callback information (breaking callbacks, as suggested, is an incompletely explained option).
The composition of a "load unit" is unclear. Do 10 load units access exactly the same files, or 10 distinct files?
The results of the comparison of AFS with the prototype are undeniable, yet they used slightly different configurations.
As shown by their final results, a useful help for the coherency protocol could be some hints (to cache or not to cache) associated with files. This could be especially true in a world of large media files accessed once (which may not even be accessible in the described implementation).
Posted by: Daniel Luchaup | November 13, 2008 12:46 AM
Summary:
This paper describes ways to improve the Andrew File System by redesigning a new version focused on the scalability of the prototype. The new version improves scalability through changes to cache management, name resolution, communication and server process structure, and low-level storage representation. The new Andrew File System avoids significant server load increases as the number of users grows.
The goal the paper was trying to deal with:
The goal of the paper is to improve a distributed, scalable file system, which is a large-scale campus-wide file system (5000 nodes). The file system should be able to scale to serve a large number of users without too much degradation of performance, support a simple security model, and simplify system administration.
Contributions
1. This paper lists a number of performance improvements added to the AFS prototype server. For example: (1) a direct i-node interface added to BSD (new API) to avoid having to use filenames on the servers; a new index to map fids to i-nodes; some optimizations made to Venus on the client side, which uses a local directory as the cache. (2) moving from processes to threads.
2. Doing pathname translation on the client side also increases performance, although this requires changes to the server and client system call interface. This is quite similar to how directories are handled in NFS. What the server knows is only files with unique ids, not directory structures (the fid-to-inode index is sketched after this list).
3. The evaluation is done thoroughly, with a comprehensive benchmark suite. And the motivation of the new design is shown by presenting benchmark results and analysis of a previous version.
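A rough sketch of the fid-to-inode index mentioned in contribution 1: each volume carries a table mapping vnode numbers to local inode numbers, so the server turns a fid into an inode with a lookup instead of a namei. The layout and numbers here are invented, not the paper's actual data structure.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_VNODES 16

    struct volume_index {
        uint32_t volume;
        uint32_t inode_of[MAX_VNODES];   /* vnode number -> local inode number */
    };

    static struct volume_index vol101 = {
        .volume   = 101,
        .inode_of = { [3] = 40217, [4] = 40552, [5] = 39881 },
    };

    /* Map a (volume, vnode) pair from a fid to the inode holding its data. */
    static uint32_t fid_to_inode(const struct volume_index *v, uint32_t vnode) {
        return (vnode < MAX_VNODES) ? v->inode_of[vnode] : 0;
    }

    int main(void) {
        printf("volume %u, vnode 4 -> inode %u\n",
               (unsigned)vol101.volume, (unsigned)fid_to_inode(&vol101, 4));
        return 0;
    }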
Flaws:
(1) One of the flaws in this paper is that the "scalability" measured throughout the evaluation is actually raw performance; scalability should normally refer to the capacity of a system with regard to the amount of resources available. (2) The second flaw of the Andrew File System is that it cannot support diskless clients very well and cannot deal with large files very well, because it needs to copy them in their entirety.
The techniques used to achieve performance:
Caching directories and symbolic links in addition to files helps to improve the performance of the system, by caching as much as possible to avoid network traffic. The paper also uses name resolution to improve performance; it reintroduces the notion of two-level names. Each Vice file or directory is identified by a unique fixed-length fid, and each entry in a directory maps a component of a pathname to a fid. Other techniques like lightweight processes and a new low-level storage representation are also used. Because Venus uses the lightweight process mechanism, it can act concurrently on remote file access requests from multiple user processes on its workstation.
Tradeoffs:
Introducing callbacks into Venus greatly reduces cache-validation messages and thus increases overall performance. This exploits locality in the file access patterns of most programs, and the fact that write-sharing of files is rare. On the downside, callbacks increase complexity, and how the system behaves when a network partition happens is not discussed. From the available description, the system will probably lose consistency in the presence of a partition, because a client unable to receive callbacks from the server will assume its cached file is up to date.
Another part of the OS where the technique could be applied:
The technique of cache management used in this paper can also used in network applications. A cache can provide temporary storage of resources that have been requested by an application. If an application requests the same resource more than once, the resource can be returned from the cache, avoiding the overhead of re-requesting it from the server. Caching can improve application performance by reducing the time required to get a requested resource. Caching can also decrease network traffic by reducing the number of trips to the server.
Posted by: Tao Wu | November 13, 2008 12:37 AM
Summary
The paper discusses how to extend the scalability of the Andrew File System while still maintaining performance for a distributed file system.
Description of Problem
As more workstations are connected to a network, it becomes necessary to scale the distributed file system to handle requests from all of these nodes, which could grow to 5000-10000. Current distributed file systems are not able to handle this level of scale.
Contributions
- Volume structures that allow for replicating data across multiple Andrew file servers. This helps divide file requests among multiple servers.
- Allowing nodes to cache accessed and modified files locally, limiting the necessary communication with the servers to opening and closing files.
- Use of lightweight processes within one process for handling requests.
- Read-only replication with limited cost due to copy-on-write behavior, for backup and restoration (the copy-on-write idea is sketched after this list).
- Callback mechanism for invalidating cached files
- Implementing a quota mechanism on the Andrew File System.
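A minimal sketch of the copy-on-write idea behind cheap read-only clones (all structures invented for illustration): the clone initially shares every data block with the live volume, and a later write to the live volume copies only the affected block.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NBLOCKS 4
    #define BLOCKSZ 32

    /* A volume is just an array of pointers to data blocks here. */
    struct volume {
        char *blocks[NBLOCKS];
    };

    /* Cloning copies only the pointers: the read-only clone shares every block. */
    static struct volume clone_volume(const struct volume *src) {
        struct volume c;
        memcpy(c.blocks, src->blocks, sizeof c.blocks);
        return c;
    }

    /* A write to the live volume after cloning copies the affected block first. */
    static void write_block(struct volume *live, const struct volume *clone,
                            int i, const char *data) {
        if (live->blocks[i] == clone->blocks[i]) {     /* block still shared? */
            char *copy = calloc(1, BLOCKSZ);
            memcpy(copy, live->blocks[i], BLOCKSZ);
            live->blocks[i] = copy;
        }
        strncpy(live->blocks[i], data, BLOCKSZ - 1);
    }

    int main(void) {
        struct volume live;
        for (int i = 0; i < NBLOCKS; i++) {
            live.blocks[i] = calloc(1, BLOCKSZ);
            snprintf(live.blocks[i], BLOCKSZ, "block %d, version 1", i);
        }
        struct volume backup = clone_volume(&live);    /* cheap read-only snapshot */
        write_block(&live, &backup, 2, "block 2, version 2");
        printf("live copy:   %s\n", live.blocks[2]);
        printf("backup copy: %s\n", backup.blocks[2]);
        return 0;
    }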
Flaws
- Inconsistencies in the Andrew File System cause the file system to be taken offline and checked for long periods of time, for example when a volume-move operation is accidentally terminated partway through. In addition, it is difficult to maintain good replication of the data across servers; replication of volumes is a very manual process and difficult to do properly.
- The experiments behind many of the tables do not appear to have been run multiple times to ensure the behavior is consistent. This can be seen in how the paper explains anomalies as possibly due to maintenance activity or suspected unrelated activity. Multiple runs would show this and strengthen the presented data.
Techniques
Techniques used to achieve the scalability are caching of all accessed files to gain file locality. This reduces the load on the file servers by allowing the workstation to access the files directly, eliminates unnecessary network traffic, and speeds up consecutive file accesses to almost that of a local file access. Another technique to improve scalability is redundancy across servers for balancing load: spreading the volumes across multiple servers allows the workstations to access multiple servers to get the same file. The technique of caching for scalability could also be used in sharing data between processes. If a number of processes poll a single process periodically to collect information, such as settings that don't change often, each polling process could cache this information and use it instead of making an IPC call. On the event of a change, the process that holds the real data can send out a notification of the setting that changed.
Posted by: Cory Casper | November 12, 2008 11:30 PM
Summary
The authors present a scalable distributed file system that was designed for handling the demands of around 5000 workstations in CMU. They present the various problems faced in running common user activities in a distributed file system by benchmarking an initial prototype and show how they solved these problems in their current system.
Problem
The authors felt that the distributed file systems of the day couldn't scale to their requirement of 1000+ client workstations accessing the file servers. They feel the fundamental design decision of not caching the entire file at the client causes issues at such a large scale, because of the state that needs to be maintained and the need to intercept each call to the file.
Contributions
- The major change in their design is caching the entire file instead of fetching the pages on demand. This avoids intercepting each call to read/write into the file as this is equivalent to working with a local file. Also this reduces the state that has to be maintained on the client machine.
- Their initial prototype helped them a lot in finding the actual bottlenecks in the file system. Based on their observations, they realized that there is a potential performance improvement by using notifications for cache invalidation rather than eager cache validation. (Even at the cost of maintaining extra state in the server)
- Using file identifiers to effectively cache the mapping from path to a volume saves many lookups each time. Also because of the locality of the file operations, most of the time at least part of the pathname is already cached.
- Volumes and mount points help in isolating the physical location of a file from the logical location as exposed by the file system.
Flaws
- Caching entire files, though a good idea, doesn't help when the files are huge; unnecessary bandwidth might be wasted transferring something that is never used.
- For a scalable distributed file system, minimal state should be maintained in the file server, but registering callbacks for each and every file and directory opened by each and every client does limit this scalability. The server also removes callbacks arbitrarily instead of letting the clients do it, even though the clients may have a better idea of which files are least used.
Technique
- They take advantage of the temporal locality of file accesses and cache the data, metadata, and location of a file, knowing that it will be accessed frequently.
- Volumes and mount points introduce a level of indirection that allows easy movement of files in volumes.
- There is a trade-off between space(for maintaining the callback states) and performance(using callbacks instead of checking each time)
- Callbacks can be used in any distributed system that changes rarely or in other words with more reads and fewer writes.
Posted by: Tushar Khot | November 12, 2008 11:27 PM
Summary:
This paper analyzes AFS and motivates changes in the areas of cache validation, server process structure, name translation, and low-level storage representation. It presents an upgraded version of the AFS system which scales much better. It also presents a performance comparison with NFS and shows the aggregation of files into volumes.
Problem:
The AFS prototype did not scale well. The caching of files and the stat-ing caused a lot of network congestion. The use of a dedicated process per client on each server ran into critical resource limits and also resulted in excessive context-switching overhead. The design also restricted movement of user directories between servers and required heavy namei operations at the server. There was no way to enforce quotas on users.
Contributions:
• Directory contents and symbolic links, in addition to files, were cached, further reducing the amount of network traffic.
• Callbacks reduce the load on servers considerably by reducing cache-validation traffic. Before a file/directory modification, the server has to notify every client holding a cached copy about the change.
• The introduction of fids means the server does not need to know the pathnames of files and avoids an extra namei operation at the server, removing a lot of processing overhead. Files can also easily be moved now.
• The revised system uses LWPs within a single process to service all clients, hence greatly reducing context switching; a thread is returned to the pool of threads after servicing a client request.
Flaws:
AFS might not be able to handle multiple large files effectively, since it makes a full copy of each file and the cache might run out of space. It also has very high latency for first-byte access of large files. As servers must now keep more state information, performance will potentially degrade for large numbers of clients. Different workstations can access the same file and write to it, which might corrupt the data when programs unknowingly write it simultaneously.
Closing a file might take longer than writing to it, since the whole file is shipped to the server on close; the time to write data could be overlapped with transferring it to the server for lower latency.
Performance:
Scalability is achieved by shifting a lot of work from the server to the client, with namei being done at the client. Callbacks instead of polling remove a huge amount of overhead at the server node and reduce network traffic. Lightweight threads instead of processes reduce context switching and resource demands.
Tradeoff:
The large amount of state stored at the server is a tradeoff made to improve server performance: the server has to maintain callback state per client per file accessed.
File data consistency is traded off for performance. File writes are local and delayed until file close, so they aren't visible to other workstations until then.
Another area:
Local files on workstations could be cached in their entirety rather than just a few blocks/segments. LWPs can be used in network/web servers.
Posted by: Rachita Dhawan | November 12, 2008 11:26 PM
Introduction:
This paper focuses on the scalability and performance aspects of the Andrew File System (AFS), which is a distributed file system developed at CMU. The authors discuss various strategies for improving performance and operability, especially in the face of a large number of clients.
What were they trying to solve:
To be practically useful, distributed file systems need to scale well. Existing distributed file systems, including the prototype version of their own file system, had major issues in scaling up gracefully to thousands of users. Also, as systems grew, the management overhead of maintaining them also tended to grow. File organization was tightly coupled to the physical storage, files couldn't be moved across servers easily, there was no quota system, and file backup was a cumbersome task.
Contributions:
To improve performance, the following optimizations were done:
The unit of transfer is the entire file, not individual blocks. This minimizes the network overhead considerably.
A client-side cache which stores data as well as metadata (directories, symbolic links). This reduces unnecessary communication between server and client.
A push model for updates is implemented using callbacks. A client always assumes that the data in its cache is valid. It is up to the server to actively inform clients that a file has changed. The server maintains a callback for each (client, file) pair.
Since the pathname to server file mapping is slow and happens very frequently, the pathname is mapped to a globally unique fid once and from then on the fid is used for communication. The fid has the volume information embedded in it, from which the server can be found using a volume location database.
The earlier prototype used one server process per client, which is expensive. Instead of processes, threads can be used to avoid context switches.
To improve operability, the following changes were made:
For easier maintenance, files are grouped into Volumes. One volume can be attached to the leaf node of another using mount points, which are transparent to the client. Volumes can dynamically grow or shrink, and are limited to one partition.
Volumes can be moved from one server to another without affecting the clients, with just the volume location database that has to change.
Quotas are implemented on a per-user, per-volume basis (a quota check is sketched after this list).
Read-only files are replicated on multiple systems, and no callbacks are maintained for such files.
Backups are also done at the volume level using read-only clones.
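A sketch of how a per-volume quota check might look at store time (fields and numbers are illustrative only): because each user's files live in their own volume, one counter per volume is enough to enforce the quota.

    #include <stdio.h>

    struct volume_quota {
        const char *owner;
        long quota_kb;        /* limit assigned by the administrator */
        long used_kb;         /* space currently consumed in the volume */
    };

    /* Called by the server before accepting a store of size_kb new kilobytes. */
    int store_allowed(struct volume_quota *v, long size_kb) {
        if (v->used_kb + size_kb > v->quota_kb)
            return 0;                     /* reject: would exceed the user's quota */
        v->used_kb += size_kb;
        return 1;
    }

    int main(void) {
        struct volume_quota alice = { "alice", 20000, 19950 };
        printf("store 40 KB: %s\n", store_allowed(&alice, 40) ? "ok" : "over quota");
        printf("store 40 KB: %s\n", store_allowed(&alice, 40) ? "ok" : "over quota");
        return 0;
    }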
Flaws:
The callback mechanism increases the complexity and state at the server. This may hamper scalability since a callback has to be maintained for every (client, file) pair. Maintaining these callbacks in the face of network failures is itself an overhead.
The whole-file-transfer idea doesn't work if the files are large. It also doesn't work in a distributed database scenario where multiple clients need to simultaneously write different parts of the same file.
Volumes are still restricted to single partitions.
Techniques used to improve performance
Cache as much as possible to avoid network traffic.
Push-on-Write instead of Pull-on-Read.
Tradeoffs:
Simplicity vs performance: Stateless server is simple but slow, a stateful server performs better but is more complex to maintain.
Another part of the OS where the technique could be applied:
The push model can be applied in any situation where reads dominate updates, since reads have no communication overhead.
Posted by: priyananda | November 12, 2008 10:44 PM
Summary
John Howard et al. address the topic of “Scale and Performance in a Distributed File System,” by presenting an implementation of an improved distributed file system at Carnegie Mellon University which they call Andrew.
Problem
Though a variety of distributed file systems already existed, they were not designed with massive scale in mind. Andrew's designers desired a system which would maintain good performance for a large number of users, hypothesizing that the system would eventually be expected to handle thousands of nodes effectively. Thus, Howard et al. designed and implemented a system which they believed would be fundamentally more scalable than prior systems, and then evaluated its performance against Sun's NFS, which they considered to be a representative distributed file system of the time.
Contributions
• Improving scalability by caching file data locally and only writing changes back when a file is closed. This caching improves performance by reducing cross-network calls and also improves scalability by reducing load on servers.
• Creating volumes, subtrees of the overall file system structure, which could be moved across partitions on various servers transparently, allowing for easy rebalancing (due to either load or disk usage). Since volumes also allow for quotas, there is a good mechanism for ensuring fairness of use of system resources. Also, the volumes are employed to provide efficient and reliable backups, something that is not always trivial on a distributed system.
• Employing callbacks to further improve scalability; instead of requiring clients to poll for changes, they would simply be notified when a change occurred. This minimized waste due to needless checks.
Flaws
• Caching entire files may not work well if the files do not fit in the cache. No description about what happens in this case is presented.
Performance
A variety of techniques which already existed in other areas of computing were used to improve performance. Most notable was the concept of caching data locally, using local resources in order to increase the speed at which that data could be accessed while simultaneously reducing the load on the server. Devising a way of moving volumes among servers allows for balancing load, and by balancing the load across many independent servers (once a file was found to reside on a certain server, a connection was established to that server and further lookups were not needed), a large number of users could be served.
Posted by: Mark Sieklucki | November 12, 2008 09:55 PM
Summary: The authors of this paper present the Andrew file system for large scale distributed file systems as an alternative to NFS that scales better. They utilize an approach which locally caches files from the main servers while they are in use and then copies changes to them when use of them is completed.
Problem to Solve: The main problem being solved is the inability to efficiently scale large distributed file systems. Although NFS and others do scale, their performance typically degrades quickly as the number of users grows. Another problem is that the volumes of data these servers hold need to be easily configurable and reorganizable as conditions change.
Contributions: The main contribution of the paper is that the authors are able to create a distributed file system which is scalable to a large number of users. They do this through the use of a new architecture which they prototype and then refine further to eliminate hindering limitations, such as the original lack of flexibility in volumes. They also introduce the notion of callbacks, which allows local copies of data to be considered valid unless otherwise notified by the central servers. Callbacks reduce traffic to central servers because files which are often used but don't change much only have to be pulled from the server when changes occur. Another contribution they make is the introduction of Lightweight Processes (LWPs) within one process. It is LWPs which truly reduce the overhead of context switching, allowing multiple clients to be handled at once in an efficient manner on the server.
Flaws: One flaw of the paper is that security is glossed over. Although they mention they don't go into detail because the focus of this paper is on scalability, their argument for this system is less credible without at least some details of the security system in place.
What tradeoff is made: One thing they do to improve performance is cache directories and their file listings. This reduces the number of calls to the server, but at the cost of storing this information in the local cache. Also, to increase performance they keep copies of files locally until a user is done changing them before sending them to the server for synchronization. This improves the performance the user experiences and reduces load on the server, at the cost of someone else being able to load and edit the old copy on the server. Although this happens with low probability in their system, it is possible.
Posted by: Holly Esquivel | November 12, 2008 09:14 PM
Summary
The paper describes some of the design features enabling scalability in the AFS filesystem developed at CMU. These involve a server-side push model for cache coherence, complete caching of files, use of lightweight processes on the server, etc.
Problem summary
Contemporary distributed filesystems did not scale well to a large number of users. Large scale has its impact on performance, administration and usability. The authors strive to build a scalable distributed filesystem with reasonable performance, which maintains interface compatibility with the 4.2 BSD filesystem semantics.
Contributions
- distribution of portions of a directory's namespace to other servers using stub directories enabled the distributed model for the filesystem.
- workstations cache the entire file to avoid contacting the server for each read/write.
- directories and symbolic links are cached on the workstations, and path traversals to retrieve the fid are done by the workstation to reduce server load.
- callbacks registered by the workstation with the server allows for a server-push model for coherence of cached files. This way, the workstation can avoid polling the server each time a file is opened.
- A static pool of lightweight processes is used on the server side to service client requests. This reduces the overheads of process creation and deletion, thereby improving performance.
- using volumes to segregate files allows for movement of content between servers, enforcing quotas, etc.
- persistent caches on the workstation's hard disk allow access locality to be exploited across reboots of the workstation.
Flaws
- AFS caches the entire file on the workstation. However, I wonder how this performs with huge files. If only a few sections of the file are read, then the network processing overhead could be huge.
- using callbacks to enforce cache coherence has the disadvantage that some state is now stored on the server. Also, since this needs to be stored on a per-file basis, this could be a scalability concern in itself.
Techniques
- Temporal locality in the working set is exploited when the workstation caches the entire file.
- the use of the callback mechanism to implement a server-push model for cache coherence saves the client from polling for file updates. This reduces the load on the server.
Tradeoff
There's a tradeoff of maintaining per-(file, client) state information on the server in exchange for performance.
Alternate areas
- instead of blindly scheduling in round-robin fashion, if we maintain some state on how much of its previous quantum each process utilized, we get a fairer allocation of the CPU in a pool of compute-bound and I/O-bound processes (a sketch of this idea appears after this list).
- callback mechanisms are used by most windowing systems today to push events into the application, thereby avoiding the application having to run a dedicated thread polling for events.
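A tiny sketch of the scheduling idea above (entirely hypothetical, not from the paper): remember what fraction of its last quantum each process actually used, and favor the ones that used the least, so I/O-bound processes are not starved by plain round robin.

    #include <stdio.h>

    struct proc {
        const char *name;
        double last_quantum_used;   /* fraction of the previous quantum consumed */
    };

    /* Pick the runnable process that used the least of its previous quantum. */
    static struct proc *pick_next(struct proc *procs, int n) {
        struct proc *best = &procs[0];
        for (int i = 1; i < n; i++)
            if (procs[i].last_quantum_used < best->last_quantum_used)
                best = &procs[i];
        return best;
    }

    int main(void) {
        struct proc runnable[] = {
            { "compute-bound", 1.00 },   /* burned its whole quantum */
            { "editor",        0.15 },   /* blocked on I/O almost immediately */
            { "compiler",      0.80 },
        };
        printf("next: %s\n", pick_next(runnable, 3)->name);  /* -> editor */
        return 0;
    }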
Posted by: Varghese Mathew | November 12, 2008 07:39 PM