Disconnected Operation in the Coda File System
Disconnected Operation in the Coda File System, Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, October 13-16, 1991, pages 213-225.
Reviews due Tuesday, 3/27.
Comments
Summary:
This paper presents the Coda file system, which lets users continue working (enhanced data availability) even when they are disconnected from all or some of the servers. This is achieved mainly through write-back caching. Server files are pre-fetched onto the mobile computer (hoarding). Upon disconnection, changes to the file system are logged on the client (emulating). Upon reconnection, the client synchronizes its cache with the servers by propagating updates from the modify log (reintegrating).
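The three phases above form a simple cycle in the Venus cache manager. A minimal sketch of that cycle (the class and method names here are illustrative, not Coda's actual code):

```python
from enum import Enum, auto

class VenusState(Enum):
    HOARDING = auto()       # connected: keep the cache stocked with useful files
    EMULATING = auto()      # disconnected: serve requests locally, log mutations
    REINTEGRATING = auto()  # reconnected: replay the log against the servers

class Venus:
    """Toy model of the hoarding/emulating/reintegrating cycle."""
    def __init__(self):
        self.state = VenusState.HOARDING
        self.replay_log = []

    def disconnect(self):
        if self.state is VenusState.HOARDING:
            self.state = VenusState.EMULATING

    def write(self, path, data):
        if self.state is VenusState.EMULATING:
            # while disconnected, mutations are logged for later replay
            self.replay_log.append(("store", path, data))

    def reconnect(self):
        if self.state is VenusState.EMULATING:
            self.state = VenusState.REINTEGRATING
            log, self.replay_log = self.replay_log, []
            # ... here the log would be propagated to the servers ...
            self.state = VenusState.HOARDING
            return log
        return []
```

The key point the sketch captures is that disconnection is handled by a state transition inside the cache manager, invisible to applications.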
Problem:
The problem is how to improve data availability even when a client is disconnected (voluntarily or involuntarily) from the servers, while preserving transparency to users and taking scalability into account.
Contributions:
+ The main contribution is using caching of data, a technique widely used to improve performance, to also enhance availability.
+ The paper makes sensible design decisions and devises good mechanisms to achieve its goals. Server replication and support for disconnected operation are the two key mechanisms for high availability. It also demonstrates that optimistic replication can be made practical for mobile computing, incorporating several techniques to make it viable: log-based directory resolution, application-specific file resolution, and mechanisms for conflict detection (e.g., isolation-only transactions, used to detect and handle read/write conflicts during disconnected operation) and manual repair.
Flaws:
- It seems somewhat awkward to rely on external directives about future accesses in the form of user-supplied hoard lists.
- Since Coda provides only weak consistency guarantees, it allows update conflicts and requires users to resolve them manually. Conflicts are usually highly application-specific, so users need to write a resolver for each kind of application, and they may not be able to access the data while resolution is in progress.
- The reconciliation protocols allow server replicas to reconcile with each other and mobile replicas to reconcile with servers, but mobile clients cannot reconcile amongst themselves. Reconciliation must involve the servers and proceeds hierarchically: a replica only exchanges updates with its parent or children.
Applicability:
Using the cache or the client's copy is a simple but effective way to greatly improve availability. As mobile computing becomes prevalent, people with smartphones, iPads, and laptops all wish to be able to work whether they are at home, on the bus, or at the office. They prefer to conveniently access the same copy of their work without having to carry the same laptop with them all the time. In particular, in the era of the cloud, data is increasingly stored in cloud storage, and users fetch part of it to their mobile devices to work on. Dropbox is a good example of this kind of experience: we can access the local copy whenever we want, and it is then synchronized with the other replicas once we have Internet access.
Posted by: Yizheng Chen | March 27, 2012 08:37 AM
This paper deals with the design and implementation of disconnected operation. The environment is basically a set of file servers that store the data; the client operates on these files on its own machine and sends them back to the server. The paper assumes that all servers are secure and trusted, and that operations are performed on small files in a bursty manner.
The crux of the paper is how the client and server communicate. A user-level process called Venus synchronizes data on the client at regular intervals. It is implemented as a state machine that hoards files at regular intervals, emulates the file server on the client during disconnected operation, and performs a reintegration phase that propagates the client's updates once a server is up again. It lets you specify priorities for files to be cached by modifying the hoard database, which the client is allowed to edit. During emulation, all writes go to the local copy of the data as if it were the file server, and the writes are reconciled once any replica (in the AVSG) to which the client can propagate changes becomes contactable. This design achieves availability by replicating data on the servers and through the emulation phase at the client. The hoarding phase keeps the cache at most a few minutes out of date, and reintegration lets eventual consistency be achieved.
The main purpose the system was designed for is availability, and it was handled very tactfully. Availability after crashes is handled by RVM, which persists data to disk. This is done periodically, and the data lost corresponds at most to the time between flushes. The reintegration phase takes care of conflict resolution: it applies the client's updates in a transaction, reading the update operations from a replay log maintained by Venus.
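The all-or-nothing replay described above can be sketched as follows. This is an illustration of the idea only: `server_apply`, `server_undo`, and `ConflictError` are hypothetical stand-ins, not Coda's actual interfaces.

```python
class ConflictError(Exception):
    """Raised when the server detects a conflicting update."""

def reintegrate(replay_log, server_apply, server_undo):
    """Replay logged mutations as an all-or-nothing unit.

    On the first conflict, roll back whatever was applied and report
    failure, so the log can be preserved (as a "replay file") for
    manual repair. A sketch of the idea, not Coda's actual protocol.
    """
    applied = []
    for op in replay_log:
        try:
            server_apply(op)
        except ConflictError:
            # abort: undo already-applied operations in reverse order
            for done in reversed(applied):
                server_undo(done)
            return False
        applied.append(op)
    return True
```

The transactional framing matters because a half-applied log would leave the server in a state neither the client nor other clients ever observed.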
A major limitation of the design is that it is difficult to extend to a current system. AFS can store large files, but with obvious performance implications. A distributed file system that stores data blocks (inode blocks, if implemented in kernel space) on different servers can provide fast access to data and can definitely scale well with increasing file sizes. It would also allow conflict resolution at a finer, block-by-block granularity. In the current design, conflict resolution is not possible at the file level when there are write-write conflicts to a file. Such a design would not impart 100% availability, but it would allow operations on portions of files.
Overall, I feel this is a good learning exercise for building a disconnected large-scale system, especially considering the time it was implemented in. The measurements in the paper were pretty interesting, and the amount of work put in seems to be huge.
Posted by: Srinivas Govindan | March 27, 2012 04:17 AM
This paper describes how the aggressive caching mechanism used by a distributed file system (the Coda file system), originally intended to increase performance, can be exploited to guarantee availability during periods of disconnection from the file servers.
The primary problem the paper aims to solve is enabling and efficiently handling uninterrupted file system operation on portable computers during periods of prolonged disconnection from remote file servers, and implementing correct and efficient reconciliation once connectivity resumes.
The most important contribution of the paper is the extremely simple solution it provides to the problem of availability during disconnected operation by taking AFS’s caching + callback scheme to the extreme.
More details about the ideas in the paper:
• The paper talks about the Coda file system and the familiar AFS features such as whole file caching, volumes and callbacks.
• The paper carries forward the AFS design principles, primarily for scalability, and these happen to fit nicely into Coda's extension for disconnected operation.
• The primary use case the paper tries to address is the scenario in which a laptop gets disconnected from the AFS servers and has to sustain file system operations using the data built up by the Venus cache.
• The paper treats the cached copies as second class replicas that are of low quality (recency and consistency) but that have high availability, as opposed to the server replicas which form the first class replicas and are highly consistent.
• The paper describes the Venus states. The hoarding state focuses on smart and efficient caching, striking a balance between current performance and disconnected performance through smart eviction.
• A rather new idea is the incorporation of explicit user information, in the form of hoard priorities, to guide decisions about what to cache.
• The hoard walk seems like a nice idea introduced by the paper that helps achieve this caching balance by departing from the AFS-style on-demand refetch on callback to periodic refetching during hoard walks.
• The paper talks about the things that happen when Venus is in Emulation – more specifically about logging, persistent and atomic storage and optimizations.
• The more interesting part of the paper is the replay algorithm – back fetching and the replay file that is written out if reconciliation fails.
The paper does not contain any major flaws; the approaches it uses are all very simple. However, one thing seems rather crude from an implementation perspective: the hoard profile, though new and interesting, seems rather primitive and flaky. There has been much work on workload-sensitive cache-replacement strategies, and a strategy that automatically learns which files are important to cache would have been interesting.
The idea described in the paper is very much applicable to today's systems, although with the exact mechanisms the system uses, performance and scalability may be limited by the overall size of the system. The typical usage scenario would be that of AFS: the network inside an educational/research institution.
Posted by: Venkatesh Srinivasan | March 27, 2012 01:13 AM
Disconnected Operation in the Coda File System
James Kistler and M. Satyanarayanan present Coda, a file system designed to provide extremely high availability for its clients via server replication and disconnected operation. Disconnected operation means surviving server failure or, as a possible side effect, operating portably. Its major contribution is the realization that caching data can improve availability.
Coda seems very similar to AFS. Oddly, the callback cache-coherence protocol described on page two is, at a high level, identical to AFS's. Was Coda a development project whose features got back-ported to AFS? By caching whole files, does that mean two people cannot collaboratively edit a file simultaneously (à la Google Docs)? Caching a whole file is possible and useful with small files, but wouldn't performance suffer beyond a certain file size? Say, for instance, you have a two-hour DivX video on a Coda server that you want to play. Are you really going to have to wait for the whole file to be cached before you can access it?
They imply that a partial-file caching scheme would be possible with disconnected operation. How? If you opened a text file for editing and only had part of it in the cache, wouldn't you fail to see the whole file? Is there any way to implement this and keep disconnected operation transparent?
How is security implemented in Coda? The Lampson paper brings up the idea of a trusted computing base. Giving the user control over the hardware might not be best for data security. Say a setuid program reads a password database: would it then end up in the file cache, accessible to any nefarious user who got their hands on the hardware?
Further, how are ACLs handled in Coda? Suppose you open a file (and thus have it in your cache), your client becomes disconnected, you change the permissions on the file, and someone else changes the permissions on the server (so that you cannot read, write, or execute the file). On reconnect, would the replay file be created with your original permissions? This seems like it might be a vulnerability.
The figure that a 50-60MB disk would be adequate for operating disconnected for a 'typical' work day was interesting. What was their user population? If they are counting on data from the cs.cmu.edu AFS cell, then that is a very limited working set, with potential for large spikes (project due dates, end of the semester, etc.) and large dips (summer vacation).
A final question: if, in phase one of the replay algorithm, all files referenced in a replay log are locked, what mechanisms do they have for breaking locks in case of failure during the replay algorithm?
Posted by: Matt Newcomb | March 27, 2012 12:27 AM
Victor Bittorf, Igor Canadi, David Capel, Seth Pollen
The paper describes Coda, a distributed file system whose goal was to improve on AFS. The authors introduce a model of disconnected operation in a distributed file system, which enables clients to read and modify data even when not connected to any of the servers providing it.
Coda is a file system designed for an academic and research environment and its file-access patterns. Given those patterns, the authors try to answer the question: "Would it be feasible and useful to add disconnected operation to a current distributed file system?" The technology trend that prompted the question was the development of lightweight portable workstations, which were disconnected and carried home or on trips. Without disconnected operation, a distributed file system would become unusable on a portable computer. As an added benefit, the feature also helps in the event of network failures. However, we had a short discussion about how designing separately for the intentional and unintentional cases would probably yield better usability, but would also complicate the implementation.
We liked some of the design choices of the Coda system, in particular the choice to move as much complexity as possible to the client. Having clients do most of the work takes load off the servers and enables them to serve more clients, improving scalability considerably. We also talked about the callback-based cache, which seems to be a much better choice than synchronously checking cache validity on every access. This is the right thing to do given that files are read from one location much more often than they are concurrently read and written from two or more locations. Although a callback cache adds some state to the server, it also reduces the number of requests the server has to handle (by not dealing with a request on every file access), improving scalability. A similar method was also used in Chubby's protocol.
The main contribution of this paper is the idea of disconnected operation. In the past, no distributed file system would remain available if the client could not reach the server, whether due to network or server failure. That idea has become more important with time and is very relevant today, when more and more workstations are mobile and consequently sometimes have poor connectivity. One of today's successful models of disconnected operation is Dropbox, which lets users operate locally and synchronizes updates when it detects a connection to the Internet. One more technical contribution is thinking of the cache as a mechanism not only for increasing performance, but also availability.
In our reading group, we had a long discussion about the differences between Coda and modern versions of disconnected operation such as Dropbox. Although Dropbox's model seems like a subset of Coda's (where you assign infinite hoarding scores to the files you want to keep), we concluded that Dropbox's model is superior because it is simpler and easier for the user to understand. If you put a file into a Dropbox folder, you know it is there and you don't have to think about it. Coda, however, makes no guarantees about keeping a file in the cache, and users might be unpleasantly surprised when they find out the file they need is not there.
This leads us to what we identified as the biggest flaw of this paper: the notion of predicting file-usage patterns. For example, if a user watches a few episodes of a show, Coda will cache the last episodes accessed; but when the user gets home, he will want to watch the next episode, which is not cached. Another problem we discussed is the implicit runtime dependencies of programs, which Coda knows nothing about. For example, to free up space, Coda might evict big texture files from a Photoshop directory but not the actual binary; Photoshop would be unusable without the texture files, yet the binary would still be there, using up space. Mysterious error conditions might also arise: a developer might hit a bug when running a program, only to find out after long debugging that the problem was a shared library evicted from the cache. Coda removes basic assumptions users have about a file system, and even a single negative experience with an unusable system would frustrate a user, who would then not want to use the system anymore.
We also talked about the usability of Coda for system files. Because of their completely different usage patterns, we think using Coda's disconnected operation and hoard profiles for managing system files would not be ideal. These files are read often and written rarely, by only a few administrative users, and they always need to be stored locally; otherwise, the system wouldn't boot. The approaches used today for managing system files are CFEngine and Active Directory. Because they are designed for this special use case, they are simpler to use and implement.
Posted by: Igor Canadi | March 27, 2012 12:13 AM
The paper talks about a distributed file system where caching is done not only for performance but also to increase availability. The system is designed to tolerate clients and servers becoming disconnected involuntarily over time, with availability guaranteed through cached data. It is built with scalability in mind and is modeled for scenarios where the amount of concurrent access is minimal. It is built for small workstations which can't hold all the data themselves.
The service is provided as an application-level program that intercepts file system calls. This client side is called Venus, and it manages the cache at the client, which is replicated at the server at volume granularity. The client-side copy is called the second-class replica and the server-side copy the first-class replica. They implement write-back caching, where data is written back only when needed; caching is done primarily to make the file system more available in case of failures. They also adopted whole-file caching, so that availability is affected only on the open system call and not on the read/write/seek/close system calls. To support disconnected operation, they chose optimistic replication, where clients don't obtain locks and continue operating on a file even when disconnected. On reconnecting, any conflict is resolved if detected. To detect conflicts, clients maintain a replay log, which is replayed at the server to resolve conflicts. Clients can also obtain file ids in advance from the server so that they can create new files without contacting it. Since data is replicated at volume granularity across multiple servers, a client is considered disconnected only when it is separated from all the replicas.
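The advance allocation of file ids mentioned above can be sketched as follows. The class and method names are illustrative assumptions, not Coda's actual interfaces:

```python
import itertools

class Server:
    """Toy server that hands out batches of unique file identifiers (fids)."""
    def __init__(self):
        self._counter = itertools.count(1)

    def allocate_fids(self, n):
        # hand the client a batch of fids it may use later
        return [next(self._counter) for _ in range(n)]

class Client:
    """Client that prefetches a pool of fids while connected, so it can
    create new files while disconnected without contacting the server."""
    def __init__(self, server, pool_size=4):
        self._pool = server.allocate_fids(pool_size)

    def create_file(self, name):
        if not self._pool:
            raise RuntimeError("fid pool exhausted while disconnected")
        fid = self._pool.pop(0)
        return (fid, name)
```

Because fids are assigned by the server in advance, files created while disconnected already carry globally unique identifiers and need no renaming at reintegration time.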
The important contribution of the paper, in my opinion, is the caching strategy that supports disconnected operation. They use a combination of explicit and implicit information about the files a client uses to decide on cache eviction. Files are ranked based on their past usage and also through hints from the user about whether a file is important; this is done by maintaining a profile. All files are ranked periodically, and when it is time to evict, the client chooses the lowest-ranked file. Also, the directory structure is cached by giving it infinite priority. This concept of predicting user behaviour to populate the cache is very interesting and should be applicable even today. For instance, if I have a shared folder and am watching a movie from it, instead of caching just that file, one could fetch the next file I am likely to watch. Or one could parse the makefile to determine which files need to be cached in case I decide to build the software after modifying the current file. Another interesting aspect of the paper is that most of the complexity is offloaded to the client side rather than the server. I am not sure whether any other distributed systems have this property.
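The ranking idea above, combining explicit hoard hints with implicit recency of use, can be sketched like this. The weighting scheme and recency formula are made up for illustration; the paper's actual priority function differs:

```python
def priority(obj, now, hoard_priority, alpha=0.5):
    """Blend explicit user interest (hoard priority) with implicit
    recency of reference into one cache priority score.
    `alpha` and the recency formula are illustrative, not Coda's."""
    recency = 1.0 / (1.0 + now - obj["last_ref"])
    return alpha * hoard_priority + (1 - alpha) * recency

def evict_victim(cache, hoard_db, now):
    """Pick the lowest-priority object for eviction. Directories are
    skipped entirely, mimicking their effectively infinite priority
    so cached name bindings survive."""
    candidates = [o for o in cache if not o["is_dir"]]
    return min(candidates,
               key=lambda o: priority(o, now, hoard_db.get(o["name"], 0.0)))
```

Note how an old but user-pinned file outranks a fresher unpinned one: explicit hints keep stale-but-important files resident across long disconnections.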
The paper does not clearly explain the callback mechanism it refers to. Is it somewhat similar to Chubby's caches, where the server propagates changes instead of the client constantly polling? Also, cache management using hoard profiles requires the user to know the access patterns of any software he is using. This is not realistic today, when most users do not know which files are used by the software they run; in such cases, one would have to rely completely on usage history to determine what to cache.
Also, it is not clear what the authors mean when they say that read/write conflicts are not important because the system has no notion of atomicity beyond the boundary of a single system call.
Posted by: Sathya Gunasekar | March 26, 2012 08:08 PM
In this paper, the authors describe disconnected operation in the Coda File System.
The main purpose of the system is to provide availability to clients that are disconnected from the servers. At the time, portable workstations were starting to become common, so the authors implemented support for disconnected operation so that these clients could continue working without communicating with the servers.
The main idea is to cache server data at the client side so that it can be used even when the client is disconnected. Several challenges need to be resolved for this caching scheme to work. First, the system must decide what should be cached at the client. It uses prioritized cache management, assigning each object a priority based on profile data provided by the user and on reference history; cache replacement is also driven by these priorities. Hierarchical caching is supported for the purpose of resolving the pathname of a cached object: the ancestors of a cached object must also be cached.
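The hierarchical requirement, that every ancestor directory of a cached object must itself be cached, can be sketched as follows (a simplified illustration; function names are my own):

```python
import posixpath

def ancestors(path):
    """All directories from the root down to `path`'s parent."""
    dirs = []
    d = posixpath.dirname(path)
    while d not in ("", "/"):
        dirs.append(d)
        d = posixpath.dirname(d)
    dirs.append("/")
    return list(reversed(dirs))

def cache_object(cache, path):
    """Caching a file also pins every ancestor directory, so the full
    pathname can still be resolved while disconnected."""
    for d in ancestors(path):
        cache.add(d)
    cache.add(path)
```

Without this invariant, a disconnected `open("/coda/usr/notes.txt")` would fail during name resolution even though the file's own data was cached.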
When disconnected, the system has to record enough log data for reintegration upon reconnection. The challenge is that the client may not have enough space to store both the newly added objects and the log. The system has some optimizations to reduce the amount of data, but they do not solve the underlying problem.
Concurrent updates by servers and disconnected clients are allowed for the sake of high availability, and update conflicts are resolved when reintegration occurs. The storeids of the objects are compared to detect conflicting updates; how to resolve a conflict depends on the object. The authors show the object-access history of 400 AFS nodes, arguing that conflicts have very low frequency (only 0.75%) over a reasonable period (one day).
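The storeid comparison above can be sketched as a simple partition of the replay log at reintegration time. This is a simplified illustration of the check, not the paper's full certification procedure:

```python
def check_reintegration(replay_log, server_storeids):
    """Split a replay log into safely applicable updates and conflicts.

    Each logged update carries the storeid it was based on; if the
    server's current storeid for that object differs, someone else
    updated it during the disconnection, so the update conflicts.
    """
    ok, conflicts = [], []
    for op in replay_log:
        path, base_storeid, new_data = op
        if server_storeids.get(path, base_storeid) == base_storeid:
            ok.append(op)
        else:
            conflicts.append(op)
    return ok, conflicts
```

Objects absent from the server's map (e.g. files newly created while disconnected) trivially pass the check, matching the intuition that only concurrently modified objects can conflict.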
I think the system might not be a good choice at the present time. First, there is no particular situation that needs such a special system; with a highly available Internet and cheap commodity machines, better alternatives can be found. Second, it only works for some specific types of computing work: if a data object is frequently updated and read, the system cannot provide consistent results. Also, conflicts cannot be resolved in a universal way, manual work is needed to specify what should be cached, and the limited storage of the client can cause problems.
Even though the system may not be a good choice nowadays, some of its ideas about cache management can be used in distributed systems to tolerate network partitions. It is another way of increasing availability with a cache.
Posted by: Xiaoming Shi | March 24, 2012 12:51 PM