
The Chubby Lock Service for Loosely-Coupled Distributed Systems

The Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike Burrows. OSDI 2006.

Reviews due Thursday, March 1

Comments

Chubby was designed to be a lock service, a distributed system that clients can connect to in order to share access to small files. The servers providing the service are partitioned into a number of cells, and access to a particular file is managed through one elected master node in one cell. This master makes all decisions and informs the rest of the cell's nodes of those decisions. If the master fails, the other nodes elect a new master. The problem of asynchronous consensus is solved through the use of timeouts as a failure detector. To avoid the scaling bottleneck of a single master, the number of cells can be increased, at the cost of making some cells smaller.
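As a concrete illustration of the cell structure, here is a minimal, self-contained sketch, with invented names rather than Chubby's actual RPC interface, of how a client might discover the elected master: it asks replicas one at a time, treating a timeout as a failure and moving on.

CELL = {
    "replica-1": {"up": True,  "master": "replica-3"},
    "replica-2": {"up": False, "master": "replica-3"},   # unreachable
    "replica-3": {"up": True,  "master": "replica-3"},   # the elected master
    "replica-4": {"up": True,  "master": "replica-3"},
    "replica-5": {"up": True,  "master": "replica-3"},
}

def ask_replica(addr):
    """Stand-in for the RPC that asks one replica who the master is."""
    if not CELL[addr]["up"]:
        raise TimeoutError(addr)          # a timeout is treated as a failure
    return CELL[addr]["master"]

def find_master(cell):
    for addr in cell:
        try:
            return ask_replica(addr)      # first reachable replica answers
        except TimeoutError:
            continue                      # skip the failed replica, try the next
    raise RuntimeError("no replica reachable")

print(find_master(CELL))                  # -> replica-3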

This paper is not focused on the technical specs of the system but on the design decisions and usage instead. The design of the client library tried to fit the prior expectations of developers, so that special training is not necessary. Communication between the library and the servers is designed to be long-lived, so the lock system isn't fine-grained. Once again, they treat timeouts as a failure detector: the client periodically declares its desire to keep the connection alive, and the server responds in a lazy fashion, just before the connection is about to fail.

The meat of the article comes in Section 4, where the author discusses the conclusions drawn from observing usage patterns. From collected statistics, they noted that the majority of communication between server and client comes in the form of keep-alive messages. This is not surprising, as other messages were designed to operate over long time periods while the keep-alive serves as a quick heartbeat. The small file size allows data to be served from RAM, which decreases latency, and the use of caching, although it adds a layer of complexity, prevents the servers from being swamped. However, the improvement in latency is a side effect and not a priority for the developers. They reiterate their distrust of client developers' use of the Chubby library and discuss the problems that misuse of the library creates for both Chubby servers and client code.

It seems the designers were able to get away with a lot of choices that seem strange compared to other distributed systems we've seen by reducing the scope of their solution. By restricting possible file size and assuming access is coarse-grained, they reduce the effort required by the back-end to serve data from cache and propagate updates to the backups. By assuming access is coarse-grained and explicitly dropping high throughput and low latency as major concerns, they are able to use timeouts as a failure detector. Lowered expectations allowed them to write simpler protocols.

One operation I still don't understand is Poison. It allows one thread on a client to unilaterally close the connection for all other threads. Why is that useful?

Summary:
This paper talks about a centralized lock service named “Chubby”, which provides locking and small, reliable storage in loosely-coupled distributed environments. The primary goals are availability and reliability rather than performance, achieved by mixing several existing ideas: Paxos for distributed consensus, consistent client-side caching, an interface similar to a distributed file system, and so on.

Problem:
There is a strong need to offer coarse-grained synchronization to applications (particularly for electing a leader) and to provide highly available, reliable storage for small volumes of important data, e.g. metadata or access control lists. Before Chubby, applications relied on either ad hoc methods or human intervention for these goals, which could waste computing effort or lead to long downtime in the face of failures.

Contributions:
+ Providing a general way to offer a synchronization service to different applications. This is achieved by implementing Paxos inside Chubby, independent of any particular application, rather than having each application implement Paxos itself.

+ Presenting this synchronization service as a lock service rather than a Paxos library, because a lock service has several benefits over the library approach, e.g., it helps maintain program structure as the service matures and reduces the number of client servers needed to make progress.

+ Valuable empirical experience, beyond theoretical analysis, to help future designs of such implementations, e.g., how to choose the transport-layer protocol and even the programming language.

Flaws:
This is an engineering-flavored paper rather than a research work, so it is not easy to identify flaws. I list some possible “flaws” here:

- It supports only whole-file reads and writes and is not suitable for fine-grained locking. Fine-grained locking is pushed to the application level.

- Locks are only advisory, not mandatory.

- Efficient caching requires a high ratio of reads to writes.

Applicability:
Chubby has seen wide use at Google, and it is leveraged by other Google services such as Bigtable and GFS. For example, Bigtable uses the Chubby service to keep track of tablet servers, to perform master election, to store the bootstrap location, to store Bigtable schema information, and to store access control lists. It can also serve as a name service. It is interesting that Yahoo again built an open-source version of Google’s infrastructure, ZooKeeper, which certainly shares many features with Chubby such as the file-system API model. Both are much more than a distributed lock service, and much like a highly available, distributed metadata file system.

This paper describes the design of Chubby, a distributed system developed by
Google, described primarily as a lock service, but which also provides the
functionality of a basic small-file-oriented network filesystem with a shared,
global namespace. The paper also details Google's experience with actual
usage of the system, concerning such topics as system usability by developers,
weaknesses, and design features that proved unnecessary.

A (usually local) Chubby cell consists of a group of five servers, of which at
most one is designated as the active master server at any given point in time.
The other four simply replicate the database maintained by the master,
providing replication and a group of candidates (and voters) for electing a
new master in a fail-over situation. Three of the five nodes must be
operational for the service to be available, though I did not see any
explanation in the paper of the exact reasons behind this particular
threshold. Chubby is designed around a few basic assumptions, including: that
files stored in it will be small (on the order of a few KB or less) and that
usage of the provided lock service will be coarse-grained. The latter in
particular allows Chubby to aim primarily for availability and consistency
instead of the low latency desirable for fine-grained locking.

One of Chubby's key reliability features is the grace period on client leases,
which allows a fail-over of the master to a replica to be handled in a
minimally-disruptive manner, while its event notification system allows
consistent client-side caching, significantly reducing the load on the
servers. With its workload typically being heavily read-biased, and with
reads often being satisfiable via the client's local cache, it turns out that
the overwhelming majority of the servers' workload is merely servicing the
periodic KeepAlive requests each client must send to prolong their leases (and
receive event notifications). While not implemented in production at the time
this paper was written, the author describes how interposed proxy servers
could be applied to alleviate this burden, at the cost of increased
potential for unavailability.
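To illustrate, here is a rough sketch of that client-side lease logic, with invented names and simplified timing rather than the real library: when the local lease expires, the client does not immediately fail but enters a grace period during which a newly elected master may still extend the session.

import time

class Session:
    def __init__(self, lease_seconds=12, grace_seconds=45):
        self.lease_expiry = time.monotonic() + lease_seconds
        self.grace_seconds = grace_seconds

    def on_keepalive_reply(self, new_lease_seconds):
        # Each KeepAlive reply from the (possibly new) master extends the lease.
        self.lease_expiry = time.monotonic() + new_lease_seconds

    def state(self):
        now = time.monotonic()
        if now < self.lease_expiry:
            return "valid"          # normal operation
        if now < self.lease_expiry + self.grace_seconds:
            return "jeopardy"       # block API calls, keep retrying KeepAlives
        return "expired"            # give up and report the session as lost

print(Session().state())            # -> valid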

The storage model Chubby presents to its clients is basically similar to a
simplified Unix-style hierarchical filesystem, with ACL-based permissions for
files only. One appealing aspect of its design is the implementation of the
features Chubby provides into the single, unified namespace of its filesystem
-- in addition to files themselves, locks and file ACLs are also handled via
operations on files at designated paths, reminiscent (as the author points
out) of the "virtual filesystems" available in many Unix-like systems, which
present a file-access API to operations and objects that are not files in the
traditional sense.
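As a concrete example, the paper's example name /ls/foo/wombat/pouch shows this structure: the ls prefix stands for lock service, the second component names the cell, and the rest is an ordinary path served within that cell. The helper below is my own illustration, not part of any Chubby client library.

def parse_chubby_name(name):
    parts = name.lstrip("/").split("/")
    assert parts[0] == "ls", "all Chubby names live under /ls (lock service)"
    cell, path = parts[1], "/" + "/".join(parts[2:])
    return cell, path

print(parse_chubby_name("/ls/foo/wombat/pouch"))   # -> ('foo', '/wombat/pouch')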

A significant portion of the paper focuses on aspects of the design that were,
in retrospect, mistakes. These include features of the system (such as
provisions for lock caching that went entirely unused) as well as failures to
anticipate misuse, abuse, and general misunderstanding of the system by its
users -- a reminder that human factors must always be taken into account.

This paper describes a lock service named Chubby, which is intended for loosely-coupled distributed systems. Clients of Chubby can use it to synchronize their activities, but only for coarse-grained synchronization. It uses the Paxos protocol to elect the primary.
Considering overhead, programming habits, and scalability, a lock service is sometimes better than a library. Applications use the lock system Chubby provides to achieve their functions.
A Chubby cell consists of several servers (replicas), each holding the same data. The replicas elect a master via the Paxos protocol. A client contacts one of the replicas by RPC to learn who the master is; all client operations are then performed through the master. Writes are propagated to all replicas and acknowledged once a majority of replicas have applied them; reads are satisfied by the master alone. The servers run master elections periodically to guard against master failure, and failed replicas are removed from the other replicas' lists.
Applications access Chubby through a file-system interface; a lock is a node in that file system. Locks have two modes, exclusive and shared. When a client issues a request, the server verifies the lock request by sequence number and then accepts or rejects it. A successful lock lasts only about a minute, in case the client crashes. The client can also be notified of file changes through events and handles. The client API allows it to open, read, and write files. Clients cache data to reduce read traffic, and each client maintains a session with the servers, which is used to deal with server failure, master fail-over, and client failure.
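The exclusive/shared distinction behaves like a readers-writer lock; the toy model below is my own simplification rather than Chubby code, and only captures that compatibility rule (any number of shared holders, or exactly one exclusive holder).

class AdvisoryLock:
    def __init__(self):
        self.exclusive_holder = None
        self.shared_holders = set()

    def try_acquire(self, client, mode):
        if mode == "exclusive":
            if self.exclusive_holder is None and not self.shared_holders:
                self.exclusive_holder = client
                return True
        elif mode == "shared":
            if self.exclusive_holder is None:
                self.shared_holders.add(client)
                return True
        return False                       # request rejected; lock is advisory

lock = AdvisoryLock()
print(lock.try_acquire("A", "shared"))     # True
print(lock.try_acquire("B", "shared"))     # True  (readers share)
print(lock.try_acquire("C", "exclusive"))  # False (writer excluded)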
Discussion: (1) This system is designed for coarse-grained use, because a write must be propagated to and acknowledged by a majority of the servers, which takes extra time.
(2) All locks are accessed through the file system, so there must be access control over whether a process may use a lock.

This paper presents the Chubby lock service from Google, which provides coarse-grained locks in a distributed system. It is designed to provide good availability and reliability using multiple nodes with a master node inside of each Chubby cell.

The goal of Chubby is to provide a reliable synchronization method designed primarily for long-term tasks such as electing a leader, or other “distributed consensus problems” that do not involve large amounts of data or overly frequent changes. Another goal is to make it easy to use from a client's perspective, by providing a library and making certain guarantees to clients about things like timeouts and cache consistency.

Chubby uses a lock server to provide consensus, and allows small amounts of data to be stored to record what has happened. The master node in a Chubby cell is the only one clients communicate with; the others only provide replication and, if contacted, respond with the address of the master. Locks in Chubby are organized into a hierarchy similar to a file system: the beginning of the path names the Chubby cell to be used and the remainder represents a directory structure, though permissions exist only on files, not directories. To cope with the failure of clients holding a lock, Chubby attaches sequence numbers to calls made under a lock, and if a lock is released because of a failure, a lock-delay prevents anyone from using the lock for a certain duration. The client library lets clients subscribe to events such as file modifications or failures, while clients communicate through a session kept alive by “KeepAlive” messages; a session fails if a message is not received in time. The paper mentions that a large portion of communication in the service consists of these KeepAlive messages. When the master node fails, a new master is chosen, and if recovery happens quickly enough, within a provided grace period, the system recovers without reporting a failure and merely delays client activity.
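A rough sketch of those two protections, under my own simplifications (the sequencer is reduced to a bare generation number): a downstream server can reject requests made under a stale lock generation, and a lock-delay keeps the lock unavailable for a while after its holder is lost.

import time

class LockRecord:
    def __init__(self):
        self.generation = 0
        self.unavailable_until = 0.0        # lock-delay deadline

    def holder_lost(self, lock_delay_seconds=60):
        # The holder's session expired: bump the generation and start a lock-delay.
        self.generation += 1
        self.unavailable_until = time.monotonic() + lock_delay_seconds

    def can_acquire(self):
        return time.monotonic() >= self.unavailable_until

def request_is_valid(lock, request_generation):
    """What a downstream server might check before acting on a request tagged
    with a sequencer (here reduced to the generation it was issued under)."""
    return request_generation == lock.generation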

The paper mentions that they were able to scale to a large number of nodes, but also seems to indicate that this was accomplished in large part by the clients scaling down their usage of Chubby. Expecting a client to cache more data and communicate less often is not something that happens automatically, nor does it clearly demonstrate a scalable design. The section on abusive clients describes a manual review process for client activity, made necessary by resource limitations, and mentions that the estimates they settled on turned out to be neither accurate nor effective, but there is little about solutions to this problem.

The Chubby service seems primarily useful for allowing synchronization with small amounts of data. It is good for handling large amounts of connections, but not large amounts of data or frequent writes. General consensus problems using the service to arbitrate decisions, the original goal, could be a major use.
The paper mentions that its largest use was as a name server, in place of DNS.

Mike Burrows describes the design, implementation, and lessons learned from using the Chubby lock service at Google. The paper is a fascinating look into the world of Google, as Chubby is used by GFS, Bigtable, and most likely the majority of their distributed applications, and as it mentions some of their cluster-management practices, such as keeping a 'free pool' of backup machines and running the Chubby replicas on machines similar to those of their clients.

They evidently spend a lot of time making sure that Chubby applications are not abusive: education, and reviews to the point of requiring 'compensating parameters' to reduce resource use. There is even talk of changing the behavior of 'open' so that frequent calls get an exponential backoff delay. It seems like an awful lot of effort to spend protecting services from one another. The complaints about developers are amusing; people never use a system the way you intended. That would be a take-away point from this paper: design an API, and it seems you have to spend a fair amount of time educating developers on how to use it. Is that a lack of documentation?

Proxies play a large part in Chubby. While they don't seem to support it, one has to wonder whether proxies are also replicated, perhaps with some form of consistent hashing. There was much snickering while reading the section on having to write a special proxy to support Java Chubby clients: it's too hard to connect Java to the C++ library, so just rewrite it. That seems like a case for not using Java; why is this language still so popular? Is the DNS hack outmoded by Chubby? It seems like it might be, partially, since DNS does not do explicit invalidation. It also seems like DNS could manage mass lookups, in a fashion, with a zone transfer; batching names would be equivalent to putting them under the same level of the domain (i.e., in the case of the mentioned 100 names, X1..X100.wisc.edu).

Why do they call it the Chubby 'lock' service when it seems to do more file I/O and name service than locking? Their usage table on page 11 states that there are 1K locks held and 12K open files!

KeepAlive messages seem to be the source of a lot of extra traffic. The table on page eleven shows 22K active clients (with an additional 33K proxied) and a KeepAlive rate of 1-2K per second. Later, on page fifteen, it states that KeepAlive messages are exchanged every seven seconds. The numbers do not seem to agree: assuming 2K KeepAlive messages per second and seven seconds between KeepAlives, Chubby is handling only 14K KeepAlive messages in those seven seconds, whereas one would expect roughly 22K.
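Writing the arithmetic out explicitly, using the figures as quoted in this review:

direct_clients = 22_000         # clients talking to the cell directly
keepalive_interval_s = 7        # seconds between KeepAlives per client
reported_rate = 2_000           # upper end of the reported 1-2K KeepAlives/sec

expected_rate = direct_clients / keepalive_interval_s        # ~3.1K per second
messages_per_interval = reported_rate * keepalive_interval_s # 14K per 7 seconds

print(round(expected_rate), messages_per_interval)           # 3143 and 14000, not 22000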

If KeepAlive messages are sent via UDP, does that mean that other RPC messages travel via a TCP transport? Doesn't that mean ordering of the RPC messages sent via both channels might be required? It did not seem to be covered in the paper. While most of their usage appears to be restricted to local machines, how are timeouts changed between local cells and their global Chubby cell?

This paper introduces the design and implementation of Chubby, a lock service developed and used inside Google. The system is meant to provide coarse-grained locking and reliable small-scale storage for loosely-coupled systems. Instead of providing new algorithms or techniques, the paper discusses the design details and trade-offs in Chubby.

As mentioned before, the purpose of this system is to provide a lock service. The system should help clients synchronize their activities and agree on basic information about their environment. It also needs to provide reliability, availability, and easy-to-understand semantics, while throughput and storage capacity are considered secondary. The demands on this system can be framed as a distributed consensus problem. Although existing solutions such as the Paxos protocol exist, bringing the algorithm into reality leaves many detailed problems to solve, including how to implement the protocol, how to achieve the system's goals, and how to build a friendly interface for clients.

The main contribution of this paper is the design and implementation of the Chubby system, which also provides answers to the above problems. The system implements a lock service instead of a library, and serves small files to permit elected primaries to advertise themselves and their parameters. It provides coarse-grained locks instead of fine-grained ones, given the expected load on the lock server and the requirements, while fine-grained locking can be implemented inside the application itself.

The system includes two main parts: the Chubby cell, which typically consists of five servers known as replicas, and a library for client applications. Communication is based on RPC calls. The interface is designed like a UNIX file-system interface. Client-side caching is provided to increase performance, while sessions and KeepAlives are used to maintain the lifetime of a connection. The mechanisms for fail-over, backups, and mirroring are also discussed, as are thoughts on scaling the system; the two main techniques offered are proxies and partitioning.

In the use-and-behavior section, the paper provides usage data on the Chubby system. Most of the numbers look reasonable. It is worth noting, however, that KeepAlives account for 93% of RPCs, and naming-related files dominate the stored files. One thing that is not quite clear, and makes me curious, is the number of shared locks: it is 0. Although only a few clients hold locks, I wonder whether the shared-lock feature is ever used in a production environment; there is no discussion of it in this paper.

The system presented in this paper has been used in Google, and from the data provided, I believe this system is applicable and useful in real environments. Although its stated purpose is a lock service, it may be even more useful for other functions, including name service and low-capacity but highly available storage.

The paper describes the lock service Chubby, which is developed by and for Google. It is mainly used as a name service and as configuration storage for small files. Throughout the paper the author touches on nearly every aspect of the system, from design choices to lessons learned, from partitioning to scalability, from caching to failures. It is a brief summary of the engineering effort behind a distributed lock service used inside Google, most importantly by MapReduce, GFS, and Bigtable.

The problem that the paper is dealing with is actually implementing a reliable and available distributed lock service. The initial motivation is to have an API that allows clients and servers to synchronize their activities (such as primary election). For example, all potential clients can request a lock on a file; the service grants the lock to only one of them, so that the one that gets the lock becomes the primary and advertises this by setting the contents of the file.
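That election pattern can be sketched in a few lines. The in-memory stand-in below is not the real Chubby library and its method names are invented; the point is only the shape of the protocol: every candidate tries to acquire an exclusive lock on an agreed-upon file, and the winner writes its identity there for the others to read.

class InMemoryCell:
    """Stand-in for a Chubby cell holding one election file and its lock."""
    def __init__(self):
        self.lock_holder = None
        self.contents = ""

    def try_acquire_exclusive(self, who):
        if self.lock_holder is None:
            self.lock_holder = who
            return True
        return False

def try_become_primary(cell, my_address):
    if cell.try_acquire_exclusive(my_address):   # only one candidate succeeds
        cell.contents = my_address               # advertise the winner in the file
        return True
    print(my_address, "sees primary:", cell.contents)
    return False

cell = InMemoryCell()
print(try_become_primary(cell, "server-a"))      # True: server-a is the primary
print(try_become_primary(cell, "server-b"))      # False: server-b reads server-a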

I like such real-world system papers since they show how theoretical studies on distributed file systems, caching, consensus, replication, and so forth can be used in a real working system. On the other hand, it clearly shows that some assumptions in the theory papers simply cannot be handled independently (as they are claimed to be), and the entire system design can be affected. Most interesting of all, the paper shows that even the behavior of software engineers can affect the design (e.g., a client Paxos library vs. a lock service).

I realize that there are many parameters in the system, such as cell size, session intervals, handle timeouts, lease timeouts, grace periods, and so forth, and I suspect there are many more. It is, of course, expected that such a large system has many parameters. On the other hand, I had the impression that these parameters are not set in a very sophisticated way. It might have been better to extend the discussion of the parameters and explain how to decide on them. For example, what happens if I increase the handle timeout from 1 minute to 2 minutes?

I find the organization of the paper very poor; there is not a very clear system structure or set of connections. I agree that it does not have to follow academic paper guidelines since it is not a research paper; however, it could at least have a good structure like the Dynamo paper. Another problem is that the evaluation part is very brief. Again, Dynamo has much better quantitative values to evaluate the system. I think this problem is also related to the too-many-parameters issue: if there were a proper evaluation, it would be easier to justify the parameter selections.

Apparently there is nothing to argue about regarding the real-world applicability of the system, since it serves many clients and the most important applications at Google. My only doubt is that, comparing Chubby with Dynamo in this regard, Chubby seems more specific to its particular infrastructure.

Victor Bittorf, David Capel, Seth Pollen, and Igor Canadi

The Chubby system is an interesting change of pace from the previous readings by Lamport. The shift from theory to practice is abrupt, but there is a similarity: Chubby basically explores how to put a quorum behind a lock server. This brings performance, but a different failure model.

Analysing the contributions of this paper is difficult, mainly because it is light on a lot of details and it is all practical advice and experience -- they even explicitly state that this is not research, it is practice. We’ll come back to the main contribution later. The flaws of this paper are also difficult to identify directly for the same reasons, but we have several concrete points to discuss.

Firstly, and most importantly, we were confused by the master election protocol. When a master fails, we understand that there is a new election for the master, and we understand how to go about that. And we see the quoted figures about how master election can be done in a few seconds. But what isn’t clear at all is how, when a new master is selected, it assumes the role and obtains the state. The paper says this can take a minute, but doesn’t say how the master collects everything it needs and picks up where the old master left off. Please explain this protocol in class.

Also, the Facebook engineers in our group want to call out Google for not open-sourcing this package. ZooKeeper, an open-source alternative to Chubby (currently used at Facebook), is a possible substitute. It was also asserted that if Facebook had written Chubby, it would be better and open source. Of course this is just speculation, since if you consider TAO... Anyway, this just turns into a debate between those who bleed Facebook blue and the other engineers in the group about the different workloads and how Google and Facebook have chosen their solutions. *cough* MySQL *cough*

We also have some thoughts about how the paper presents itself. There seems to be a passive-aggressive undertone in this paper. For example, the way the author mentions Java and how it is handled seems needless except as side commentary. Similarly, the paper repeatedly references the shortcomings of Google developers; one would think Google only employs high-school dropouts as programmers.

They say they designed Chubby to use a lock-like interface because that is what their developers are used to. But then they say the developers misuse the locks because the locks do not have the semantics the developers are used to (since they are distributed locks). This makes it sound like the lock interface was a bad choice, since it caused many problems. On the flip side, using a locking mechanism with a familiar interface means the system actually gets adopted (even if users are later troubled by the change in semantics).

The true usefulness of this paper is the insight it gives on how to build a system that people actually use (and the problems that come with that). It gives insight into how unexpected things come up and how they can be dealt with, and into how systems evolve over time. This lays out some important methodology and practice for real systems. They start with theory, like Paxos, to design their system, but they continue to evolve, develop, and adapt to new requirements. They also show that predicting user behavior is very difficult and must be accounted for.

We are left with the following wisdom:

When designing a distributed system, theory isn’t enough. The design process is never really over; in a real system the requirements and specs change, and the final product requires many iterations. This is, for example, why a simple distributed key-value store can take a lot longer than originally thought: you simply don’t know which problems are problems until you get there. And then you have to implement bounded and approximate solutions and hand-wave certain failure cases.

From here, pretty much any system devolves into heuristic selection of non-trivially comparable trade-offs which will predictably prove to be wrong as you attempt to scale beyond spec. So, you craft SLAs which include counts of 9s and contain vague footnotes and an associated bureaucracy to maintain it all. It begins as an informal bureaucracy since engineers typically are not fond of formalities (or documentation). But informal bureaucracies never work in the longer term, since an informal bureaucracy of engineers tends to retain some amount of pragmatism and rationality. So, ultimately, the pragmatic bureaucracy either redesigns the system from scratch (and starts the cycle over) or becomes a fully hardened, inflexible, and persistent fixture with a mystique born from hegemony whose utility only marginally offsets its costs.

The paper for this week described the Chubby lock service employed by Google.
The Chubby service is a lock server (with other servers as replicas) meant to
hold a database of small files reliably and consistently while achieving high
availability. The realm for which Chubby provides the best solution is a small
database needed by moderately large numbers of clients (in the tens of
thousands). This environment is large
enough to require considerable effort to provide availability, and also to
desire failure recovery that does not need immediate human intervention. The
service also needed to be simple enough for many developers to build
applications on top of it.

A main contribution of this paper is simply the recounting of the engineering
effort and lessons learned in implementing the Paxos algorithm to solve
asynchronous consensus while providing a useful, consistent service. What made
it different from other, simpler systems before it is the leader-election
protocol within the Chubby cell, and the client's way of riding out short
Chubby outages around an election by allowing a grace period on a session that
looks to be inactive. The active session that this technique
preserves is also a nice feature, since it tremendously increases performance by
allowing the client to cache data and get "invalidate" messages when the data is
updated by another client.

Overall the paper does a good job of describing a good system and its
implementation. The fact that "KeepAlive" messages dominated the traffic so
much raised my eyebrow a bit, but I suppose it is a necessary evil to achieve a
highly consistent local cache. It is good that this service can be provided
about equally as well by proxy servers, though this adds to the required servers
needed for a Chubby cell to be scalable. I thought that Google was perhaps a
little too paranoid in setting up the cells, with five machines all holding
the exact same database. I suppose five machines is not a lot considering how
many machines a Chubby cell can serve, and if you're Google.

The Chubby protocol is rather applicable to real systems since it is deployed
quite heavily on real systems at Google. It is a further example that a
distributed system is a web of tradeoffs between necessary evils that no one
would desire in a computing system. Also, it shows that the scope of problems a
distributed system can solve is inherently limited, even if the interface is as
generic as possible under the given constraints. Chubby works well for very
small files and applications that utilize locks that are held for very long
periods of time. However, it had to constrain the file size it was able to
accept for storage. As long as it is used according to its purpose though, it
is an excellent system. The primary goals that emerge, reliability and
availability, show what needs to be exploited when designing a system for this
purpose.

This paper explains how Chubby works. Chubby is a distributed lock service designed by and used at Google. It provides a distributed filesystem that is optimized for small files and rare writes. Since it implements advisory file/directory locks, clients can use it as a lock service, but they can also use it as a name service and, according to the paper, the latter has become Chubby's primary use at Google.

A Chubby cell consists of five replica servers, one of which is elected as a master. A given cell has dominion over a subtree of the global Chubby namespace. Every filename in Chubby begins with /ls/. The master serves all client requests, most of which are KeepAlive messages for sessions. Clients typically open a session and use these messages to keep the session alive. Clients can register for various events including file modification events. They use a write-through cache where the server can invalidate cache items for individual clients as needed. If the master fails, Chubby uses a fail-over mechanism to elect a new master and transfer clients over to it as soon as possible. Chubby includes a mechanism to conservatively transfer sessions over to the new master and inform clients of the fail-over so that they can invalidate their own caches and inform the application that events may have been missed.
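A toy model of that consistent client cache, heavily simplified (no sessions or leases, and invalidations delivered directly rather than piggybacked on KeepAlive replies): a write is not applied until every caching client has been invalidated, so a client never reads stale data from its cache.

class CacheClient:
    def __init__(self):
        self.cache = {}

    def read(self, server, path):
        if path not in self.cache:
            self.cache[path] = server.data[path]   # fill the cache on first read
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)

class Master:
    def __init__(self, clients):
        self.data = {}
        self.clients = clients

    def write(self, path, value):
        for c in self.clients:
            c.invalidate(path)       # invalidate all caches before applying the write
        self.data[path] = value

a, b = CacheClient(), CacheClient()
master = Master([a, b])
master.data["/ls/cell/conf"] = "v1"
print(a.read(master, "/ls/cell/conf"))   # v1 (now cached at a)
master.write("/ls/cell/conf", "v2")      # a's cached entry is invalidated first
print(a.read(master, "/ls/cell/conf"))   # v2 (refetched; never stale)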

One flaw with this paper is that it does not firmly establish the scalability of Chubby. Although it seems clear that Chubby's scalability is sufficient for its current uses, the paper presents no empirical evidence that it could scale further. It presents a number of methods for scaling, some of which have been used in production (increasing lease durations, adding Chubby cells), and others of which have not (proxies, partitioning). Regarding the latter methods, it is not clear to me how proxies could handle KeepAlive and read requests without server involvement. Further, the idea of partitioning the namespace of a Chubby cell so that subtrees of the namespace would have different masters seems less than ideal as it would require application developers to manually partition and would provide no way for the Chubby cell to handle load balancing among the partitions. Given that the paper already details a number of ways that lack of education about Chubby has been a problem, it seems that requiring developers to intelligently structure a namespace to take advantage of partitioning would be difficult.

Another flaw is that the paper does not present very much quantitative evidence. A paper like this calls for lots of charts showing performance characteristics for different use scenarios. It also calls for hard numbers. Instead the only chart we get is a table giving a snapshot of a Chubby cell with an assurance that the numbers are "typical" for Google. The authors also, strangely, do not give us hard availability statistics, but they do tell us that they recorded 61 outages "over a period of a few weeks". Giving the author the benefit of the doubt, I assume that these items were held back for competitive reasons, but that does not make it a non-flaw.

This paper talks about a pseudo-centralized, coarse-grained locking service called Chubby, for distributed clients, typically within a data center.

The problem addressed by the paper is providing a locking service for distributed clients, for example to let the clients designate a master. More specifically, the paper aims at solving the problem of distributed consensus in a pseudo-centralized manner, without requiring each client to run core distributed protocols like Paxos itself. It is based on a different school of thought for reaching consensus in a distributed system – have a central service that gives out locks to nodes, which are then designated as masters.

The main contribution of the paper is, perhaps, the practical demonstration that a simplistic, pseudo-centralized solution to the consensus problem is indeed possible.

More detail on the contributions:

- One nice advantage of the Chubby school of thought is that it can be adopted by applications with minimal change. It is also very intuitive for programmers to use in their applications.

- The KeepAlive mechanism used by Chubby is a way of preventing clients from polling for lock acquisition. Further, it removes the need for a secondary name service to advertise the current holders of locks.

- Another nice advantage is that clients do not need to form a quorum among themselves to guarantee liveness. The protocol is very simple, it is oblivious to client failures, and any client requesting a lock can make progress safely and independently.

- Lock state can be cached by the client and is later invalidated by the server once the ownership of the lock changes. A nice advantage of invalidation vs. update is that, once a piece of state is invalidated, further updates need not be sent to clients if the lock ownership changes again.

- Another key point about the service is that it is meant primarily for coarse-grained locks. This is apparent from the fact that the service uses a publish-subscribe model to deliver events; fine-grained locking would have broken this model.

- The paper uses the notion of a Chubby cell – instead of having a single lock server (an obvious single point of failure), it uses a small set of servers that elect a master among themselves to act as the lock server.

Flaws:

- It is certainly not crystal clear in the paper why a directory hierarchy was adopted to represent locks. Treating them as files with associated metadata and ACLs is understandable, but apart from distinguishing the cell at the top level, it is not clear why a hierarchy is necessary to represent locks. A hierarchy seems reasonable when locking is done at different granularities, but in this case locks could just as well be represented as a flat list.

- The event-based model seems rather inappropriate for a locking service because the KeepAlive messages add a great deal of RPC traffic to the system. Since the locks are coarse-grained, the effects may not seem very pronounced; however, if finer granularity were supported by the server, the system would break down just from the traffic. The overall design seems rather flaky and over-fitted. It also bloats the server with a lot of state needed for invalidation. A service in which the bulk of the state lives with the clients, and clients request locks (and track lock transfers) with efficient caching, seems much more elegant than the design explored by this paper.

- The use of empirical values for a large number of parameters (lock-delay, KeepAlive interval, grace period, etc.) without concrete justification, and the numerous arbitrary decisions (I dare say the whole design was not very well thought out), turned me off.

- The paper, on the whole, was poorly written and poorly organized, with definitions following complete descriptions (for example, in Section 2.3 the author goes into detail about the metadata associated with nodes without first explaining what nodes represent).

In conclusion, though the direction the paper wanted to explore was very nice, the paper as such was a rather unpleasant read! A cleaner, simpler design and better organization would have improved it significantly. While I can see that the idea of pseudo-centralized locking is very applicable in today’s systems, I seriously doubt the utility of this design for more flexible forms of locking.
