Experience with Grapevine: The growth of a distributed system
Experience with Grapevine: The growth of a distributed system. Michael D. Schroeder, Andrew Birrell, Roger M. Neednam. ACM Transactions on Computer Systems, Feb 1984.
Reviews due Thursday, 1/26.
Comments
What's this paper?
In this paper, the author discussed the experiences in designing a distributed replicated system, Grapevine, which provides the functionalities of message delivery, naming, authentication etc.The author presented their underlying design ideas from several vital distributed design aspects, such as scalability, transparency, load control and reliability. For each topic, the author presented the challenge they had faced up with and mentioned the possible improvement way.
The main topics I am interested in:
Grapevine server side provides two kinds of services, one is message service, which is to accept and deliver mail to individuals and lists, another is registration service, which provides naming, authentication functions to clients using a distributed and replicated registration database.
Scalability Challenge: This challenge is to increase the system capacity by adding fixed power servers not more powerful servers. (1) the registration database partitioning works well since there are more registries rather than larger registries; (2) The distribution lists don't scale up well since some lists number are a fixed fraction of the total community. The author proposed the idea of indirection to divide the distribution list by registry.
Transparency Challenge: in Grapevine, eventual consistency policy is implemented such that there would cause some big problem such as data inconsitency. The solution proposed by the author is to ensure when the server side do the updating, the client programs should connect to the the same sever.
Reliability Challenge: As the author mentioned, replication scheme is a very successful way to obtain high reliability. The problems are (1) As the system grows, the consumed resourses are heavier which would casue failure such as the disk deadlock would happen if the disk space is low. (2) Secondary inbox placement would casuse overhead.
Limitations in this paper:
I doubt how people look at the eventual consitency scheme since sometime this policy could lead to data inconsitency such as losing some data. It is not suitable in some cases where losing data is unacceptable.
Posted by: Jie Chen | January 26, 2012 10:50 AM
This Grapevine paper gives an interesting summary of challenges and implemented (or planned) solutions for a distributed authentication and message passing system. This has good contributions on how to design a general message passing system that is resistant to network and hardware failures. This demonstrates the power of having a homogeneous system to build on.
The main contribution was on propagating and duplicating messages and system state changes. Their approach of separating message passing and registration seems novel. These are very related but two separate ideas. They also explored the implication of slow or faulty network links and how to compensate. This an important practical consideration but does not seem to carry very much theoretic weight. Although they did not go into much detail, there was some interesting conversation on the locality of data, specifically message sharing between all users on a sever. The concept that a single server propagates a change seems to be a design limitation, as they note with the large email lists.
Overall, the biggest contribution seems to be the transparency for the end user. This offers direct benefits for the end user and offers an interesting layer of abstraction. Another major contribution is the scalability, specifically being able to add severs easily to scale the system.
A few minor contributions came out their experience; such as the fact that they did not anticipate that as the community grew, some lists grew as a fraction of the community size. This seems to touch on building a deeper understanding of scalability.
As far as difficulties, I found a few limitations.
Their approach of gaining stability by splitting a server across 2 physical machines (one message and one registration) is simply a bad approach to scaling. Yes, you gain a factor of 2; but this doesn’t scale any further, it’s one a time use.
It seems the assumption that their mailboxes were “buffers” instead of persist ant storage proved to be very wrong, but they defend it,
“”In this case we refused to alter the system to meet the new load. The problem has been controlled by administrative limits on the numbers of such users.“”
This is clearly not a long term solution. Clearly this is a fundamental problem that caught them by surprise.
Overall, their plan to have Grapevine be an authentication system for all of the internet seems far fetched. Perhaps it is just in retrospect today, but one central authenticating service for all of the internet is simply a bad idea -- but this is mainly because of security and privacy concerns that were not as prevalent when this system was designed. It was a noble approach for the time.
As far as applicability; clearly this is very applicablity. It was used successfully by many people. Most the problems tackled here were directly related to implementation and application; even down to how to recover from a hardware failure on a remote server using remote recovery tools loaded by specially trained techs.
Posted by: Victor Bittorf | January 26, 2012 09:48 AM
what is the key value of the paper?
This paper introduces the operational experience of Grapevine, a distributed, replicated system, and the observations are classified into 6 different topics which are relevant to the design of most distributed systems. More importantly, for each topic, its good points and deficiencies of design are discussed, and potential improvements to Grapevine and other distributed system are described.
what has been accomplished, and what is the contributes of this paper?
1. Effect of scale: pay careful attention to an important principle of distributed system design, that is the ability to increase system capacity over a large range by adding more servers of fixed capacity instead of using servers with higher capacity. Respecting this principle is significant since it serves as base of efficient scalability of distributed systems. In addition, this paper proposes that the number of levels in the naming hierarchy can be increased in order to serve a much larger clients community.
2. Configuration decisions: This paper proposes some good ideas on how to make configuration decision including the number of servers needed for the systems, the location of these servers as well as the load-balanced distribution of registry replicas of primary and secondary inbox. In the paper, a design flaw is pointed, that is registries are merely defined to correspond to significant geographical areas. And the paper also points out an improvement method, adding organizational registries for certain distribution lists.
3. Transparency of distribution and replication: this paper points several cases where users and registrars may feel that Grapevine may not perform as the same as a unitary system. The most common cause for such observations is the delays in propagating registration database changes, besides other factors may also affect the transparency of the distributed system. For example, the break of certain internet links may result in Grapevine’s failure to eliminate a duplicate message from an inbox. In addition, another wrong design decision is discussed, that is the false assumption that users would not want to know the detail information about its state of availability or accessibility. It turns out these information is extremely important for users. At the same time, the author of the paper points out that it is much harder to describe the state of a distributed replicated system.
4. Adjusting to the load: Grapevine is designed based on assumptions of certain expected load, thus the system may not work as predicted or fail if it is much overloaded. Certain methods are proposed to reduce or balance load in the system. One method to reduce load is when propagating database changes, only the change in the update (typical) message is include rather than the entire changed entry in a message. Another method to reduce login time due to large and distributed client authentication information (a kind of large load) is to cache the results of authentication and access control checks, and change nested groups into parallel flattened version.
5. Operation of a dispersed system: For some basic repairing like rebooting the machine, the replication of services in Grapevine usually allows them to be made by operators without disturbing users services. For certain complicated problems, a text log mechanism is used in this system in order to help the diagnosis of experts. Especially, the logs allow trace distributed operations.
6. Reliability: The primary technique for achieving high reliability is replication of function among several servers. In addition, this paper introduces an important point, that is the significance of having spare resource to using redundancy to achieve reliability, and certain techniques to guarantee spare resource are described.
Flaws or limitations in this paper.
Since Grapevine was design and implemented in 80’s last century, certain design decision were made based on the hardware and other resource condition at that time. Some important factors influencing its design are small disk capacity and low-rate network link. Due to such physical limitation, some design, especially some technique details, in the paper are sort of out of date.
Posted by: Junyan Chen | January 26, 2012 08:50 AM
This paper illustrates some of the lessons learned by the authors over the course of two years maintaining their distributed system, Grapevine, as its user base grew by a factor of three. That the system in question is a message delivery and authentication system is not as significant to the point of the paper as its position as an early example of a large scale distributed system. As the frontrunners, the authors' insight would be immediately helpful to the rest of the community. Some of these insights are listed below.
1) The potential size of the system was bounded by the amount of global state each server tracked. However, more immediate problems to growth stemmed from a failure to anticipate the growth of elements such as distribution lists.
2) Design choice to allow server network to grow locally turned out to be a good one. However, this naive greedy algorithm didn't help to choose backup servers. The most important tool to make this decision was network discovery. As the network grew and routes became multi-stage paths, knowledge of the reliability of links was harder to obtain and simultaneously more valuable.
3) Forcing organizations to be monolithic with respect to the system didn't interact well with a design structure based on geographical location. More flexibility was required.
4) A major theme in this paper is that the uses made of the system don't always match the expectations from the original draft. In particular, users aren't as patient with an opaque system as they hoped. When changes can take minutes to be propagated, the users need to be informed of the state of their command or else might unnecessarily repeat their action. Also, the system faced increased load from a large growth of users interacting with it from PCs and not downloading their messages to local copies.
5) Some database operations turned out to be more expensive than others. For changes to large lists, a system of updating based on diffs instead of complete replication had to be adopted.
6) Noting when servers were not responding and keeping extensive logs added the ease of maintenance for the system greatly. Allowing remote diagnosis and repair turned out to be a good decision with respect to scaling.
The design lessons learned are certainly things to keep in mind when designing systems. I would want to know how relevant they still are in the design of current systems, applications in particular, as the nature of the underlying network has changed and the scales involved have increased. Another important contribution that could have been made was to develop a methodology for evaluating systems and identifying the key areas for improvement. The approach of the authors was not much more sophisticated than to wait and see what breaks as more users are added. Anything that allows a pro-active approach would be incredibly useful. While a full appreciation of the paper is not possible without a better grasp of the historical context of its solutions, I liked it and thought it had much to offer.
Posted by: Brian Nixon | January 26, 2012 08:47 AM
This paper discusses the authors experience with the design and operational experience of Grapevine, a distributed system primarily used for message delivery and authentication. Grapevine is designed to be scalable, transparent and reliable. The authors discuss how the decisions they made, both good and bad, have affected the system under a high work load. They also suggested possible improvements to their design.
The objectives of Grapevine design reflect some of the fundamental goals of the distributed systems that are applicable even today. It was designed so that the capacity of the system can be increased “over a large range by adding more server of fixed power, rather than more powerful servers”. This and other design goals, such as transparency and reliability, can be observed in distributed systems operating today. The implementations to satisfy those goals are also still relevant (for example, having multiple duplicates for data reliability, and caching for transparency). This paper also hinted some ideas that have are widely used now. One of them is the division of the namespace of users into multiple level (they divided it into a registry name and a name within registry). They also pointed out the importance and demand of accessing messages right from terminal other than downloading to the client’s computer.
The limitations experienced by the designers of Grapevine are different from today’s limitations. Therefore some of their decisions does not carry over to today’s systems. In my opinions, they were mainly constrained by limited disk space, and slow and unreliable communication, both of which have improved a lot since then. Configurations decision such as having a message closet to the client as a primary server for the client may no longer not be the important factor. Grapevine offer multiple services including message delivery and access control. Although it is desirable to have different uses of the system, I think it is simpler and more reliable to focus on each service.
The authors provided useful experience and suggestions that are important in designing future distributed systems.
Posted by: Min Thu Aung | January 26, 2012 07:56 AM
Summary:
Grapevine is a distributed, replicated system developed by Xerox at 1980’s that is primarily used for message delivery services, but also includes other essential functionality such as a general naming service, authentication and access control. This paper mainly talks about the lessons learnt in the system’s operation and six general aspects which are relevant to the design of other distributed systems are discussed in detail.
Problem:
- The system has to be scalable but some configuration information must be stored on all servers and it’s proportional to the number of users.
- Operators must make configuration decisions such as server requirement and placement.
- We want to present an illusion to users that they are connected with a single, large and reliable computer.
- We need to carefully study the workload in the system in order to make proper design decisions.
- We need to provide handy facilities to monitor, control and repair the system.
- How to make the service still available even when some servers go down.
Contributions:
Grapevine had a strong influence on many later distributed systems. It also shares a lot in common with DNS etc. The principle such as separating the naming hierarchy from the storage devices hierarchy is also the same as in today’s Hadoop (name node and data nodes).
Flaws:
- Only weak consistency (eventual consistency) is employed in Grapevine. The server that receives the update needs to propagate it to other replicas. This propagation may take some time and users may encounter inconsistencies (still use old copy or the messages are re-ordered).
- The paper does not talk about throughput issues, i.e. how many messages can be delivered/processed in a second. Grapevine may be good enough in terms of throughput in their use, but throughput will definitely become a concern to see if the system could meet users’ demand when it scales up.
- It’s challenging for operators to do some kind of configuration work. For example, it’s crucial for Grapevine to partition the inbox sites on different servers, since a message may be shared between several inboxes and stored only once on a server. It’s also non-trivial to balance users among servers. These require a substantial administrative workload to administrators.
Further, when new machines or more disks are added to the system, the data on all server machines/disks need to be distributed by operator again in order to ensure a balanced performance. So it would be nice if these works can be done automatically rather than manually.
- Security. Grapevine assumes all the servers in the system are under control of a single organization or a trust among multiple organizations. This may not be true in some cases and problems may arise when several independent organizations make their own management decisions at the same time as we discussed in the class.
Applicability:
Almost every piece discussed in the papers is still worth considering today.
For example, the number of replicas and where to put them in a distributed file system such as GFS and HDFS (it’s interesting that considerations in the paper lead to having three replicas which is the case in GFS and HDFS).
Again, the management of popular social networks today such as facebook definitely faces the same set of concerns such as privacy issues (e.g. access control) and how to manage users and groups. The difference is we can easily get powerful hardware today but things may also become complex in this context (e.g. we want to do fine-grain access control for users’ content).
Posted by: Yizheng Chen | January 26, 2012 07:54 AM
This paper mainly describes the operational experience with Grapevine in several years. It focuses on the problems Grapevine met in these years and the potential improvements that can be deployed to solve those problems.
The paper tries to provide solutions to the problems related to a distributed system as the system growth. The growth mainly indicates the size of users, groups, and data, as well as the number of servers, the distribution of servers in geography, and the evolution of the connecting network used by the system. These factors leads to various problems on scalability, transparency of implementation, load-balancing, human-related problem, and reliability.
The main contribution of the paper can be divided into three areas. The first is the problems they proposed according to their experience during the growth of Grapevine in a real workload. The problems are valuable to other system designers to avoid the same problems happens in the new-designed system. Lots of the problems may hardly be found in the original design of the distributed system, including the behavior of the system when the network topology changed and the unbalanced load between different parts of the system. The second one is they categorized the experience into several general topics. The topics provide hints to what the system designers should consider when they designs new distributed systems, and even the hints to evaluate a distributed system. The topics they proposed is the effects of scale, configuration decisions, transparency of distribution and replication, adjusting to the load, operation of a dispersed system, and reliability. The third and final one is the discussion of the solution to specific problems. Although most of the improvements they discussed are not deployed into real system at that time, they can still be a reference to the following systems which meet the same problems.
The major flaw of this paper is the absence of the discussion about security of the system. The network is treated as a secure environment. It might because the network is relative safer than the network we face today. However, the security issues should still be considered. The trust boundary of the system is not clear without the consideration of security issues, and it makes the system vulnerable.
In sum, most of the contributions in this paper can still be used in real applications. Some of the problems and solutions are much more mature in today’s applications, including the distribution and replication, and the communication between servers through various topology of networks. However, issues like scalability, reliability, load-balancing, and even the generic design for various changes that may take places in the future, are still important in today’s system design.
Posted by: Xiaoyang Gao | January 26, 2012 07:45 AM
This paper presents a report on the real-world usage of Xerox's distributed
system (Grapevine), showing it as a mostly-successful effort, but also
offering candid descriptions of areas in which it fell short. Grapevine
provides various network services, including systems analogous to email and
LDAP, with the aim of doing so in a sufficiently reliable, available, and
scalable manner to support the day-to-day operations of thousands of Xerox
employees at locations scattered around the world.
The paper's primary purpose is to provide general insights into the design of
distributed systems. These include techniques for designing systems that span
slow or unreliable network links, and for keeping the load on an individual
component independent of the total number of components in the system
(i.e. maintaining scalability).
Though it was later corrected, one particular initial design decision struck
me as a surprisingly naive mistake: the first implementation of group updates
was done by sending absolute membership lists and merging at the receiver.
Given the nature of a merge operation, it seems that they must have had a
distinct operation to remove a group member, but given that such a removal
would effectively be a negative delta operation, it seems like an obvious (and
not particularly complex) design to use the same delta mechanism to implement
additive group updates as well. Though they eventually moved to sending
deltas, it was not clear to me why it wasn't implemented this way from the
beginning, as it is an obviously more scalable design.
Some of the perspectives displayed toward security and reliability seem quite
relaxed -- for instance, the up to 12 hour delay in access control updates
taking effect and the possibility that a message body might be replaced with
an apology from the operators (imagine if Gmail did that!). By the standards
of the day, however, I have to assume these were regarded as satisfactory.
At the scale at which Grapevine operates, their design scales well in terms of
computational resources; the only obvious obstacle to further scaling, as they
note, is the awareness of the specific other servers in the system that must
be configured into each server. While they cast this as a problem of storage
space, it seems to me that even at that point in time disk space was probably
growing quickly enough (which they seem to confirm in their conclusion) that
this was unlikely to be problematic going forward. What seems more likely to
be a bigger problem to me would simply be the management of this configuration
information, in terms of tedious and error-prone human effort, as the system
grows to include a larger and larger number of servers. Obviously the system
was never envisioned to run at the scales of thousands or tens of thousands of
nodes that some modern systems do, but even with a few dozen servers it seems
like this approach would be approaching the limits of its scalability. Some
sort of automated discovery/configuration protocol would ease this greatly
(though of course this is obviously easier said than done).
Posted by: Zev Weiss | January 26, 2012 07:17 AM
The paper discusses about the challenges faced by Grapevine, a message delivery, authentication, resource location and access control service when subjected to substantial load and the techniques that were newly implemented, in addition to the already existing ones to overcome them. In particular, it discusses about the solutions to a number of issues faced by the system with focus on supporting scalability without compromising on performance or reliability. In addition, it also provides solutions to problems relating to providing transparency and fault tolerance. The ideas discussed in the paper could be considered as significant contributions as these issues and solutions are not specific only to Grapevine but are applicable to most modern distributed systems.
The important contributions of the paper are:
1) Ability to scale by adding more servers of fixed power instead of using more powerful servers. This is made possible with the help of grouping of entries in the registration database into registries that are replicated as a single entity. This avoids scaling issues by requiring more registries instead of larger registries.
2) The idea of grouping users in a large distribution list into sub-lists similar to hierarchical clustering thereby reducing the response time of messages sent to general interest distribution lists.
3) Though not implemented, the idea of decoupling the registration server and message server can be considered an important decision in scaling because otherwise an overload in either one of the registration or message server would cause delays in the response time of both kinds of requests leading to the compulsory addition of another system and underutilization of the other server that is not overloaded.
4) Even though it supports only eventual consistency thereby reducing the response time of user actions, it does appear as a single, large, reliable system majority of the times except in rare cases where registry updates are delayed or when actions involving the updates occur before they are propagated to all the servers.
5) Message forwarding to a server closer to the recipient reducing delivery time.
6) The decision to have three replicas with primary servers present closest to client, secondary servers placed nearby such that no overloading occurs in case of primary failures and tertiary replicas present on the other end of a vulnerable network so as to increase availability.
In my opinion, the paper has one minor flaw. While discussing about scalability the paper makes the assumption that the system size would be a maximum of 30 with 10,000 users and says that the system scales to this limit. However, this assumption is flawed as most modern systems have much larger load and would require techniques apart from those discussed to support such heavy loads and hence cannot be considered a truly scalable system. The following are some of the major limitations. While the paper focuses on improving scalability without having an impact on performance, it fails to address issues related to consistency leading to a majority of the issues discussed, for instance, the inconsistency caused by delay in propagating registration updates, name deletion and the inconsistency due to long time out caching. Another limitation is the inability to store all mails without deleting them in sequential order. If this feature was introduced it would cause more scalability issues as the volume of data in each user’s inbox keeps increasing unlike the current system.
In conclusion, it can be said that the paper provides the foundation for most of the common problems faced in all distributed systems and provides some significant solutions to the same with the focus being on scaling with high performance and reliability at the cost of consistency.
Posted by: Madhu Ramanathan | January 26, 2012 06:34 AM
Grapevine is a study of an email messaging service. This problem was first of all studied as it is a pretty good subject for distrubuted systems research. The message servers redirect the request to the appropriate inbox sites in an efficient manner. Registration database is replicated for fault tolerance and is a major piece in the architecture which is the primary contact for getting info about users, groups, message, etc. The main feature of grapevine is the distributed routing supported by message servers and the simplicity of the design. Also, the message to a single message server are sent altogether instead of sending one by one. One of the major features that made grapevine distinct and more easy to use compared to earlier file servers was that only a single authentication was required to get access to the clusters.
The paper talks about some of the optimizations that were added to the initial implementation of grapevine to improve various aspects like load scaling, reducing performance and problems due to distribution and replication. Some of the major features talked about here are:
1. Separation of a registration service and message service rather than having them together as a single service.
2. Some of the things that are to be taken into account while deciding how to replicate registry database.
3. Registrars need not be aware of the things going on in the system and can continue their upgradation independently.
4. Authenticated access is quickened by caching the required parameters for authentication.
5. Provides easy upgradation of servers and remote monitoring for administrators.
6. It provides consistency through synchronization by taking the difference in inbox content from the last time the user retrieved to the current time of retrieval.
Minor flaw:
1. It doesnt completely eliminate duplicate email messages from getting delivered.
2. Not replicating the message body means the message can be prevented from reaching the user. I think this is a necessary feature to have.
On the whole, This paper gives a good overall picture of the aspects to be considered for building a scalable and well planned distributed system implementation.
Posted by: Srinivas Govindan | January 26, 2012 02:58 AM
This paper analyzes the Grapevine distributed system that was used for message delivery and in some later cases general authentication. Grapevine was in use for several years at Xerox where its performance and stability were monitored and the authors discuss some of what they learned from this and the design decisions behind the system
The goal of the Grapevine system was to provide a message system that would be flexible across a large network and able to scale up by adding new message servers to the network to increase system capacity. This same system could be used across the network over a large area and would automatically keep each server in synchronization, though not immediately. It supported a hierarchical structure where messages to groups would determine the target of the message and a user could be sent a message without knowing the specific computer it was to be sent.
Grapevine provided a scalable distributed system that tried to keep things in order without the continued intervention of an administrator to make sure everything remained updated. It used a method of keeping accounts across a network that could be used for authentication from different computers. In terms of stability, Grapevine remained operational under the failure of individual servers, which were designed to smoothly restart. These non-atomic updates occasionally caused problems, such as duplicated messages or the sending to a recently deleted user from a group, but in a message specific system this does not seem to have been a large problem in operation. It was also a system that had time to be used with a larger number of users and remain operational for several years. This showed that systems like these were practical for wider adoption.
The paper talks about the stability of the system and mentions that they have had only a single hard drive failure, which they seem to then take as a rare occurrence and then focus less on some of these more critical failures like this in the paper. Testing a system can give a good idea of stability but some items are more related to chance than proof of stability. The previous paper, "Experience with Grapevine: The Growth of a Distributed System" mentions that a major problem with stability came from bugs in the servers which would cause all of them to crash, often near the same time due to the same trigger. This paper does not appear to cover this issue much, apart from the specific case of chain overloading. Perhaps it became a smaller issue later on as more problems were fixed.
Many of the ideas in the Grapevine system can be applied to a wide variety of distributed systems. The general ideas behind the authentication and spread of tasks across different servers is not limited to an message system. The tolerance to failure of a single server is also important in many areas as is the ability to identify accounts as is the scalability of the system. The paper mentions how the authentication portion began being used separately from the message component at Xerox.
Posted by: Daniel Crowell | January 26, 2012 02:31 AM
In this paper, the authors describe the experience they have with the Grapevine---a distributed system that provides emailing, naming, authentication and so forth---during last few years (1981-84). They report on the problems and failures they observe while the number of users and email traffic increased approximately three times. They propose solutions to these problems. Since, the most important concern while designing distributed systems is to "design for failure" [1], and they experience with a real world application with real users, I believe this paper provides an invaluable guideline for other distributed system designers of that time.
Since the main contribution of the work is to report lessons learned from using Grapevine, I will review the paper towards that claim. They analyze the effectiveness of the design and resulting system in terms of 6 different aspects. Each problem they reported and each solution they proposed itself can be considered as a novel contribution to the distributed systems literature. I will touch on three of these aspects which I believe the most significant or contains several flaws.
The first aspect is the "effects of scale". One of the observed problems is that emailing to large distribution lists may take up to 10 minutes since their computation is done by a single message server. They proposed a solution with layered distribution lists which spread the computation to several servers. I believe this is a significant observation and solution since it help design emailing systems scaled properly to the large distribution lists.
The second aspect is the "configuration of decisions" where they discuss the issues related to configuring the servers and managing the replicas. This section is unclear in the sense that it does not detail the problems they face. For example, they propose solutions to deal with heavy local load but it is not clear that what the current heavily loaded servers cause. Similarly, the problems that are caused by not using organizational registries are missing.
The sixth aspect is the "reliability" of the system. Although the authors touch on several observed problems related to reliability, it seems that the section lacks of proposing solutions. One of the problems they observe is single server disk deadlock; however, it is not clear how they handle this issue by treating it as a symptom of an overloaded server. Another problem they observe is that the system is not capable of having one server running a specific application other than emailing. But the current design does not restrict messaging servers filling this specific server up when there are unavailable servers. Similarly, the authors do not detail a sufficient solution to this problem.
Grapevine is a running system with 4400 users, therefore the indicated problems and proposed solutions are is of paramount importance for real systems. Although, as mentioned above, sometimes it is not clear what the problem or solution is, I believe the paper is a very significant contribution since it analyzes a real distributed system with respect to the most important design principles (except security) such as scalability, reliability and physical distribution.
[1] Introduction to Distributed System Design, Google Code Review.
Posted by: Halit Erdogan | January 26, 2012 01:55 AM
The paper begins with a brief introduction to the design of Grapevine, a mail service centralized distributed system. It mainly discusses the problems Grapevine has under real workload, causes of these problems and possible solutions to the problems.
The goal of Grapevine is to provide scalable and high available service. The two main services it provides are sending emails and locating machines. It also provides authentication and access control so that secure service can be built on the top of it.
This paper contributes by mentionion many bright design ideas and summarying lessons learned under real workload. Using email as the communication tool is a great idea since the receiver does not have to be online when the sender wants to communicate. This asynchronized communication model simplifies many things, like updating data and propagating changes. Grapevine replicates not only data but also service. Users have multiple inboxes so that multiple delivery paths exist. When one of the inboxes is down, the users can still receive emails. The problem of distribution list tells us that hierarchy structure is generally more scalable than flat structure. Adding or deleting one will cause the whole structure to change in flat structure, while only parts of it in hierarchy structure.
One flaw in this paper is about handling data consistency. They adopted eventual consistency instead of strong consistency, which can cause many minutes of inconsistency in their system, which trades data consistency for availability. A new created individual may seem to not exist for another server. Though this provides service, it is not so useful. Exacerbating that, cache is used for authentication, which further slows down the propagation of changes. Suppose user1 login a file server with its password xyz, then the cache will have username and password pair ; user1’s password is compromised; user1 notices that in time and change the password; but since in compromised password is cached, xyz can still be used as the password to login the file server. Another flaw is about fault tolerant. Grapevine has to involve experts to recover a corrupted disk instead of automatically recovering. A related problem is that because message body is not replicated, there could be situations replacing broken message with an apology. This seems to be not acceptable for most users.
This paper influences design of future distributed systems. DNS uses hierarchy structure rather than flat structure. Replication is used in almost all current distributed systems for improving both availability and performance. Emails have been an significant part of daily life, but the idea of using emails as the system communication tool seems not popular.
Posted by: Xiaozhu Meng | January 26, 2012 01:55 AM
This paper summarizes the initial design of Grapevine, primarily a distributed messaging system, the challenges that where discovered after several years of operation and changes made to overcome those challenges.
The primary contribution of this system is not so much technical, but perhaps as an experiment to determine how users would use a distributed messaging system. The popularity of distribution lists, reading your email at home, and literally the quantity of mail potentially generated by a user was not foreseen. It was with particular amusement that the section mentioning that an eighty megabyte hard drive could eliminate the need to archive messages for a large grapevine community was read ( checking mail on my seven gigabyte gmail account ).
It appears that perhaps this system was designed in the kinder, gentler days of the internet. There are several security issues that literally jumped out of this paper. Caching access control lists for twelve hours means that administrators cannot instantly remove a users access to a system. While there was some conversation about authentication there was nothing about security authentication credentials. Where passwords sent in the clear in those days? They freely admit that multi-server deadlock occurred, but only when provoked by an exerciser in a test system. Logs where only kept around for a week meaning any delay in detecting and/or reporting problems as is want in a large distributed community of people means that the logs could have been overwritten. The re-mailing subsystem is rife for abuse. Accidentally re-mail a large inbox and in two days all of your servers are down? The paper does mention that groups have no structural constraints. It is assumed that there is no explicit check for cyclic group structure. Further on they mentioned that a user may receive duplicate messages if they are in a destination list more than once. If the distribution lists are processed in a serial fashion could a group structure be created such that an infinite stream of messages would be generated? Evidentally a single grapevine server will forward messages for another. Relaying a message for an unknown machine is, in general, no longer allowed on the internet of today because of SPAM.
Besides all of the security concerns, grapevine commits at least another two of the eight fallacies mentioned in the google paper.
They did assume there is only one administrator. Database replication was achieved by sending out the entire changed record in the database. No mention was made of what would happen if two administrators made a change simultaneously. Or if two s
ites where removed from an inbox site list in rapid succession. The potential for a mail loop seems to exist. In the same vein, if a users mail box was moved away from a server to another, and then back could the asycronizity of administrative work cause the re-mailing of any incoming messages to end up in a mail loop?
Finally they assumed that the topology of the organization they are serving does not change. If a large portion of a grapevine community was moved, the administrative tools do not appear to exist to manage that move.
Posted by: Matt Newcomb | January 26, 2012 01:46 AM
Grapevine is a distributed messaging and authentication/naming system that allowed Xerox to use a variant of email and centralized file access controls throughout their network in a reliable, scalable way. It also contained a groups feature that was applied to many different situations, including managing the system itself. This system worked extremely well at a small scale, but ran into non-obvious problem when scaled upwards. Additionally, they ran into their system usage following actual social structures instead of their imagined social structure, occasionally with mayhem ensuing when resources became depleted in unexpected ways (eg, the tax mailing list). However, showing that technology progresses quickly, all their resource problems could easily be solved by running the entire system on my iPhone.
Grapevine, at its most basic level, solves any problem that can be reduced to sending asynchronous messages and doling out authentication. In practice, it allows them to send email and use ACLs in a centralized fashion over unreliable links with extremely limited resources at geographically seperate locations. Additionally, it avoided a single point of failure in their system by using multiple inboxes and databases and failing over if a server was unreachable. This improved over existing systems that were not reliable nor distributed, and in the case of file controls, not centralized.
This paper contributes to the field of knowledge by documenting the problems of what was, at the time, one of the largest systems of that type and with the scaling problems caused by it. Especially useful was the description of how a failure of one system could cause rippling side effects in other systems, or, alternately, how an action that seems reasonable in isolation can cause problem (eg, deleting a user account resends all messages, which may congest a link and cause other failures). The authors also note that a complex system, if left on its own, quickly becomes complicated and fragile enough that even the original authors are afraid to modify it. It also explored hierarchical naming and how it reflects organizational or geographic structure.
The primary flaw in this paper (and system) is that the designers envisioned Grapevine as applicable to any problem that used messaging and naming. However, the system was only ever tested or used for email and authentication (and management of itself). This caused the system to be more general than it needed to be and perhaps make the wrong trade-offs. An example of this is nesting of groups -- attempting to take the closure at the origin instead of delaying the work does not scale, and while it may be useful in the general case, treating groups as either large, flat collections or as individual entities (a la mailing lists which are normal addresses) would work better for both their use cases.
Posted by: David Capel | January 26, 2012 01:03 AM
This paper talks about the impact of various decisions made in the design of Grapevine, a distributed and replicated POP-styled email service and how the design decisions stood the test of scale when the number of users of Grapevine increased (over time and over geographic space).
The original idea of Grapevine (described in the background paper) was convened to do research in distributed systems structure and to develop a better mailing system (not a concrete problem, as such). This paper solved a number of problems that arose due to scale in Grapevine. In fact, this paper could be considered as a classic manual to the problems that might arise in any large-scale system, and the thumb-rule solutions to those problems.
The contributions of the paper are many. A single phrase to summarize the contributions would be "Simple, straightforward solutions to scaling issues in distributed systems".
- Being the first system to be considered as a concrete "distributed" system underneath, even while giving clients the impression of being a centralized mail server, is Grapevine's biggest win.
- The very idea of distributing naming and registry services along with mail storage, leading to a highly decentralized system, is perhaps, the biggest contribution of this paper. Today, completely decentralized architecture is a dominant design paradigm for distributed systems.
- Inventing the thumb rule of today's cloud computing and data centers: "If you want to scale performance, don't buy one powerful expensive server. Buy a hundred commodity low-range servers" is the next biggest contribution. Increase in users is compensated by adding servers to store mail and also act as registries (without straining existing registries).
- Clever borrowing of techniques from the networking world, such as, grouping names into registries and introducing additional levels of registries if need arises (Just like DNS, class-based addressing, subnets and supernets) is a very interesting aspect of Grapevine.
- Another simple idea that is used in the paper is to handle large distribution lists by steering them to different servers by registry. This rule, "if it's too big, break it up and designate" has become the norm form for MapReduce and similar frameworks.- Another key contribution is the idea of designating primary inbox servers based on physical proximity and access commonality among users.
- Another classic rule exemplified by the paper through its way of updating just the changes to distribution lists to replica registries is: "If updates are frequent, don't propagate the entire updated data over the network. Just propagate the changes". The same rule (with the network replaced by disk) is even used by Log Managers in databases.
- Since membership queries delayed access control checks, individual and group names were syntactically differentiated. This introduces a good rule in analyzing distributed systems' performance: "Check and remove useless and redundant operations".
- The paper had pointers to several nice techniques used by server sysadmins even today, like maintaining checkpointed syslogs and ssh-ing into the server to monitor it.
One flaw from a system's perspective is having both mail delivery and registry services running in the same server. The reason is, having two different forms of data on the same server might mitigate the performance benefits coming out of the storage hierarchy (seeks in disks and locality in memory and CPU caches). Having all mail requests go to a dedicated mail server and all registry requests go to a dedicated registry server might have improved performance.
There are no other flaws. One thing that could have been done:
- Eventual consistency is a problem in any large-scale distributed system and the Grapevine is no exception. The designers could have estimated a consistency window (in minutes) to mitigate a few problems caused by eventual consistency. For example, in the use case of mailing a list after updating it, the client could wait the consistency window and then send the message (so that all replicas of registries are updated).
Almost every part of this paper is relevant to today's distributed systems. The paper has laid out the most fundamental design of a large scale distributed system, has described the most fundamental problems in that system and the solutions to the said problems. It has coined the most fundamental thumb rules in designing systems.
Posted by: Venkatesh Srinivasan | January 26, 2012 12:28 AM
The paper discusses the challenges and lessons learned in building a scalable, efficient, reliable distributed system that provides authenticated mailing, and access control services to a growing user base.
The major challenge that the Grapevine distributed system designers expected and aimed to solve was the performance scalability of the system with the increase in number of users. The system required many modifications to adapt to the changing environment as problems due to bad design decisions manifested themselves.
One of the major contributions of the paper is the approach towards designing a scalable system. The authors aimed at keeping the load on a single server constant by restricting the size of data that it is responsible for, to a constant. This would require to introduce more servers of the same or equivalent capacity as the load increases. This made the problem simpler and easily solvable. They also introduced another idea of using hierarchy (partitioned name services) with replication, to distribute the load among a subset of the servers in the system based on expected user communication patterns. Grouping data into registries with locality awareness (sharing message bodies) is another important contribution by the paper.
On the other hand, the assumption that making the load on a single server constant made the system scalable seems incomplete. This assumption doesn't consider a corresponding increase in load on the network as the overall load increases. This is important because network congestion due to the increase in load can directly affect the performance of the whole system. Thus, intuitively, the sweet spot for scalable system is in between many servers with constant load (N servers -> increase in networking) and single large server (no or minimum networking).
In summary, though the paper speaks about a specific domain (mailing, access control services etc.), in general it is a very relevant case study about challenges in designing distributed systems. This is because the current distributed system designers still face the same challenges even though they are backed up by better infrastructure (more reliable network, larger and cheaper disks, etc).
Posted by: Venkatanathan Varadarajan | January 26, 2012 12:16 AM
This paper describes the efforts to implement a distributed mail system called grapevine. In this paper they list design principals that they had for the distributed mail system and discuss various challenges that they encountered while building this system. The distributed mail client called grapevine was designed to solve the issue with getting messages between various users in a very large system (in the paper they were looking at building a system that could possibly reach over 10,000 people). This system supported some unique features such as a primitive version of a mailing list (which caused quite a few issues) and the ability to check mail without knowing much about the underlying mail servicing facilities.
However when building this system they came across numerous issues that could (at times) cripple the system or reduce reliability. The issue's that they came across are the real interesting parts of this paper. Most of the issues that they discovered when building this system apply to more generic distributed systems to this day. One of the more interesting issues that they came across was how to keep the states of the mailboxes consistent between the backup and primary mail servers. One particular example that they came across in their original implementation that was interesting was how moving mailbox server list's could cause serious performance issues in there original implementation. The original solution that they had for this issue was essentially to “remail” all of the mail in the inbox. However as they found out this caused issues with load (especially when the inbox was particularly heavily loaded). The particular reason this example was interesting was that showed how improper design of administrative actions (such as removing an account) could cause issues with general performance of the system as a whole. Another issue that I personally found of interest in this paper was the various issues they had with scaling the mailing list. The authors seemed to have underestimated the popularity of the list's which lead to a non-optimum design for how to scale mailing list traffic (long message delivery delays of over 10 minutes was cited in the paper). This case is a microcosm for how inefficient mass distribution of data can destroy system performance.
There were some things about the design choices that the authors made in this system that were a bit concerning. In the beginning of section 4 (effects of scale) the authors discuss that they want to be able to add servers to the distributed system to increase system capacity. However these systems were to have a fixed upper bound on the computation that they perform. This seemed like a weird design principal to mandate that every server in the network have exactly the same performance characteristics and seems to ignore the constantly increasing power of microprocessors. Assuming I am interpreting this correctly, this particular design feature seems like it would limit performance of any one server to the speed of the slowest machine in the distributed system. With this principal it seems like they are being forced to buy additional servers to service an increasing workload and any proposed upgrade to machines in the network would have to have them ALL upgraded before there was any impact on performance. Another concerning design choice in this paper was some handling of error cases. Some error handling cases listed in this paper were very simplistic and didnt really perform actual error correction. One example was shown in section 6 where they had an issue where a user was added by one registration server then immediately registered on a list of another registration server. The solution to this problem that they used was essentially just to ignore it and retry.
Posted by: Benjamin Welton | January 25, 2012 11:45 PM
The paper talks about the observations and some proposed solutions that were
made by the designers of the Grapevine, a distributed messaging system. The
messaging system also constituted a registry service as a part of it since
there wasn't something like DNS back in those days. The clients of grapevine
can be humans as well as other programs like file services and message
servers. Grapevine has servers that handle messages and servers that do a more
administrative task of maintaining registries and access control lists. Both
these are distributed in almost similar fashion and are closely coupled,
trying to make it look like one single system that inherently provides both
the services.
The paper discusses several observations on the initial design of grapevine and
how those affected the performance of the system as the system scaled up over
time. Some of them, like effect of having a clear distinction between group
names and individual names, handling the updates to the huge distribution
lists seem like they should have been a part of the initial design as they are
fairly intuitive. However, the important contribution of the paper is trying
to document the observations made, identifying the reason behind it and
proposing solutions. Interesting ones like how the organization of
distribution lists based on geography to organization affects the performance,
how the replication policy is affected by the distribution list are very
interesting. However, some of the solutions like the configuration to handle
huge load in some servers are more like unevaluated suggestions for the next
iteration of the grapevine design. Several proposed solutions like the ones
for the re-mailing of inboxes still dont sound reasonable enough. Instead of
manually marking the inbox site and moving, the grapevine servers can choose to
slowly migrate the mails from the old to new site depending on the load in
those sites. Or, the new site can still handle the request and fetch the mail
from the old deleted site, moving it on the demand and marking it as deleted
once everything has been moved.
The Grapevine system was built with similar consistency models for both the
registry service and the messaging service. Both have weak consistency. While
its okay if messages are not propagated in time to all servers, for
administrative tasks, I feel that a strong consistency model should have been
adopted. Several problems arising due to removing of names from a registry
could have been implicitly solved by trying to enforce strong consistency only
for the registry service.
Grapevine appears to be the precursor to modern day email and DNS. Some of the
proposals the authors make like mail forwarding in case of not being able to
make a direct connection between the sender and receiver are very much common
in the email system we used today. The paper is not directly applicable to
building a distributed mail server today since the problems being tackled are
not the same when it comes to communication, the hardware used for the servers
etc, but it gives a picture of what all to look out for while building a
distributed system in general. The stress on configuring the distributed
system by observing its performance and analysing the factors driving those
numbers is true on any given day.
At the same time, it also touches upon the importance of the initial design of
a distributed system and general software engineering practices. Building
something and fixing it later leads to code decay. As we observe the system
and make new changes to the initial design, it is always observed that new
bugs show up. Glueing together the new design after several iterations of
change and parts of the initial design could be challenging from a software
engineering perspective.
Posted by: Sathya Gunasekar | January 25, 2012 11:29 PM
The paper describes the experiences with scaling and operating Grapevine, the distributed system that provides e-mail and naming services. Authors give short overview of the system design and then discuss issues they faced when growing the user community, lessons they learned on configuration of the system and ability of the system to adjust to the load and provide reliable service.
When authors designed the system, there were no best practices to look up and it was hard to perceive what implications would their design decisions and goals have. Some of the decisions proved wrong and they had to rewrite parts of the system. For example, when doing registry update, they sent the entire new entry to the replication of the registry. Once the user base grew, the changes were more frequent and some replicas were lagging few hours. Before actually using the system, it was hard to tell how much changes would the users make to the registries and would it be a serious problem.
The biggest contribution of this paper back in 1984 was discussion about what can go wrong when building a distributed system and description of new solutions to these problems. I'm sure that everybody designing their own distributed system throughly read this paper and it gave them a new framework to think about problems they might not have perceived otherwise. They also have a good discussion about the upper bounds for scaling the system. The point they make is that the load on a specific server should not grow linearly with number of users. They made the mistake with distribution lists. One distribution list was stored on one server. However, the all purpose distribution lists had a fraction of users on them, which grew linearly with number of users. Thus, the load on one server was growing linearly with number of users and it soon became a bottleneck. Also, one of the important remarks was that when doing replication, there has to be some spare resources in the system. Otherwise, when one server fails, others would become overloaded and fail, too.
One of the flaws of the paper was lack of figures that would better explain some concepts. There are also some flaws in the system, the biggest one being too weak consistency. The system was designed to be weakly consistent with trade off for scalability, but from the paper it seems like this caused lots of problems, especially with users not expecting it. With registry replicas lagging few hours it is hard to build anything critical on top of the system. There is also a problem with big nested groups that need to fetch the whole distribution lists from different servers just to see if a certain entity is in the group. With file access control lists, they solved the problem by caching the list for 12 hours, but the fundamental problem is in the design that allows arbitrary nested groups. They were also not much concerned about security, transporting passwords in plain-text format.
It is fair to say that we still use many of the concepts mentioned in the paper in building distributed systems today. For example, DNS protocol uses the idea of naming hierarchy and we use e-mail systems with architecture similar to Grapevine's.
Posted by: Igor Canadi | January 25, 2012 10:59 PM
Title : Experience with Grapevine: The growth of a Distributed System.
Authors : Michalel Schroeder etc.
Circa : 1983
#Citations: 66 - 1 Self.
(1) Summary:
This paper describes experiences with operational "Grapevine Distributed System"
implemented at Xerox research Internet.
Grapevine is a replicated distributed system that provides message delivery,
naming, authentication, resource location and access control services. The
primary objective of the system was to increase system (to its maximum
capacity) using fixed power servers. (dedicated server computers with 128Kbytes
RAM , 5M disk storage and 30 microsecond latency).
This paper presents, symptoms and causes of non-scalability and non-transparent and
sometimes annoying behavior of the systems and present solutions to improve the
responsiveness of the system.
(2) Problem Description:
The main objective of the Grapevine system was to develop a reliable distibuted
mailing system with client and server paradigm. Basically, Grapevine services
are divided into two "Message Service" and "Registration Services" and a Grapevine
serves take the role of both "mail server" and "registration server".
(No single theme, too many problems and solutions described in the paper).
(3) Contribution of the paper:
Perhaps the biggest contributions of this paper are the insights into the real
system under operation which sometimes is far from the original system design
objectives. The authors have clearly mentioned all the issues in details, the
symptoms, causes of unexpected behavior and provide some new improved design i
to meet with the growing demands of such systems.
Second contribution of this system was to separate message delivery function from
the message creation and interpretation.
(1) A single system view perceived by the end user. Distribution and replications
are transparent to the user (but frequently noticeable by the users).
(2) Division of registration database into registers was crucial for scalalability.
(3) Number of replicas independent of number of servers and number of users.
(4) A message which is shared among multiple users is stored only once on the
server.
(5) Where to place replicas ? Five consideration have been described in the paper.
(6) Automatic re-mailing from removed in-box site is not a good thing to do.
(7) A filer server does not contain the passwords of all the clients instead it
checks the user's credentials.
(8) Results of authentication and access control are cached in the file server to
increase responsiveness.
(9) Fault tolerance in inbuilt into the system.
(4) Flaws in the paper and system design:
Perhaps the biggest design flaw in the "Grapevine" was the initial assumption that
users and systems were disciplined and cooperative, which is never the case.
Perhaps it was not desined for graceful degradation and non-cooperative users.
(1) A user is allowed to become owner of a group and remove arbitrary members.
(2) Resource location algorithm is sub-optimal.
(3) Distribution lists hindered scalability which were based on user's interests
rather than some optimal criteria.
(4) System could not perform well under stress tests. Although the system design
was transparent, long delays in propagating registration enabled user to
detect system design.
(5) Using only geographical registries list was a design flaw, but purpose and
use of the list was more important.
(6) Distributed consensus is missing in the system.
(7) Received messages are deleted on the assumption that an user has cached the
messages somewhere.
(5) Application:
Modern systems are highly matured, scalable and fault tolerant. When compared to the
hours of waiting as in Grapevine, Google and Yahoo mails gives response in few
seconds, gives us a feeling that "Grapevine" was a big baby step.
(6) Terms unfamiliar:
(A) Stylized topology ( Page 4: Para 4).
Posted by: Chaman Singh Verma | January 25, 2012 09:32 PM
In this paper, the author described the experiences in using GrapeVine and potential improvements that can be made. GrapeVine is a distributed system which can be used for Email, naming, authentication, resource location and access control. It’s designed to be a general distributed system framework that provide reliable and scalable services. Mailing service is mainly discussed in this paper.
Servers and clients in GrapeVine may geographically dispersed and connected through the internet. There are two types of servers: Registration server and Message server. Registration server is in charge of authentication, naming, access control and resource location. It contains the registration database which maps names to machines, services and distribution lists. The registration database is replicated and shared through communication for the purpose of reliability when server failure occurs. The Message server cooperates with Registration server to accept and deliver mails. The Message service is replicated since any Message server can accept and deliver any mail. But the message body is not replicated due to the limit of resource and the benefits derived. At client side, GrapeVineUser package performs the resource location and is transparent to the user.
To make the system highly reliable and available, message service and registration data are replicated. Also GrapeVine has the ability to increase system capacity over a large range by adding more servers of fixed power. This is achieved by decentralized registration service and replicated message service. There is no centralized server that will prevent the system from scaling up. The limitation is the configuration data stored at registration server and the increasing messages sent through the internet. When server failure occurs, GrapeVine can continue to work without disturb users. For example, when message server fails, resended mail can be delivered by another live message server. When registration server fails, replicated data in other server can be used. Also GrapeVine provides tools for expert to diagnose and recover the system remotely. Also extra resources should be allocated for recovery in case failure occurs to prevent chain reaction that causes complete system failure.
Replication can provide high availability but it also causes the problem of data inconsistency. The registration data base is replicated, and any registration server can accept updates and propagate update to other copies. It usually takes several minutes during which any service operation involving the updated data could result in inconsistency. For example, when a new user is added to group 1 at server A, and before the update propagated a request to send a critical email to group1 is accepted and processed by server B will result in missing the new user. Also the service replication of message service could cause duplications. When a Message server crashed before deliver the message to the inbox of the recipients, the client will resend the mail and it will be processed by another Message server which causes duplication.
Posted by: Xiaoming Shi | January 25, 2012 09:23 PM
Seth Pollen
This paper summarizes the observations and enhancements made during the operation of the Grapevine distributed mailing system at Xerox between 1981 and 1983. The problem solved by Grapevine was the provision of unified authentication and asynchronous message delivery services to a large, geographically distributed organization. This paper in particular presents the efforts made to improve performance as the system scaled from 1500 to 4400 users and client programs.
One of the most significant improvements noted was to enlist the servers in a more active management of the network. For example, equipping message servers with knowledge of the locations of unreliable links would allow messages to be staged after crossing each unreliable link, reducing the number of retries needed to complete message transfer. In the same way, it is noted that mailing list expansions can be accelerated by forwarding the list to a server which is geographically closer to a registry storing the mailing list’s contents.
One disconcerting solution given in the paper was the introduction of access control list caches on file servers. This was done to reduce the network traffic needed for frequent checking of file permissions. It seems unwise, however, to add this ad-hoc extension to the carefully crafted distributed registry system. Changes are propagated rapidly and reliably among registry servers, but this cache on the file server is fully updated only once every 12 hours. While caching nicely insulates the file servers from the task of processing each new registry update, it may be better to somehow allow parts of the distributed registry database to be robustly replicated on file servers, providing reliable local access to access control lists.
As an account of the failures and successes of the Grapevine installation used for daily operations at Xerox, this paper is directly applicable to real systems. Furthermore, many of the issues not fixed in Grapevine were solved for the subsequent Xerox 8000 NS production system. Grapevine is similar to modern mail systems, though some of the issues raised in this paper (such as permanent server-side mailboxes) are no longer contentious. Maintaining a (mostly) consistent distributed registry under frequent updates is a problem still faced today. While we now rarely see a circular dependency between email and authentication services, Grapevine provides a well-reasoned implementation of both of these services, each of which has great value to modern distributed computing.
Posted by: Seth Pollen | January 25, 2012 08:45 PM
This paper describes the Grapevine system, a distributed electronic mail and name service developed at Xerox PARC in the '70s and '80s.
The goal of Grapevine is to provide a reliable, scalable service that would perform two functions: (1) accept and deliver mail; and (2) resolve names to individuals. To this end, Grapevine is divided into two systems. One was the registration service. The registration service provides the resolution of RNames. Each RName is formatted as F.R, where R is a registry name and F is the name of something within that registry. Registries are replicated across Grapevine servers and contain lists of RNames. Each RName entry provides either a list of other RNames or some metadata about an individual, including the locations of that individuals inboxes. The other system was the message service. This service handles delivery of messages from one user to an inbox. A message addressed to a list of RNames can be delivered by the message server by resolving the RNames into a list of inboxes.
Grapevine contains many semblances to the modern day email system and DNS. Its main contribution was exploring how messaging and name services could be implemented in a reliable and scalable manner to maximize availability. This paper in particular focuses on mistakes that were made in the original design of the system and how the design was amended to correct them.
The authors attempted to use Grapevine as a general-purpose tool for approaching several different problems in distributed computing. Grapevine can be used not only for communication between human users but also for communication between programs. This leads to several flaws in their design. For example, the authors discuss the file server using distribution lists in the registries as access lists for files. This caused availability problems when distribution lists were deeply nested because the registration server would have to recursively expand the lists in order to find out whether the accessing user was there. Their solution was to maintain a separate, flattened distribution list for each deeply nested list, and to update the flattened version every 12 hours. However, this introduces a security concern, because it becomes impossible to rapidly revoke someone's access privileges if necessary.
Another flaw was in their overly simplistic solution to the problem of inbox relocation using remailing. The original system would remail all the messages from the old inbox to the new inbox whenever the old inbox was deleted, but this at times generated an unacceptable load on the system. Their response was to only remail the inbox when it was below a certain size threshold. They also write that they think it would be better to simply mark an inbox for deletion and delete it when it becomes empty through the usual message retrieval process. However, it seems to me that it would be simpler to simply use a separate protocol for moving an inbox that directly transferred the data between two servers rather than remailing each message separately. The one advantage that the authors' solution has over this is that it is conceptually simpler since it reuses Grapevine and would not involve introducing another protocol.
Both of these issues arise from the monolithic nature of Grapevine's design goals: one flexible system for all distributed computing. It seems to me that this sort of thinking is not so popular anymore. Even the name and messaging services have been split into two distinct systems. Further, in Grapevine, each service is a client of the other, whereas modern systems tend to take a layered approach with asymmetric dependencies between layers, since this tends to make testing and debugging easier and designs more modular.
Posted by: James Paton | January 25, 2012 04:15 PM
The paper describes lessons learned in the scaling of a large distributed e-mail
system called Grapevine. It primarily focuses on design decisions that have
caused performance problems in a system serving 4400 individiuals.
The problem Grapevine is supposed to solve is communication between individuals
that work at a large company, Xerox. Xerox is geographically distributed, and
thus all servers used for Grapevine are connected to an "internet" with a number
of servers at each geographic location. This problem is quite important since
individuals of the same company have a natural need to communicate and
collaborate. Since each employee has their own computer to work from, sending
electronic messages is the fastest way to communicate other than conversing.
The contribution of this paper is a list of lessons learned in distributed
computing. This was a novel contribution because at the time, Grapevine was one
of the largest distributed systems in the world and had a very active and
growing base of users. This contiribution built off the previous paper which
described the initial design of Grapevine without much experience in
administering a large-scale implementation of it for a long period of time.
This paper adds this experience and describes certain situations brought on by
those initial decisions that created adverse performance. Some events are only
documented as happening one time. In effect, this paper is a more detailed
evaluation section for "Grapevine: An Exercise in Distributed Computing". Thus,
this paper lacks much of an evaluation section, since it merely proposes
possible solutions for the failed techniques highlighted.
The largest flaw to me in the reasoning behind proposed changes to Grapevine was
the lack of acknowledgment that failure is scalable. The authors definitely
cover the case of a machine or link failing and preserving availability by
having many servers running in the system and able to perform any Grapevine
task. However, the design decision of not replicating message bodies is partly
backed up by an experiential statement of "We have had only one total disk
failure." It is easy to see that the chance of disk failure increases with the
size of the system AND the lifetime of the system. Avoiding complete loss of
messages is mostly solved by two things: the assumption that the user will store
important messages on their workstation, and an archiving system that uses a
file server to archive unread messages over a week old. The first assumption
only prevents the loss of read messages. The archiving system greatly decreases
the chance of loss of an unread message, especially if the user check for new
messages frequently (which is probably the most common case). The flaw in the
reasoning for the archiving system, however, is that this system is mostly in
place because of the small size of the disk attached to each Grapevine server (5
MB). The author goes on to propose that an 80 MB disk would take away the need
for archiving completely, like this is a desirable goal. I believe that the
archiving system is necessary for all scales of the system since a complete
loss of an unread message is not acceptable.
Real distributed systems today are often much larger, more fully-featured, and
used for more complex tasks than Grapevine was. Viewing the design decisions
and ways these decisions were good or bad for the use is useful in the design of
any distributed system though. This is especially true in the Grapevine
designers' experience in witnessing certain user behavior that was thought to be
seldom becoming a normal use case, such as an employee using the terminal to
access messages on Grapevine and not permanently storing her messages on her
computer. The definition of the common case and how it could change is worth
prolonged thought in the design phase of a distributed system. Also,
Grapevine's policy of "remailing" all messages on a message server when an inbox
is moved to another server and the automatic execution of it put unnecessary
strain on the entire internet. Real systems should closely examine when it is
actually needed to send large messages over the network and closely examine the
cases where the system would do this in an automated fashion. The idea of
Grapevine's (possibly necessary) weak consistency of data that eventually became
correct prevented many tasks from using Grapevine as their mechanism to
distribute across many machines. This is still very true today. The stronger
the consistency of data guaranteed by a distributed system, the harder it is to
implement. Thus, real distributed systems still cannot solve problems where
complete accuracy is needed.
Posted by: Evan Samanas | January 25, 2012 10:45 AM