Distributed snapshots: determining global states of distributed systems
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63-75, 1985.
Review due Thursday, 2/10
« Time, clocks, and the ordering of events in a distributed system | Main | The Byzantine Generals Problem »
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63-75, 1985.
Review due Thursday, 2/10
Comments
Summary
This paper presents an algorithm for taking snapshots of global states in distributed systems, and demonstrates global state detection enables the solution to stable property detection problems, including deadlock detection and termination detection.
Problem
The problem is how, who, and when to record process states and channel states in parallel to normal computation.
Contribution
1. This paper models a distributed system as a set of processes and a set of channels. Processes are able to record their own states and the states of communication channels that they are incident on. A global state of a distributed system is a set of component process and channel states. Although a distributed model is not important as this paper states, one has to really understand the model they describe in order to learn what is going on in the algorithm proposed in the paper.
2. This paper uses four simple examples (2.1, 2.2, 3.1, and 4.1) to clearly motivate and illustrate the algorithm.
3. This paper proposes a global state detection algorithm. The algorithm works by sending a mark between two processes, thus this makes state detection be concurrent with computation. The algorithm contains two rules. First, a Marker-Sending Rule for the sender process p. p sends one marker along channel c after p records its state and before p sends further messages along c. Second, a Marker-Receiving Rule for the receiver process q. If q has recorded its state, q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c; otherwise, q records its state and the state c as the empty sequence.
4. This paper proves that the algorithm terminates. Termination in finite time is ensured if for every process q: q spontaneously records its state or there is a path from a process p, which spontaneously records its state, to q.
5. A recorded state S* may be different from all actual global states, but S* can be reached from any actual global state, and the terminal state can be reached from S*. This conclusion easily enables stability detection.
Flaw
Similar to other theoretical work, this paper makes quite a few assumptions that are reasonable. For example, in this paper, channels are assumed to have infinite buffers, to be error-free, and to deliver messages in the order that they are sent. This assumption can be realized by high level protocols like TCP. Therefore, I would not attack this paper of making the assumptions.
In addition, this paper focuses on global state detection, and omits global state collection that is explicitly acknowledged in the paper. Therefore, I would not attack this paper of missing such and such points. However, I'm concerned that the global state collection would become nontrivial, if we deploy this global state detection algorithm in a distributed system with massive amount of nodes.
Applicability
As pointed out in this paper, the global state detection enables stable property detection (e.g., distributed deadlock detection) and checkpointing. This is useful in improving the robustness of distributed system.
Posted by: Wenbin Fang | February 9, 2011 02:09 PM
Summary: Chandy and Lamport describe a means of ascertaining reliable information about persistent global predicates pertaining to a distributed system. They provide a formal description of a generic distributed system and superimpose a framework for collecting information from each process to form a global image of the system.
Problem: The tightly defined goal of the paper's analysis is to correctly determine if a stable property of a distributed system has reached its stable state. A stable property is any boolean proposition that is initially false and may eventually become true but never revert back to false. An ancillary requirement is that the mechanism used to gather the information cannot change the semantic functioning of the system.
Contributions: The first contribution by the paper is a formal description of a distributed system in its most general form. By identifying the atomic components of a system, processes and communication channels, the groundwork is laid not only for their analysis but for the broader study of a wide range of problems related to distributed systems. Furthermore, the authors demonstrate a process by which to construct "meaningful" global information about a system in the presence of communication latency. They show (as with the excellent photographs-of-birds analogy) that the information gathered can be useful without actually being technically accurate.
Flaws: Given the explicitly limited scope of the goals and the intentionally theoretical nature of this research, it's difficult to find fault in its lack of practical motivation. However, since they did provide a few selected stable properties (e.g. "the system is deadlocked"), it could have been a useful and enlightening exercise to walk through a concrete example of their proposed framework in action, even on a system as prototypical as the ones used in their proof.
Applicability: The most interesting and practical application of the instrumentation suggested by this paper is in the realm of consistency checking. That is, the "stable properties" can be defined as "A has been observed," where A is a state indicating inconsistency and instability in the system. By checking that such a property never becomes true, we can implement a kind of live model checking for the system. Importantly, because a recorded state S did not necessarily ever occur but is proved to be possible, this allows for the chance that issues can be identified before they happen.
Posted by: Rich Joiner | February 9, 2011 07:54 PM
Summary:
The paper is about an algorithm that can be used in distributed systems to determine a global state of the system which is useful in stable property detection problems such as deadlock detection.
Description of Problem:
Determining the global state of a distributed system at a specific moment in time is not any easy task. This requires all the machines in the system to take a snapshot of their state at the exact same time therefore requiring all the clocks to be in sync which is a difficult task. Therefore the problem is to be able to combine a set of local states taken at different times to form a global state of the system that is meaningful.
Summary of contributions:
The authors developed a global state detection algorithm that can take a meaningful snapshot of the system. They also proved some properties of the global state derived from their algorithm that can be used to detect stable properties.
Flaws:
This paper is a theoretical paper so the authors makes a lot of assumptions such as channels having infinite buffers, error-free, and in-order delivery which are reasonable since these assumptions are not the problem they are trying to solve and there are protocols and algorithms that can be used to tackle these assumptions. I didn't really understand what type of function someone can use to detect a stable property such as if system is deadlocked, so maybe if the paper provided some example function would be helpful.
Application to real systems:
If the assumptions from the papers and other problems that arise from implementation can be solved, then it seems like this algorithm can be useful in stable property detection problems as mentioned in the paper.
Posted by: Kong Yang | February 9, 2011 08:55 PM
Summary
This paper describes how nearly correct snapshots that are equivalent to real snapshots for many purposes can be gathered on a distributed system. It also proves how these nearly correct snapshots can be used for a number of purposes.
Problem
Using distributed systems can provide great performance scalability, but often it is necessary to have global knowledge of the entire system. Obtaining global knowledge is difficult, as it is not possible to coordinate each node taking a snapshot at the same time.
Contributions
The paper made three main contributions:
1. A generalization of several distributed systems problems is provided. The paper points out that several problems such as deadlock detection and termination detection can be generalized to stable-property detection.
2. An algorithm for creating a distributed snapshot is described. The algorithm works by placing markers on each communication path after the sender has finished recording its state. On the receiving end, the receiver will consider all the messages it receives up until the marker to be on the connection for the purposes of the global snapshot (assuming it has not started taking a local snapshot yet). The resulting global snapshot is different than the actual state of the system, but it is a state that could be reached if the network delivered some packets faster and others slower.
3. The paper shows how a global snapshot can be used to solve the stable-property detection problem. Since the pseudo snapshot that is taken is of a state that will eventually merge with a future state of the actual system, the pseudo snapshot is useful for recognizing stable-properties (properties that remain once they arise).
Flaws
One of the weaknesses of the algorithm that is not discussed is the complexity of the algorithm. If each node must send a marker along each connection to every other node, the number of messages sent is N^2 where N is the number of nodes in the system. If there are a large number of nodes, this clearly is not feasible.
Another weakness is that the limitations of the “snapshots” are not thoroughly discussed. For instance, the snapshots are useful for problems such as deadlock detection, but not for any sort of statistics gatherings. For example, one might want to occasionally take snapshots to see the current load of each node for load balancing purposes. However, the snapshots taken by this algorithm would be inaccurate for that purpose or for any measuring research. It would be nice if the paper more thoroughly discussed the boundary between cases where the pseudo snapshots taken by the algorithm are useful and when they are not.
Application to Real Systems
The paper mentions several applications in real systems. (1) Checkpointing. For long running tasks, it’s useful to take occasional backups in case the system crashes so that you don’t need to start from the beginning. (2) Detecting deadlock. This is useful because the system needs to recognize and kill one or more tasks when there’s a deadlock. (3) Computation termination. This could be useful for cleaning up after a computation is finished; although, it’s hard to imagine that this would be very difficult to detect with a simpler scheme than snapshots.
Posted by: Tyler Harter | February 9, 2011 10:09 PM
Summary
Describes algorithm to record the global state of a distributed system by a process.
The Distributed Snapshots papers, describes an algorithm to record the global state of a distributed system by a process. Each process records its local state and shares the recorded state with other processes. The collected recorded states are then used to construct a global state of the distributed system.
Problem
The paper provides a simple algorithm to correctly determine the global state from asynchronous snapshots of local states amongst distributed processes. By providing a correct global state of the system, it permits algorithms to be developed for problems, such as deadlock detection, which require a global view of the system. Existing solutions that provided a global state of the system proved to be either incorrect or impractical for use.
Contribution
The paper made several key contributions to distributed systems. Firstly it provided a fairly straightforward algorithm (the Global State Recording Algorithm), which allows each process to determine the global state of the system by exchanging recorded local states. Additionally, the paper demonstrated that stability predicate, that is, a predicate, which returns the same value for a given state S, and all subsequent states reachable from S can be used to solve several problems.
Flaws
I have not noticed any major flaws or shortcomings of the paper. Though several assumptions were made in the paper, such as a reliable channel, most if not all have been shown by history to be reasonable assumptions. However, one minor shortcoming with the paper/algorithm is the initiation of recording of a local state by a process, which requires at least one process to spontaneously record its local state, which then passes this information to all other processes (in a fully connected graph) resulting in the remaining processes recoding their state. However the discussion on spontaneous recording is postponed and I believe even a terse overview would have been useful in the paper.
Application to Real Systems
As mentioned earlier the paper provides several key contributions, which aided in solving several classes of problems in distributed system such as deadlock detection.
Posted by: Greig Hazell | February 9, 2011 10:23 PM
Summary:
The paper describes global state detection algorithm of s distributed system. The algorithm is based on sending marker messages indicating other nodes to record their state and the state of the channel at that time. Being a theoretical paper many assumptions are made for the algorithm to work in a real system. The paper also describes important property of the recorded global state, i.e. the initial and the final global states are reachable from the recorded state. The paper also talks about stable state. Stable states have a stable property which is implies the stable property for all future states for e.g.: the system is deadlocked.
Problem Statement:
The main problem the paper is trying to solve is to how to detect a global state of s distributed system in the constraints of limited knowledge of the entire system. Each process can record their own state and the communication channel linked to them. The paper also looks at the problem of finding stable state in the system like deadlock detection.
Contribution:
The most important contribution of the system is the state detection algorithm itself. The algorithm is intuitive and logical. The paper is well written and systematically presents (1) why is problem difficult (partial view with each system), (2) algorithm and the proof of the important property of the system. It is also supported by many examples to show a step by step practical process of its execution bringing out a clear picture. The paper clearly states the assumptions and definitions for the algorithm.
One of the important aspect paper talks about is that the state recorded by the algorithm might not be same as the actual global states but the initial and the final states are reachable. The property is primarily useful in stability detection. As by the algorithm, if the stability property holds true at the distributed snapshot state, it will continue to hold in future state implies that the property will hold in the final state as well. This implies that such the state detection algorithm is useful in detecting stability properties of the system.
Flaws:
The paper is clear in its intentions and the problems it is trying to solve. It clearly states the assumptions and limitations. Although explicitly stated, one of the limitations is that of broken stability and hence phases of ‘reinitiated’ computation after a stable state have not been solved. This limits its usage and requires further work to solve such real problems.
Also, in spite of talking about the property of the distributed snapshot, the paper does not talk about the places where such an algorithm will fail to detect an accurate state or can mislead the state of the system. This also might limit the usage of the algorithm in real system.
Application to real world:
The algorithm is useful for state detection in the real world system and seems feasible. It is also useful for termination and deadlock detection in distributed systems.
Posted by: Ishani Ahuja | February 9, 2011 10:45 PM
Summary
This work describes an algorithm and mechanism to asynchronously determine a global state of a distributed system. The algorithm is can be used to detect so-called "stable properties" such as deadlocks and termination of computations, it can also be adapted to provide system checkpointing.
Problem
Determining global state in a distributed system is difficult because no single process can see everything in the system simultaneously. Instead each process in the system can only observe its own internal state it communication over incident channels. The authors make an apt comparison to a group of photographers attempting to capture an entire panorama of migrating birds by taking a series of independent asynchronous snapshots. Even though the resulting composite photograph (or system snapshot) is not an exact instantaneous global view, we can still use it to infer meaning about the underlying global state.
Contibutions
Chandy and Lamport present an simple algorithm that is essentially an application of the ideas from the earlier "Ordered Events" paper. The system uses marker messages and "happened before" relationships to record all internal states an in-flight messages in the system. When a process first receives a marker on a channel it records its own internal state, begins to record incoming messages on all other channels, and then sends a marker to all other processes in the system. With each subsequent marker received on other channels the process stops recording incoming messages on that channel. Eventually every process will have recorded its own state and collected all "in-flight" messages from all its incoming channels. The key insight is that the system can be viewed as a state machine, and that the "actual" global state at algorithm termination is reachable from the "recorded" global state. Thus if a stable property is true in the "recorded" global state, it will also be true in the "actual" global state.
Flaws
The authors do a good job of providing a few simple examples of how the snapshot algorithm can be used for these stable property problems, but it is not readily clear how it can be extended for other kinds of global state problems. For example they mention that many problems depend on phase-like global states. As usual it would be interesting to see discussion of fault handling and efficiency/performance issues even in a theoretical paper.
Applications
The list of potential applications for the snapshot algorithm remain critical to current distributed systems. As mentioned before these include deadlock detection, computation termination detection, algorithm convergence detection. The algorithm can also be used to support checkpointing of long running computations.
Posted by: Kris Kosmatka | February 9, 2011 11:47 PM
Summary: This paper proposes an algorithm to collect a consistent set of snapshots from all the participating processes in a distributed system, which can help us infer the global system state.
Problem statement/Motivation: We have a bunch of processes running a distributed computation. We want a way to detect the global system state. Global state detection can help us determine stable properties of the system (like deadlocks, end of computation etc.).
We can collect the state of all processes in the system to infer the global state, but will they be consistent with each other?
Summary of contributions: The authors propose an algorithm which guarantees consistent snapshots of co-ordinating process states, and all the communication channels. The system state is the aggregation process and channel states.
Any process that wants to initiate the snapshotting, records its own state first and sends a special marker message in all of its channels. A process that receives a marker saves its state first, propagates the marker in all of its channels. If a process that has already saved its state receives a, it starts recording the messages it receives on its channel. Thus, the marker acts as a trigger for snapshotting and the global state of the system can be deduced by the set of process states and channel messages on flight since the creation of marker. Once we have a consistent set of process and channel states, we can infer the global system state.
By nature, a stable property is one that persists: once a stable property is achieved in a state, it stays thereafter. How does global state detection help us detect these stable conditions? The authors show that a stable property holds in all the states including and between the start and end of the snapshot collection algorithm. Thus, with the collected global state, we can test whether the state is manifesting the property or not.
Flaws: I wonder if the number messages the algorithm pumps out is excessive - its at least equal to sum of out degree of all processes - which can be large. Are any optimizations possible here? (Like vector clocking the snapshot initiation message or chaining of snapshots)
We need states of all processes and channels in the system. How would the algorithm handle failures?
What if there are a large number of channels messages in flight during a snapshot collection? Or even if not so initially, there's a possibility for it to grow arbitrarily. In such cases, can a process smartly initiate another marker and thus record its state to clear up the stored channel messages?
Discussion: Although the idea seems plausible at a high level, it would help to see how the algorithm holds up a system more 'distributed' than the 2 process examples the authors have used. The proof the authors present is also very hard to follow.
Posted by: Srinivasan T | February 10, 2011 12:26 AM
Distributed Snapshots: Determining Global States of Distributed Systems
SUMMARY
Chandy and Lamport present a snapshot algorithm used in distributed systems for recording a consistent global state of an asynchronous system(not synchronized by clocks). It describes a model of the distributed system and the snapshot algorithm that uses Marker-Sending and Receiving rules.
PROBLEM
We typically have a distributed system with several nodes and they are NOT SYNCHRONIZED by clocks. One must ensure that there is a GLOBAL CONSISTENT view of the system STATE. A couple of examples of inconsistencies are presented (In a single token conservation problem, spurious information about a single token present in both process and channel and also no information about token in both entities). This algorithm eliminates such inconsistencies and ensures a “meaningful” global system state.
CONTRIBUTIONS
The paper describes a model of processes and directed channels in the distributed system with a set of states and also enumerates the five attributes that describe the event within a system(process of event,before state,after state,message). It clearly illustrates how the state changes in a 2 process 2 channel scenario for a single-token conservation system.
Chandy and Lamport explain a very clear example of inconsistencies that arise while recording global state. Their usage of markers is, in my opinion, the single major contribution to tackle the inconsistent state problem. They define two rules: 1) Marker-Sending Rule where the sender sends a marker along the channel before it records its state and before sending further messages. 2)Marker-Receiving rule if the receive has not recorded its state, its time to record its state and record the channel’s state as an empty sequence or else before receiving the marker it should record the state of the channel. They prove how the algorithm deterministically ends because of the markers and explain some theoretical properties of such recorded global state.
FLAWS/CONCERNS
As with the logical clock paper, set of assumptions are made regarding the topology and processes. However this cannot be considered a flaw in the paper as they can be eventually handled. My concern is about the exchange of messages. Marker messages have to be sent for recording each local state. And all the processes have to exchange this consistent local state with all other processes to get a global snapshot. This might flood the entire network (order of nC2 messages).
RELAVANCE TO MODERN DAY SYSTEMS
I feel the Chandy-Lamport algorithm could be a natural choice for checkpointing snapshots of the modern day distributed systems.
Posted by: Karthik Narayan | February 10, 2011 04:03 AM
This paper mainly discusses an algorithm to detect global state of a distributed system. This algorithm extracts global state of the system by having all the machines record their own state and state of communication channels in a semi-ordered way( and that this has to be done concurrently without disturbing underlying computation).
Two main reasons which make the problem of determining the global state of a distributed system hard are : 1) Inability of the systems to record states of all the machines at the same instant - its because there is no global clock which can make every machine take snapshot of their states at the same time and 2) every single machine can only store its own state, messages sent and received by it. These two limitation can give an inconsistent global view of the system if the states are not recorded carefully.
The main contribution of the paper is the use of marker messages to notify machines to record its state and the channel states before or on arrival of the marker. This mechanism gives an ordering of the events in the system which will result in a consistent final global view of the system upon algorithm termination. All the concepts in the paper were very well motivated, explained. Also sufficient proof has been given that algorithm terminates in finite time.
I don’t see any major flaws in this paper. Author makes some assumptions about the network and information gathering, but these assumptions can be dealt with in today’s world. One small point is that the final global view is consistent only when all the machines are up and running properly, It could have been better of the if the paper presented some opinion about the algorithm’s authenticity in realistic situations where machines often fail.
This algorithm is useful in getting consistent global view of the system- it can help make debugging and stats collection easy.Also as paper suggests this algorithm can be used to checkpoint data and to detect stable properties such as deadlocking ,terminations.
Posted by: Pratima Kolan | February 10, 2011 07:32 AM
Summary:
In this paper Chandy and Lamport present an algorithm to determine the global state of a distributed system. Their distributed system is modeled by channels and processes and the state captures both states of processes and channels. Channels are assumed to be error-free and FIFO. There are a finite number of processors that have finite number of directional channels. They define the state by the 5 tuple which contains information about the process as well as the channel connected to the process.
Problem Statement:
Just as the previous paper that we read, the problem is that in a distributed system we cannot easily have a global view of things happening in the system. In this paper they assume that we do not have synchronized clocks and a process only can record what it sees in the system. I really liked the analogy that they make to introduce the problem. Determining a global state in a distributed system is like getting a photograph of a large scene full of birds. You have to take multiple pictures and glue them together while the birds do not stay at one place for you to record the moment. Also, you are not supposed to in any means effect what the birds are doing.
Contributions:
The idea that you have to record channel states as well as process states is an important contribution. Otherwise, we might miss messages that affect our system in some way since they were in transit.
The main contribution of this paper is to present an algorithm that enables to detect the global state. This algorithm guarantees a consistent global state. They provide many examples along the way which show how the algorithm works which I think makes a well-written paper (even though it was still hard to read since it was very conceptual)
Flaws of the paper:
I do not really see a flaw in the paper. The point that they make strong assumptions about the system was brought up in the previous class. This does not necessarily weaken the paper since all the assumptions should have their own standalone solution and trying to address all those problems is out of scope of one paper.
Applications:
In many systems, we need to perform a particular action once we reach a certain state. Thus, we have to be able to identify if we are in that state. For example, we should be able to detect if the global state of the distributed system is a deadlock so that we can act appropriately. Other applications include termination detection and check pointing.
Posted by: Fatemah | February 10, 2011 07:42 AM
Summary:
Chandy and Lamport describe a snapshot algorithm using which a process can determine the global state of a distributed system of the form of a strongly connected, finite, directed graph, of which each vertex is a node/machine and each edge a uni-directional FIFO buffer of sufficient capacity.
A so-called 'stable predicate' is also defined such that, if it holds in some state, it will hold in all possible later states. If it holds, 'stability' is said to have been reached. The purpose of the aforementioned distributed snapshot algorithm is to collect such state information that, on account of which, stability can be detected. The algorithm achieves this by sending marker messages to other nodes to record their state and the state of the channel at that time
Problem: The fundamental problem that this paper tries to address is to record the global state of a system (in absence of a global sync clock) without disrupting it (i.e non-blocking) (where the global state of the system consists of the state of its processes and the communication channels connecting them). This in turn leads to the question of when this state should be saved and what all messages should be included in it.
Contributions:
The paper has 3 main contributions:
1. Prior to Chandy-Lamport's snapshot algorithm, the kinds of distributed checkpoint algorithms that existed consisted either of an uncoordinated algorithm (wherein nodes saved checkpoints independently leading to problems like domino effect etc) or coordinated (with a central authority to manage) but blocking checkpoint mechanism (flush the channels to make sure there exist no inflight messages). This paper presented a coordinated non-blocking checkpoint mechanism which was a significant improvement over the previous algorithms.
2. Chandy-Lamport algorithm is applicable independently of the specific stable predicate unlike some of the existing distributed algorithms at that time which targeted only a specific stable predicate.
3. It defines relationships among local process states, global process states and the checkpoints in a distributed computation, something, that the paper notes, was not very well understood prior to its publication.
To me, this paper seems like a direct extension of the concepts that Lamport presented in his seminal 1978 Ordering paper. (that we discussed in the previous class). One can argue that the checkpoints/consistent cuts represent the 'happens before' relation. All pre-recording events kind of happened-before post-recording events and the checkpoints represent the logical/causal snapshot of the system. It is also important to note that the algorithm does not promise us to give an actual state of the system but gives us a consistent state - which is then reachable from and reducible to one of actual states.
Also, Chandy and Lamport build on the state-machine concept of the distributed systems by arguing that many problems in distributed computing can be cast as executing some action/event on reaching a particular state
Flaws
Except for some very strong assumptions that the paper makes (like FIFO with infinite buffers), It seems the algorithm (in its current form) may not work for distributed systems that are not strongly connected. (e.g P2P systems). Also it is not clear what happens if the process delays between snapshotting/saving state and sending markers.
Also, I am not sure if flooding the system is an optimal approach. The complexity is O(e)=o(n-squared) to record messages in O(d) time (where e is the number of edges in the system , n the number of nodes and d the diameter of the network). This will not scale well for large n.
Applicability:
This algorithm (or a marker based checkpointing variation of it) can be used in getting a consistent global view of the system and determining its stable properties.
Aside:
I am not sure if it is relevant, but I couldn't help but get reminded of Model checking while reading the paper. Model Checking also deals with determining if a given property (expressed in temporal logic) holds during the execution of a system (abstracted as a model).
Posted by: Rohit | February 10, 2011 07:51 AM