Time, clocks, and the ordering of events in a distributed system
L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558-565, 1978.
Review due Tuesday 2/8
Comments
Summary:
This paper discusses the partial ordering of events in a distributed system. It discusses the problems in ordering such events and then provides a distributed algorithm for synchronizing a system of logical clocks which can be used for totally ordering these events.
Problem:
The paper aims at explaining an algorithm that extends the partial ordering of events in a distributed multiprocessor system to an arbitrary total ordering. It then tries to show how this total ordering can be used to solve a simple synchronization problem.
Contribution:
The paper starts off by defining its interpretation of a distributed system. A distributed system is a system in which the message transmission delay is not negligible compared to the time between events in a single process. Based on such a system it explains a partial order of events. A partial order is one in which for any two given events a and b,
i) ‘a’ comes before ‘b’ if ‘a’ and ‘b’ are events in the same process and ‘a’ occurs before ‘b’, or
ii) ‘a’ is the sending of a message by one process and ‘b’ is the receipt of that same message by another process.
With only a partial order it becomes hard to solve synchronization problems between processes. For example, suppose there exist three processes P0, P1 and P2, where P0 is the scheduling process which grants a resource to either of the other processes in the order in which requests are received. P1 sends a request message to P0 and P2. On receiving P1’s message, P2 sends a request message to P0. If P2’s message reaches P0 before P1’s message, then P2 would be granted the request, violating the intended ordering of requests.
In order to solve this problem the paper implements a total ordering of events. It makes use of a logical clock to measure time. A logical clock is a function Ci for each process Pi, such that Ci(a) is a number for an event ‘a’ of Pi. The paper then talks about a clock condition which states that
For events ‘a’, ‘b’: if a -> b then C(a) < C(b).
This Clock Condition guarantees that the logical clocks are consistent with the partial order; a total order is then obtained by breaking ties with an arbitrary ordering of the processes. The condition is implemented using the following two rules:
1. IR1 - Each process Pi increments its clock Ci between any two successive events
2. IR2 – If event a in process Pi is the sending of a message m, then the timestamp Tm = Ci(a) is attached to it. Upon receiving m in event b, process Pj sets its clock Cj to a value greater than or equal to its present value and greater than Tm.
These two rules help in converting the partial order of events to a total order. With this total ordering synchronization is achieved as all processes order the commands according to their timestamps. A process can execute a command at timestamp T when it has learned of all the commands issued by all other processes with timestamps less than T.
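As a concrete illustration, IR1 and IR2 can be sketched in a few lines of Python; the Process class and method names here are my own invention, not from the paper:

```python
# Sketch of Lamport's logical clock rules (names are illustrative).
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0          # logical clock Ci

    def local_event(self):
        self.clock += 1         # IR1: tick between successive events
        return self.clock

    def send(self):
        self.clock += 1         # sending a message is itself an event
        return self.clock       # Tm, the timestamp attached to the message

    def receive(self, tm):
        # IR2: advance past both the local clock and the message timestamp
        self.clock = max(self.clock, tm) + 1
        return self.clock

p1, p2 = Process(1), Process(2)
tm = p1.send()        # p1's clock becomes 1
p2.local_event()      # p2's clock becomes 1
t = p2.receive(tm)    # p2's clock jumps to max(1, 1) + 1 = 2
```

Note that the receiver only ever moves its clock forward, which is what keeps the timestamps consistent with the happened-before relation.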
Even with total ordering there could be some anomalies. This could happen because there may be cases in which the system has no way of knowing that A actually preceded B, since the precedence information is based on messages external to the system. To solve this problem the paper suggests in using a strong clock condition. This condition is not satisfied by logical clocks. In order to achieve this condition the paper suggests using properly synchronized physical clocks.
Flaws:
1.> The paper does not account for process failures. The algorithm works only when one assumes all processes terminate properly. Such an assumption is too strong.
2.> The assumption that all messages sent from process Pi to Pj are received in the same order as they are sent is also too strong.
Application:
1.> Total ordering definitely helps in handling synchronization problems, as described above.
2.> In databases there exist dependency graphs of transactions, which can be used when one is trying to replay a particular workload, in order to replay all the transactions in the right order.
Posted by: Vinod Ramachandran | February 6, 2011 01:47 AM
Summary:
This paper examined the concept of the “happened-before” partial ordering and discussed how to define a (somewhat arbitrary) total ordering of events across the system based on the partial ordering information. The author also discussed how to synchronize physical clocks and derived an upper bound of error for the algorithm he gave.
Problem trying to solve:
In a distributed system, due to the lack of a global view, it’s impossible to have a definitive total ordering of events. The partial ordering defined by the “happened-before” relation is the only meaningful order in the absence of external relevant events. So the problem here is how to assign a total order that doesn’t violate the causality relation imposed by the partial ordering, and how to use this total ordering to implement synchronization.
Also, in the presence of external events (and failure), physical clocks become necessary to avoid anomalous behavior. The problem here is how to synchronize the physical clocks within some given error to ensure correct system behavior.
Contributions:
1. The author defines a framework (“happen-before” relation and the space-time diagram) to discuss the ordering problem in a distributed system.
2. The author formalized the conditions for a distributed system to exhibit correct behavior: the clock condition in the absence of external events and the strong clock condition in the presence of external events.
3. The author proposed algorithms to assign a total ordering to events across a distributed system so that those conditions hold. He also showed how this ordering can be used to implement synchronization.
Flaws:
It seems to me that the author made some strong assumptions which are not very realistic.
1. For the algorithm of assigning total ordering to work, the active participation of all the processes is required, which is almost certainly not true for a distributed system.
2. For the physical clock synchronization algorithm to work, the author assumes that the unpredictable delay of delivering messages is within a certain constant time, which is not a realistic assumption either.
One additional point is that the total ordering algorithm requires one process to send messages and get acks from all other processes to ensure that it has learned all the preceding requests. This will not scale at all.
Relevance:
The framework and the causality language the author developed are certainly very valuable when we discuss the behavior of distributed systems today. The algorithms and theorems rest on overly strong assumptions; hopefully follow-on work could relax those assumptions and give us results which are directly applicable to real systems.
Posted by: Suli Yang | February 6, 2011 08:45 PM
Summary: Lamport describes relativity in the context of distributed systems as a partial ordering of all events in a system. Since there are multiple possible total orderings in this setting, he then introduces a consistent ordering of events using timestamps as well as an ordering consistent with outside events by using physical clocks.
Problem: Determining the order of events in a distributed system is difficult because of separate processes that may not share a common clock. Users must be able to see an ordering of all events in the system that is consistent with a physical clock. Otherwise, two users observing the system from different vantage points may see different orderings. Previous work on clocks estimated differences between distributed sites, but did not use the monotonically increasing clocks that are necessary for the formal conditions Lamport proposes.
Contributions: Lamport observes that a distributed system without synchronization represents only a partial ordering of events. This is a crucial point, as a naive user that expects a system to be inherently aware of physical time may not understand that two events from separate processes are logically concurrent even though they may occur at different physical times. A total ordering of events can be established by time-stamping all messages and updating each local clock to the highest timestamp as it is received. Processes can then be synchronized using broadcast, timestamped messages. To synchronize physical clocks, processes also consider a pre-determined minimum delay when updating timestamps. The innovative aspect of both these primitives is that clocks are always set forward - a process never tries to match another process's clock by going backwards. This method is fundamentally simple because everyone moves in the same direction, but is still proven to synchronize clocks within a certain margin.
Flaws: Lamport observes that this scheme is not resilient to process failure, but claims that failure only has meaning in relation to physical time. I question this claim, because events that are supposed to logically occur after a failed event will never happen. Even though there may be no observable difference between a failure and a long delay, a failure still occurs. Another side effect of synchronization involving every node is scalability, as the number of messages that must be sent grows linearly with the number of nodes. In this scheme, processes must communicate directly with every other process, but there may be a variation that takes advantage of network topology and mechanisms such as multicast to reduce the total number of messages.
Applicability: Lamport's clocks address a fundamental problem that is relevant to every distributed system. Modern systems may require more functionality than this basic primitive, such as synchronized clocks across a dynamically changing set of processes. In Lamport's synchronization example, processes had to be aware of every other process, so using a dynamic set of processes would require more sophisticated synchronization. Since synchronization involves communication between all processes, systems should probably limit the amount of global synchronization they perform. Even in a minimalist scheme such as this, global synchronization requires a linearly increasing number of messages which could cause scalability problems.
Posted by: Todd Frederick | February 7, 2011 10:28 AM
Time, Clocks, and the Ordering of Events in a Distributed System
Summary
The authors present a partial ordering of events in a distributed system, and then extend it to an arbitrary total ordering. They then show how synchronised physical clocks could be used to order events in a natural way.
Problem
Running multiple processes concurrently can cause races or deadlocking if they access shared resources. It may also be desirable to avoid using a centralized locking mechanism. The authors also introduce their anomalous behaviour problem: they feel users will be confused if requests to the system from different processes do not take effect in the same order in which they are sent.
Contributions
The authors introduce a partial ordering of events in a distributed system which is implemented using a counter for each process and adding timestamps to messages. The ordering preserves a sense of causality of messages; a message is always received after it is sent. The authors extend the partial ordering to a total ordering using an arbitrary ordering of the processes. Using this total ordering, they present an algorithm for distributed locking of resources. Finally they add physical clocks to each process, which are synchronized to within an error margin. These physical clocks are used to eliminate the anomalous behavior, that is they ensure that messages are interpreted in the correct order with regard to the physical time that they were sent.
Flaws
The locking algorithm is not really reasonable to implement; however, this is a theoretical paper and it seems sufficient to show that some algorithm exists, even if it is not very practical. As the authors state, they assume that nodes cannot fail and that all messages are received. Failure is a major concern for distributed systems, so the fact that they did not address this issue at all makes me question the usefulness of their conclusions.
Discussion
The idea of the partial ordering, and perhaps the total ordering, seems like it is useful in any case where it is important for distributed tasks to be performed in a certain order. Their use of counters seems like a generalization of sequence numbers in TCP, which is effective at ensuring the correct order of network messages. One concern might be the frequency at which the counter overflows, since dealing with this becomes more difficult when there are more than two processes. However, using a 64 bit counter would probably provide enough time if counters were only incremented for messages.
The author's work on physical clocks to solve anomalous behavior does not seem very useful. Given today’s users’ familiarity with the web and distributed web applications, it seems an obvious and simple solution would be for user B to simply wait until his view of the system shows that A’s request has been received. At this point the partial ordering is enough to ensure that B’s request is processed after A’s request. Further, A’s request could be lost, at which point it would be fruitless to try and ensure correct ordering of the requests.
Posted by: Aaron Brown | February 7, 2011 05:03 PM
Summary
This paper presents a partial ordering for the “happened before” relation using logical clocks; this is then extended to a total ordering. Anomalous behavior that can be introduced by the total ordering is then reconciled with the use of physical clocks.
Problem
In any system, defining an ordering of events without a common view is difficult. Even if real clocks are present in the system, it is unlikely that the clocks will be sync. Thus, another mechanism is necessary to define a partial or total ordering on the system.
Contributions
Lamport introduces logical clocks, C_i, for each process P_i, that assign a number C_i(a) to event a in process i (then, all such C_i define a global set of clocks, C, where C(a) = C_i(a) for event a in process i). Having defined a “happened before” relation (->), he imposes the condition that if a -> b, then C(a) < C(b).
However, in a distributed system, total ordering is important. Events in the entire system are ordered by the time at which they occur, using an arbitrary total ordering of the processes (defined by the => relation) to break ties. Because of this arbitrary ordering, total orderings are not unique. An algorithm to grant resource allocation without a central scheduler can be defined using the total ordering of events.
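A minimal sketch of the tie-breaking idea, assuming events are tagged with (logical timestamp, process id) pairs; the labels and numbers below are invented for illustration:

```python
# Illustrative: ties in logical time are broken by an arbitrary fixed
# ordering of process ids (the => relation), yielding one total order.
events = [
    (2, 1, "a"),  # (logical timestamp, process id, label)
    (1, 3, "b"),
    (2, 0, "c"),
    (1, 1, "d"),
]
# Lexicographic sort: compare timestamps first, then process ids.
total_order = [label for _, _, label in sorted(events)]
print(total_order)  # ['d', 'b', 'c', 'a']
```

A different assignment of process ids would yield a different, equally valid total order, which is exactly why the ordering is not unique.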
This total ordering can introduce anomalous behavior. Given that relevant events can happen outside the system, a second “happened before” relation (->) is defined for the set of all system events plus these relevant external events. Without this new relation, we cannot guarantee that the system will observe the -> relation. a->b suggests that a is ordered before b in all possible views. Physical clocks must be used to eliminate this anomalous behavior.
Flaws
The synchronization algorithm, a core proof-of-concept for the total ordering introduced, relies on a static system. Although Lamport does briefly state that failure is beyond the scope of the paper (and refers the reader to another paper), he doesn’t address the issue of new processes entering the system. Further, this algorithm relies on the assumption that messages (from one process to another) are received in the order they are sent. Although he suggests that this assumption is not essential to the algorithm, it is a strong assumption that is worth additional commentary. In general, this paper presents a very idealized system with little discussion of practical problems that might arise.
Relevance to Real Systems
Although this is an idealized paper that ignores or glosses over complex issues associated with actual, deployed systems, it presents a clear picture of how partial and total ordering can be accomplished in a distributed system. This ordering does not rely on an external synchronization mechanism or other complicated techniques. This work provides a good foundation that can be expanded to accommodate dynamic systems and other, more complex requirements.
Posted by: Emily Jacobson | February 7, 2011 06:39 PM
Summary:
This paper describes a means of creating partial and total orderings of processes in distributed systems, using both purely logical clocks and physical ones. It uses these clocks to create an algorithm for resource allocation that avoids conflicts. In addition, it provides a theoretical basis for synchronizing physical clocks between different nodes in a distributed system.
Problem:
This paper addresses the need for a consistent timing mechanism between nodes in a distributed system in order to determine the ordering of events. Without this, it is impossible to resolve conflicts that arise because of synchronization problems, such as in-order resource sharing.
Contributions:
The paper defines and gives a very thorough analysis of the happens before relationship and the ordering it imposes. It uses this to create a system of logical clocks, for which it provides a simple set of rules for keeping them synchronized, thus introducing a total, rather than partial, ordering to the system. With this total ordering, one can have a definite sense of temporal relations between processes, even if they do not directly affect each other.
This total ordering allows synchronization between separate processes to be achieved fairly. As an example of this, the paper provides an example of a purely distributed algorithm for resource scheduling, which it claims can be generalized to apply to any set of problems requiring synchronization. It notes, however, that the algorithm fails to fully meet its requirements of strict temporal ordering if outside interference occurs (e.g., through transmission delay). Thus, its final contribution is an analysis of physical clocks, which can solve this problem, and how much agreement one can realistically expect between them.
Flaws:
As others have noted, the resource scheduling algorithm has a number of flaws that would likely arise in any actual implementation of it. Given the sheer number of messages that are exchanged in the execution of the protocol, it seems plausible that, if a given resource is in high demand with a large number of nodes in the network, the underlying network could be brought to a grinding halt. It’s also plausible that the algorithm would exhibit fairly high latency, given that all nodes need to reply before a given node can access a resource. In addition, while the theory about physical clocks is informative, it would be useful to have some practical information about what to do with it (rather than a reference to “the literature”), and the fundamental assumption that clocks cannot be set backwards seems shaky at best.
Relevance to real systems:
The fundamental ideas about clocks, keeping them updated, and using them to order events are sound and provide a useful framework when implementing distributed systems. However, without serious modification, the resource sharing algorithm seems impractical (it may be better to refer to [3]), and I’m not entirely convinced of the utility of the theorem about physical clocks.
Posted by: Chris Dragga | February 7, 2011 08:12 PM
Summary: Lamport presents a method for constructing a total ordering of events amongst processes, based on a partial ordering. The addition of tightly synchronized physical clocks provides a stricter ordering and avoids potentially anomalous behavior present in the total ordering.
Problem: The problem the paper seeks to address is the synchronization of events in a distributed system. Unlike a single threaded application, multiple actions can occur simultaneously in a distributed system. Coordinating the events to provide some desired property, e.g. mutual exclusion, requires a method for synchronizing the many processes executing in the system. We typically think of events occurring based on a physical notion of time, but distributed systems require strictly conforming clocks to synchronize based on physical time. Instead, we need a method to synchronize events based on relative times. For example, providing a purchase confirmation to a user should not occur until after the sale has been recorded. Solving the problem is even more challenging because message passing, which can incur noticeable delay, is the only method for establishing an ordering of events between two or more processes in the distributed system.
Contributions: Lamport's first contribution is a distributed algorithm for establishing a total ordering of events amongst multiple processes. The foundation for the algorithm is the idea of a partial ordering: event a comes before event b if 1) a and b are events in the same process and a occurs before b or 2) a is the sending of a message by one process and b is the receiving of the message by another. By adding a logical clock to each process, every event can be assigned a logical time of occurrence. Events occurring in the same process are assigned increasing timestamps as they occur. A message sent between processes causes the receiving process to advance its clock at least beyond the sending time indicated in the message, which is based on the sending process' clock. If the time assigned to an event by a process' clock is before the time assigned to an event by another process' clock, then a total ordering of the events can be established. This total ordering can be used to serve a queue of requests, for example, requests for mutually exclusive access to a resource.
The second contribution is a stricter total ordering enabled by the addition of tightly synchronized physical clocks. Physical clocks avoid the issue where a request issued by one process may arrive first and be served first, even though the request was generated after another process generated a request. This can be important in the purchasing example identified above, where the first person to click the purchase button on his/her machine should receive the last concert ticket instead of the person who submits their order second but has a higher speed connection. Lamport provides specific bounds for how incorrectly a clock can keep time and how unsynchronized it can be without causing any loss of ordering.
Flaws: No major flaws exist in Lamport's algorithm. However, one item to consider is the number of message exchanges required to keep systems synchronized. Events which require tight synchronization may spend significant time exchanging messages instead of doing actual work. Therefore, developers need to balance the need for ordering events with the overhead of communication to still receive the benefits of executing tasks in a parallel, distributed fashion.
Applicability: Synchronizing events based on message exchange is important in real-world systems which rely on a sequence of events. This is certainly easy to see in the purchasing system example used in this review. However, synchronization is also important in some not so obvious tasks. For example, a search on Google does not require the search to happen in a synchronized fashion, but merging results from multiple clusters to provide the results to the user requires some form of synchronization. File systems are another common example of the need for synchronization, and file systems which rely heavily on physical timestamps can often experience problems when tight time synchronization is not present.
Posted by: Aaron Gember | February 7, 2011 08:26 PM
Summary
The paper describes a distributed algorithm to determine total order of events, happening in a distributed system, by extending the idea of partial ordering of events. The algorithm uses message passing to synchronize logical clocks on machines. It is further extended to synchronize physical clocks on machines. Through an example and its State Machine generalization, it is shown how desired synchronization can be implemented in a distributed multiprocess system.
Problem
For reasons of synchronization, determining total order of events is necessary in distributed systems of processes which do not share a common clock. Since synchronization of physical clocks can not be achieved beyond a certain extent and is also not always possible, deriving a consistent global total order of events is hard to achieve. The paper introduces logical clocks for the aforementioned purpose.
Contribution
Lamport's use of logical clocks, which are unrelated to the physical clock on a machine, to order events in the system provides a convenient and simplified approach to solving various synchronization problems. A process can join the system of logical clocks at any point in time, since there is no single inception point at which all the processes need to be present to reset their logical clocks. A process can start receiving messages from other processes whenever it becomes part of the system and can then assign timestamps, higher than the latest received, to its events. Also, the timer function can be implemented in a lock-free atomic fashion, thus providing fine granularity of timestamps.
The State Machine generalization of resource allocation problem provides a universal solution to any synchronization problem in distributed environment. Lamport also discusses how physical clock synchronization can be achieved in distributed system and derives a bound on how far clocks can be allowed to go out of synchronization with regard to minimum message transmission delay in the system to avoid any anomalous behavior.
Flaws
The logical clock mechanism to totally order the events is prone to errors. Messages are the only way to know the ordering of events across multiple machines. In their absence, or under delay, events are either assumed to be occurring concurrently or get ordered inappropriately. For example, when a process Pi sends a message to Pj with timestamp x, until Pj receives this message, events occurring at Pj may be assigned timestamps less than x even though they happen later in real time than Pi's event. Thus logical clocks do not obey the Strong Clock Condition.
The system of properly synchronized physical clocks requires the minimum message transmission delay time (mu) to be known precisely, so that a message-receiving process can set its clock correctly. If a very small 'mu' value is used then the clocks need to be tightly synchronized to each other, while a large value of 'mu' can lead to anomalous behavior. Thus precise knowledge of 'mu' places a constraint on how machines in a distributed system are allowed to be connected to each other.
The resource allocation algorithm and its State Machine generalization for implementing synchronization function only with the active participation of all involved processes and are not fault tolerant.
The distributed algorithms discussed in the paper require each process to send message to every other process to notify other of its current logical/physical time. This is surely not scalable.
Application
The resource allocation problem discussed in paper is prevalent in distributed systems. It can also be used to build distributed lock protocol. The synchronization techniques discussed in the paper can be conveniently used in a system where the time interval between any two events is coarse.
Posted by: Sandeep Dhoot | February 7, 2011 11:27 PM
Summary:
This paper presents an algorithm for constructing a total ordering of events in a multiprocess system from a partial ordering of events derived from the sequential execution of a single process and the messages exchanged between processes. This algorithm is applied to the clock synchronization problem to establish an upper bound on the clock skew among a set of processes, as a function of certain parameters of the system.
Problem Description:
The problem addressed by this paper is to develop a means for establishing the order of events in a distributed system without utilizing a central authority that maintains system time. In a multiprocess system, time doesn’t provide a guaranteed ordering of events because each process’ perspective of time can differ. The problem boils down to using communication between the processes to maintain clock synchronization. This problem is important because application-level synchronization relies on a view of time that is consistent throughout the system.
Contributions Summary:
This paper contributes two main concrete results. First, the algorithm for constructing a total order of events is as follows. When a process executes an event, the process’ clock is incremented. When a process sends a message, it adds a timestamp of its own clock to the message. When another process receives the message, it must adjust its local clock such that it is greater than the timestamp in the message. This establishes a “happens before” relationship between the send-receive event pair, and allows a process to locate its events in the system-global perspective of time.
However, systems don’t usually operate in isolation and interact with the external world. To account for this interaction, the author extends the two rules for logical system-level clocks to physical clocks. The extension modifies the first rule to use a continuous clock instead of discrete clock that is incremented on each event. The second rule is modified to update a clock on message reception to the maximum of the current local clock (modulo some jitter) and the timestamp in the message plus some minimum delay. The result is that to maintain a bounded amount of clock skew, messages need to be exchanged at some regular interval and the maximum delay for messages needs to be bounded.
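The reception rule for physical clocks might be sketched as follows; the function name, and the use of a single scalar for the assumed minimum transmission delay mu, are illustrative assumptions, not from the paper:

```python
# Sketch of the physical-clock reception rule described above: on
# receiving a message stamped Tm, set the local clock to at least
# Tm plus the assumed minimum delay mu. Clocks never move backwards.
def on_receive(local_time, tm, mu):
    return max(local_time, tm + mu)

print(on_receive(10, 9, 1))   # 10 -- local clock is already ahead
print(on_receive(10, 12, 1))  # 13 -- jump forward past Tm + mu
```

Taking the maximum is what preserves the invariant that a clock is only ever set forward, never backward.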
Shortcomings:
I think this paper presents some theoretically sound principles but it doesn’t address how to define values for a few specific system parameters. The first is that the minimum delay for a system needs to be known. This quantity could be difficult to define in practice. The second parameter is the total order among processes used to break ties. While this ordering can be arbitrary, it needs to be consistent across processes. The author doesn’t address this problem in this paper, but I think he may address this issue in another paper (ala distributed agreement).
Application to real systems:
I believe that clock synchronization still remains an important problem today. Many large-scale services are being transitioned to virtual machine-based cloud platforms. These virtual machines tend to have more finicky clocks than their bare metal counter-parts and as such, applications need to be more aware of clock synchronization problems.
Posted by: Dan McNulty | February 8, 2011 12:29 AM
Summary:
This paper discusses the ordering of events in a distributed system, where concurrency of execution and messaging delays make temporal reasoning difficult. A simple algorithm is presented that produces a total ordering of events in the distributed system. Anomalies, where there is an observable difference between the logical ordering of events in the system and their physical ordering in space-time, are discussed, and a solution employing synchronized local physical clocks to prevent anomalies is given.
Problem:
In a distributed system, individual processes can be described as a totally ordered sequence of local events. Additionally, messages between processes must be received after they are sent. These two facts produce a partial ordering of all the events in the system, but still allow for events to be "simultaneous" in that one is not conclusively ordered before the other. Reasoning temporally about the state of an entire distributed system requires a total ordering of events. Some practical problems in such a system can only be solved if a total ordering is defined (The author gives an example of mutual exclusion with a queue of waiting processes, where the mutex is granted to waiters in the order in which they requested it. Note that, if it is acceptable to service requests in the order they are received by a central scheduler process, then a simpler solution without total ordering is possible.).
After presenting a solution for total ordering (discussed below), the author considers the problem of anomalies, where observation and communication by users outside of the channels of the distributed system can lead to the algorithmically produced total ordering appearing different from the observed physical ordering of events. Although the paper doesn't mention it specifically, such anomalies can lead to particularly vexing unexpected behavior in, for example, transaction processing systems.
Contributions:
To provide total ordering of events, the author introduces the "Clock Condition", which states that for any events a and b, if a occurs before b, then C⟨a⟩ < C⟨b⟩, where C assigns a clock value to each event. An algorithm is given in which each process increments its clock between successive events and, upon receiving a message, advances its clock past the timestamp the message carries; this satisfies the condition.
A practical application of such a total ordering is shown in the form of a distributed mutex with a waiting queue.
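That mutex's grant rule can be sketched roughly as follows. This is a minimal sketch with made-up names (`Process`, `can_acquire`), not the paper's pseudocode: a process may take the resource once its own request precedes every request it knows about and every other process has sent something timestamped later than its request.

```python
# Sketch of Lamport's distributed mutex grant rule.
# Class and method names are illustrative, not from the paper.

class Process:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers                       # ids of all other processes
        self.queue = []                          # (timestamp, pid) request queue
        self.last_seen = {p: 0 for p in peers}   # latest timestamp heard from each peer

    def request(self, ts):
        """Record our own resource request (broadcast omitted in this sketch)."""
        self.queue.append((ts, self.pid))

    def observe(self, ts, sender, is_request=False):
        """Note a message from a peer; requests also enter the queue."""
        self.last_seen[sender] = ts
        if is_request:
            self.queue.append((ts, sender))

    def can_acquire(self):
        """Grant rule: our request heads the queue (ties broken by pid),
        and every peer has sent a message timestamped after it."""
        mine = [q for q in self.queue if q[1] == self.pid]
        if not mine:
            return False
        my_req = min(mine)
        return (min(self.queue) == my_req and
                all(self.last_seen[p] > my_req[0] for p in self.peers))
```

For instance, a process that has requested at timestamp 5 cannot acquire until it has heard later-stamped messages from both peers.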
The author notes that there are cases where there is more than one possible total ordering for a given partial ordering of events, and that the presented algorithm picks one somewhat arbitrarily. As mentioned before, this can lead to anomalies in the face of out-of-band observation and communication. A strong clock condition is introduced, which states that in the system including relevant external events, if a precedes b, then C⟨a⟩ < C⟨b⟩. Satisfying this stronger condition requires synchronized physical clocks rather than purely logical ones.
A theorem is proven which states that in a strongly connected graph of processes with diameter d, where at least once in every time interval τ a timestamped message with unpredictable delay ξ is sent over every arc, we have ε ≈ d * (2 * κ * τ + ξ) for all times t ≥ t0 + τ * d.
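To get a feel for that bound, one can plug in illustrative numbers; the parameter values below are made up for illustration, not taken from the paper.

```python
# Evaluating the paper's skew bound eps ~ d * (2*kappa*tau + xi)
# with hypothetical parameter values.
d = 3            # diameter of the strongly connected process graph
kappa = 1e-6     # bound on clock drift rate
tau = 10.0       # seconds: some message crosses every arc at least this often
xi = 0.01        # seconds: unpredictable portion of message delay
eps = d * (2 * kappa * tau + xi)
print(eps)       # approximate bound on clock skew once synchronized
```

With these numbers the unpredictable delay ξ dominates, and the skew bound works out to about 30 milliseconds.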
Flaws:
The author notes that both solutions presented depend on all nodes being up and all communication links being reliable. In a real distributed system, these assumptions are not reasonable, as processes and links fail often. It is noted that a failure, logically, looks the same as an indefinite delay in message transmission. In the physical clock system (particularly in the formulation from the theorem that requires at least one message sent within every τ-interval), the failure to receive a "heartbeat" message within an interval (1 + κ) * τ could be taken as an indication of a failure, and the system could pause, notify users, or attempt some sort of recovery as necessary to prevent desynchronization or anomalies.
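That heartbeat check could be sketched as follows; the helper name and interface are hypothetical, since the paper itself prescribes no failure handling.

```python
# Sketch of a heartbeat-based failure suspicion check: a link is suspect
# if no message has arrived within (1 + kappa) * tau, the longest silence
# a correct, drifting clock should ever observe under the theorem's setup.
def link_suspect(last_msg_time, now, tau, kappa):
    """Return True if the silence on the link exceeds the allowed interval."""
    return (now - last_msg_time) > (1 + kappa) * tau
```

A process observing a suspect link could then pause or start recovery rather than silently drifting out of sync.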
Some of the reasoning in the paper assumes that all links are reliable and that messages arrive in the order they are sent. Again, these are not reasonable assumptions in a real system. However, there do exist protocols such as TCP which ensure ordered, reliable message delivery over spotty links, so the use of such protocols in a practical system should allow the logic of the paper to still apply.
The reasoning regarding physical clocks makes the assumption of a Newtonian space-time (or, equivalently, a single shared relativistic frame of reference), specifically that physical time progresses at the same rate for all processes. In the real universe, the effects of special relativity (the relative velocity of processes) and general relativity (the curvature of space which the processes inhabit) on the rate of time progression must be taken into account. This may seem to be nit-picking, but there are, in fact, distributed systems with physical clocks whose correct operation must take this into account (GPS is a prime example: each GPS satellite broadcasts time from an extremely precise onboard clock, and relativistic corrections are necessary for the system to work).
Finally, certain assumptions about observability of events and the messaging delay μ in the physical clocks case are couched in a particular rule of special relativity: that information cannot be propagated faster than the speed of light. Since the paper's publication, however, research into quantum entanglement suggests that "spooky action at a distance" wherein entangled particles correct their spin to match each other instantaneously, regardless of distance, is possible. One could imagine a system where network links are pairs of entangled particles with true 0-latency communication. That would seem to force μ to 0, at least in an ideal system. However, we also know from quantum mechanics that it is impossible to observe time in intervals finer than the Planck time (approx. 5.49E-44 s), so we have the Planck time as the ideal minimum of μ. Practically, however, it is impossible to construct a clock whose ε and κ parameters satisfy ε / (1 - κ) ≤ μ when μ is the Planck time, since a time skew ε less than the Planck time is not physically measurable. Of course, a practical system would not be able to "read" the change in an entangled particle's spin instantly, even if it changes instantly. Depending on how fast the measurements can be conducted and how precise a clock can be constructed, it may be necessary to artificially delay messages to mask the quantum uncertainty about the timing of messages.
Applicability:
Although the paper skews towards theory, one can see many applications where a total ordering of events is helpful in a distributed system. The anomaly-free version with physical clocks seems especially useful in a distributed transaction system where each transaction must be ordered and maintain certain invariants about the overall state of the system. Modifications to the paper's reasoning to account for process and link failure, as well as reordering or loss of messages, seem possible, and would increase the paper's utility in thinking about real-world distributed systems.
Posted by: Craig Chasseur | February 8, 2011 12:32 AM
Summary:
This paper presents theoretical foundations for extending the partial ordering of events in a distributed system to a total ordering. Several synchronization algorithms based on logical clocks and physical clocks are also proposed. These mechanisms can serve as a guide for implementing a real distributed system.
Problem:
In a distributed system, a global total order over events is usually hard to obtain, yet many correctness properties depend on one. How to extend the partial order to a consistent global total order, and how to synchronize these orderings, are the main problems discussed in this paper.
Contributions:
1. Formally defines the "happened-before" relation, which serves as the theoretical basis for discussing event ordering in a distributed system.
2. Extends the partial ordering to a total ordering using logical clocks, and shows how to use this total ordering to solve a simple mutual exclusion problem.
3. To eliminate the anomalous behavior that arises when external events exist between processes, physical clocks satisfying the strong clock condition are proposed.
Flaws:
1. Strong assumption of lossless communication. Real distributed systems suffer frequent failures of messages, processes, and networks, and it is hard to figure out how this mechanism would work under such conditions.
2. It also assumes no security issues, such as malicious processes or fake messages. There may be lots of such issues to deal with in real systems, and how they would affect the high-level design is not discussed in this paper.
3. The synchronization algorithm is hard to scale, because each process needs to simulate the execution of the state machine using the commands of all processes. At the scale of a company like Google, with hundreds of thousands of machines or processes in a single distributed system, this method definitely will not work.
Applicability:
This paper received the PODC Influential Paper Award in 2000 for its contribution of logical clocks. It proposed a generic theoretical foundation for building a synchronized distributed system from partial ordering information, which is still very useful today in domains such as online shopping, flight booking, and banking. However, many practical issues also need to be considered to implement a real working distributed system.
Posted by: Lanyue Lu | February 8, 2011 12:35 AM
Summary
This paper formalizes the notion of one event happening before another in a distributed system. It also proposes an algorithm which can be used to realize synchronization among processes in a distributed system.
Problem
The authors want to find an algorithm which can help a distributed system grant a resource to a process while satisfying three conditions:
1) a process which has been granted the resource must release it before it can be granted to another process;
2) different requests for the resource must be granted in the order in which they are made;
3) if every process which is granted the resource eventually releases it, then every request is eventually granted.
Contributions
1) the author formalizes the notion of one event happening before another, based on the order of events within the same process and on message passing between different processes
2) the author designs an algorithm which grants resources among different processes and satisfies the three conditions
Flaws in the paper
1) The authors assume that every process can send messages to every other, and treat message transmission time as an unimportant factor in the algorithm design. If message sending is very time-consuming, the condition "Pi has received a message from every other process timestamped later than Tm" is quite unreasonable.
2) The authors assume that there is only one resource in the distributed system. In real situations there may be more than one resource, and the "distance" between each resource and each process also needs to be considered when allocating resources.
Relevance:
The algorithm proposed in this paper can still be used to implement synchronization primitives, like locks.
Imposing an order on all events in a distributed system is also a good way to check whether the design logic has been realized correctly.
Posted by: Linhai Song | February 8, 2011 01:17 AM
Summary
This paper describes how the “happens before” relationship between events in separate processes forms a partial ordering of all events. Algorithms which use logical and physical clocks are shown to be able to produce a valid total ordering of events. An example system for fairly allocating a shared resource is described in terms of the algorithms presented.
Problem
The problem that the author was trying to solve was extremely general: how to synchronize events in a distributed system. The basic approach is to leverage a total ordering produced by some form of clock. This is non-trivial because the notion of time may be inconsistent among different processes.
Contributions
First, the author presents an algorithm which can provide a total ordering based upon logical clocks. Essentially, every process has a local clock, and this clock is updated on any event. Messages contain a timestamp of the sender's time, and a receiver must take the maximum of the remote send time and current local time. This enforces the property that dependent events occur in order, according to the clock values. The total ordering can be achieved by arbitrating between events that appear to execute at the same logical time. The mechanism of including message timestamps and forward-propagating time shifts between processes is unique, and is the fundamental algorithmic contribution of the paper.
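The two clock rules described above can be sketched in a few lines; the class and method names here are illustrative, not from the paper.

```python
# Sketch of Lamport's logical clock rules: advance on every local event,
# and on receipt jump past the timestamp the message carries.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """IR1: increment the clock between any two successive events."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message with the sender's current time."""
        return self.tick()

    def receive(self, msg_ts):
        """IR2: set the clock past both the local time and the message's."""
        self.time = max(self.time, msg_ts) + 1
        return self.time
```

With this, a receive event is always numbered higher than the send it depends on, which is exactly the Clock Condition.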
Also, the paper describes how the algorithm can be extended to handle ordering constraints which aren’t expressed through explicit messages known to the system. The idea is to use coordinated physical clocks, with some properties regarding the rate of time drift and total inconsistency. It is shown that a specialization of the previous algorithm, which considers time as continuous and accounts for message delay, can satisfy the strong clock condition. In essence, this means that the total ordering can be determined by looking at the timestamp of an event. In addition, a strong theoretical basis is developed for quantifying the relationship between message transmission time, clock rate similarity, and overall time drift. Given some knowledge about process interaction patterns and the properties of clocks in the system, there is an upper bound on the total amount of drift in the system.
Limitations
The author appears to be well aware of the limitations of the research. This includes the fact that there’s no prescribed way of handling node failures or dynamically reconfiguring the system. Also, the derived synchronization protocol uses a significant amount of global traffic. However, this paper is less about how to implement a system, and more about the fundamental limitations and guarantees that a particular mechanism can provide.
It’s also worth emphasizing that the topology of the system and the messages sent are going to significantly affect the maximum difference in time values. If messages are rarely sent, or are sent over a weakly connected network (e.g., a line), then time drift will be propagated much more slowly.
Applicability
Naturally, the ability to synchronize actions in distributed systems will always be important. Even if a direct application of the synchronization algorithms isn’t feasible, the ideas of forward propagating time drift is a useful mechanism which can provide accurate distributed clocks. Also, this paper reminds us that even though synchronization is a powerful concept, useful for implementing distributed algorithms, its implementation can be very expensive.
Posted by: Tony Nowatzki | February 8, 2011 01:17 AM
Summary
This paper discusses the problem of ordering events in a distributed system, both partially and totally. It provides a technique for coming up with a total ordering of events in a system, gives an example of how that technique can be used to solve a synchronization problem, and then discusses using that technique to synchronize clocks.
Problem
In distributed systems, it can be difficult to say precisely the order in which events occur. Since individual nodes do not have a global view of the system, they will sometimes disagree on whether or not a certain event happened before another. For this reason, it is useful to come up with some global view that describes precisely when events occurred (relative to other events).
Contributions
This paper has several contributions. Firstly, it presents a way of representing a partial ordering of events in a system through the use of logical clocks. Rather than using some physical representation of time, logical clocks simply assign numbers to events, with higher numbers being assigned to events that happen after lower-numbered events. It then gives a way of going from this partial ordering to a total ordering, which involves ordering the events based on the partial ordering and then arbitrarily ordering "concurrent" events. It demonstrates the usefulness of this total ordering by using it to solve a common synchronization problem. Finally, an algorithm is given for synchronizing physical clocks, along with bounds on how far out of sync the clocks can drift.
Flaws
There were several things that bothered me about this paper. First of all, I thought they glossed over the very important question of dealing with partial failure. The way the system is currently described, if a process wants to access a resource, it needs to have received a message from every other process time stamped later than the time stamp of its request for that resource. This means that if a single computer fails or message is lost due to network failure, the entire system will halt. Though this paper seemed more interested in the theoretical aspects of what is possible and less interested in the actual implementation and real world use of their ideas, this is still far too large of a problem to completely ignore.
Another problem I had with the paper was their insistence that their system contain knowledge about events that happen external to it. To me, this seems like an unrealistic, unreasonable, and unnecessary goal. A system should not be expected to make ordering decisions based on information it cannot possibly know. If two users have some arbitrary ordering in which they want their work to happen, it is up to them to ensure that that happens. Trying to provide a way for the system to do that adds unnecessary complexity.
Application
I think the paper provides an interesting way of thinking about partial and total orderings, things which are important to understand when dealing with distributed systems. However, the techniques, as described in the paper, have little application to real-world systems today. In very large networks, actions like broadcasting a message to all other nodes and waiting for a response from all of them are simply infeasible: it will be slow (given that you will always have to wait for the slowest node) and it will consume large amounts of bandwidth.
Posted by: Ben Farley | February 8, 2011 01:27 AM
Summary:
The paper presents a distributed algorithm which extends the partial ordering defined by the "happened before" relation to a consistent total ordering of events in the system by introducing logical clocks. Then the author illustrates how to use the total ordering of events to solve a mutual exclusion synchronization problem. Finally, a physical-clock synchronization algorithm is suggested to avoid unexpected anomalous behavior.
Problems:
Assuming that a distributed system consists of multiple processes and each process consists of a sequence of events, there are three main problems to solve:
1. Because processes have different physical clocks, it is impossible to model the "happened before" relation in terms of physical clock time. We need another way to model it correctly.
2. How can a system share resources correctly? For example, multiple processes may use the same resource by sending request and release messages; we need a mechanism to guarantee the resource is granted in the order in which requests are made.
3. Merely synchronizing logical clocks may cause unexpected behavior because it doesn't consider the relationship among physical clocks of processes. As a result, the actual order of events may be different from the observed one from users' perspective.
Contributions:
1. Introduces a partial ordering "->" that defines the "happened before" relationship between events in the system.
(For two events a and b, a->b if a comes before b in the same process or a and b are the sending event and receiving event of a message respectively or there exists another event c, a->c and c->b)
2. A logical clock is assigned to each event by an algorithm so that the clock condition is satisfied: if a->b, then C(a) < C(b). The algorithm consists of two implementation rules:
IR1. Each process increments its clock between any two successive events.
IR2. When a process P1 sends a message to P2, P1 will include the logical clock C1 of the sending event in the message. When P2 receives the message, P2 will set the logical clock C2 of the receiving event to be larger than C1.
3. Introduces a total ordering "=>" based on "->": for two events a and b in processes Pi and Pj, a=>b if and only if either Ci(a) < Cj(b), or Ci(a) = Cj(b) and Pi < Pj under an arbitrary total ordering of the processes.
4. Discusses the "anomalous behavior" problem and gives a physical-clock-synchronizing algorithm to handle it. Some theorems are shown to analyze how closely the physical clocks can be synchronized.
Flaws:
The algorithm used to solve the mutual exclusion problem is not practical for a system with many nodes, because whenever a process requests or releases resources, it has to send a message to every other process. These messages would flood the network and even affect other useful messages. In addition, if such a message is lost or dropped when the request queue is full, conflicts may happen. For example, process P1 requests the resource and sends a message with timestamp 1 to P2, but the message is lost. After a while, P1 is granted the resource. Then P2 requests the resource. Because P2 doesn't see P1's message in the queue, P2 may use the resource while P1 is using it, which causes a conflict.
Application:
The partial order introduced by the paper builds the foundation for the vector clock algorithm, which can not only create a partial order of events but also detect causality violations. The idea of vector clocks is also used in Dynamo to capture causality between different versions of the same object.
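The vector clock refinement mentioned above can be sketched briefly; function names here are illustrative, and this is a minimal version of the standard construction rather than Dynamo's exact scheme.

```python
# Sketch of vector clocks: each process keeps one counter per process,
# so two events can be compared and concurrency becomes detectable
# (something a single Lamport clock value cannot express).

def vc_merge(local, incoming, my_index):
    """Receive rule: take the element-wise max, then bump our own entry."""
    merged = [max(a, b) for a, b in zip(local, incoming)]
    merged[my_index] += 1
    return merged

def vc_concurrent(u, v):
    """True if neither clock dominates the other, i.e. the events are
    causally unrelated."""
    return (any(a < b for a, b in zip(u, v)) and
            any(a > b for a, b in zip(u, v)))
```

Two versions of an object with concurrent vector clocks are exactly the conflicting versions Dynamo surfaces to the application.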
Posted by: Weiyan Wang | February 8, 2011 01:50 AM
Summary:
This seminal paper in distributed systems introduces the notion of partial ordering of events in a multi-process system. It then extends the idea to show an algorithm that uses a system of logical clocks to achieve total order. Total order is useful for solving many problems in distributed systems, like synchronization. Use of logical clocks can introduce anomalous behavior, which is tied intrinsically to the idea of logical clocks and can be solved by the use of physical clocks with a certain margin of error in their accuracy.
Contributions:
The paper shows how to use logical clocks to define an ordering of events in a system without the use of physical clocks. This is made possible by a clock condition which states that if events are represented by numbers, then the number associated with a later event has to be greater than that of the former. The clocking and timestamping system is developed with this in mind, and the paper shows how to order events using only logical clocks. But just using logical clocks as the indicators can only guarantee partial ordering. The paper then shows us how to ensure total ordering by using an algorithm that utilizes a process's knowledge of things happening at all the other processes in the system. This total ordering of events can easily solve synchronization problems. The logical clock system has limitations when external events are taken into the picture. This problem can be solved using physical clocks, since now we can time events absolutely rather than relatively.
Discussion:
The idea of ordering has applicability at many levels, ranging from low-level things like cache coherence and atomic instructions to high-level things like locks and message passing. It is nice to see how the ideas presented in this paper have been applied over time at these different levels of systems. Firstly, I want to point out that this paper clearly shows that partial ordering is easier to achieve than total ordering. What I have seen in different topics like computer architecture and operating systems is that people tend to prefer partial ordering over total ordering for speed and simplicity. Correct me if I am wrong about this. Think about relaxed consistency versus strict consistency in computer architecture, and Mesa versus Hoare semantics in the concurrency techniques used in operating systems. Even in distributed systems, I guess we will see that people tend to accept short periods of incorrectness for the sake of simplicity in design.
Flaws:
The paper did a good job of covering the loopholes in its assumptions, such as in-order delivery of messages and the ability to send messages to all processes. I don't see any flaws, except that the algorithm described is complicated to implement and moreover can be really slow compared to a relaxed implementation.
Applicability:
Like the concluding paragraph of the paper (before the dreadful appendix), this idea is applicable to any system that behaves like a multi process system; from chip multiprocessors to multi threaded programs.
It is so rare to read a systems paper that uses phrases like 'one of the mysteries of the universe' and 'like special relativity'. I remember the first lecture, when the professor compared distributed systems with the universe in general. Sometimes similarities like these across different topics can make things SO interesting.
Posted by: Paras Doshi | February 8, 2011 04:56 AM