Time, Clocks, and the Ordering of Events in a Distributed System
L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System, Communications of the ACM, July 1978, pages 558-564.
Review due for this or Snapshots Thursday 2/16.
Comments
This paper addresses, in a very generalized, abstract sense, the problem of
how to meaningfully determine a global temporal ordering of events that occur
in distributed systems. It first presents a method for arriving at a partial
ordering in logical time, and then builds upon this method to arrive at a
total ordering in logical time. Lamport then proceeds to explain how it can
be further adapted to track global physical time as well, and gives formal
proof of the accuracy bounds on the physical time kept by such an algorithm
(since there is necessarily some degree of "slop" in such a system).
To first establish a logical-time partial ordering, Lamport introduces a local
logical clock for each process such that each event that occurs within a
process happens at a particular time, and the individual times at which any
sequence of events occur are monotonically increasing. Additionally, an
inter-process restriction on event times is imposed: the transmission of a
message from one process must happen at a global time strictly less than
(before) the global time at which the destination process receives it (with
the receiver adjusting its clock as necessary relative to a timestamp from the
sender included in the message). This satisfies the "clock condition", that a
"happened before" relation between two events implies a "less than" relation
between the global times of the same two events.
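Not code from the paper, but a minimal Python sketch of the two clock rules paraphrased above (the paper's implementation rules IR1 and IR2): tick between successive local events, and on receipt advance the clock past the sender's timestamp. Class and method names are my own.

    # Minimal Lamport logical clock sketch (IR1/IR2 as described above).
    class LamportClock:
        def __init__(self):
            self.time = 0  # local logical clock C_i

        def tick(self):
            # IR1: advance the clock between any two successive local events.
            self.time += 1
            return self.time

        def send(self):
            # Stamp an outgoing message with the sender's current logical time.
            return self.tick()

        def receive(self, msg_timestamp):
            # IR2: on receipt, jump past the sender's timestamp so the receive
            # event is later in logical time than the corresponding send.
            self.time = max(self.time, msg_timestamp) + 1
            return self.time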
This provides a partial ordering, i.e. it leaves some events with no
unambiguous ordering, which must thus be considered concurrent. It is however
a fairly straightforward process to extend this algorithm to provide a total
ordering and thus decide on a strictly linear sequence of events: simply order
all events by logical time and then by any total ordering of processes within
any set of events with the same logical time. By incorporating some knowledge
of physical parameters such as the minimum time to transmit an inter-process
message and the relative accuracy of the running speed of a given local
physical clock, the paper then shows how to determine a global physical
timeline of events to within a degree of accuracy sufficient to avoid timing
anomalies that might arise as a result of causal relationships between events
due to "out-of-band" messages.
The only apparent weaknesses with the algorithms presented in this paper are
either explicitly discussed as such (such as the potential for timing
anomalies in the logical-time total ordering), or mentioned implicitly as
assumptions about the nature of the system model used to present the
algorithms (such as the assumption of a strongly-connected graph in the
physical-time discussion, which would not hold, for example, in the case of a
network partition). Lamport acknowledges in the distributed-mutual-exclusion
example that it does not make any attempt to handle failures, but the more
general algorithm presented for partial ordering in logical time is
nevertheless still applicable in real-world failure-prone systems. For
example, it forms the fundamental basis of the vector clock scheme in Amazon's
Dynamo system to deal with concurrent writes to a single object producing a
DAG of object versions, in which the "parent-of" version-heredity relation
directly mirrors the "happened-before" relation Lamport uses in this paper.
Posted by: Zev Weiss | February 16, 2012 04:54 AM
Summary:
This paper presents a way to implement a distributed system by totally ordering events with the help of logical clocks and timestamps. It also examines anomalous behavior that the total ordering can produce, which essentially violates causality, and then describes how to synchronize logical clocks to prevent it. Finally, it studies what kind of synchronization can be provided for physical clocks.
Problem:
If we want to develop distributed algorithms and have all participants come to the same conclusion, it is crucial to capture the causality between events or to require that all participants see inputs in the same order. However, there is no built-in physical time in distributed systems, and it is only possible to realize an approximation of it. The problem here is how we can decide the order of events, or how we can know when one event precedes another in a distributed system.
Contributions:
+ The paper shows how to use timestamps to provide a total ordering of events that is consistent with the causal order. Lamport first introduces the “happened before” relation, which defines a deterministic partial ordering of the events in a distributed system. With the help of a collection of logical clocks satisfying the Clock Condition, we can then use the timestamps to extend the partial ordering to a somewhat arbitrary total ordering.
+ It demonstrates how to implement a distributed system with an algorithm for totally ordering events. We can think of a distributed system as a particular sequential state machine implemented by a network of processors. With the ability to totally order the input events, we can implement a distributed state machine and, in turn, implement any distributed system. As an illustration, the distributed mutual exclusion example is great. The goal is to make all nodes agree on the granting of the lock by sending out timestamped messages. Because the communication is lossless and ordered, we know that every node saw the request. Further, because they responded with acknowledgements bearing larger timestamps, we know that they have accepted our claim to the resource (a rough sketch of this request/acknowledge flow appears below).
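A rough Python sketch (my own code and naming, not the paper's) of that request/acknowledge flow: each process keeps a request queue ordered by (timestamp, process id) and considers itself granted the resource once its own request is earliest in the queue and it has heard a later-timestamped message from every other process. Networking and the release of one's own request are elided.

    # Sketch of the mutual-exclusion rules described above; pids are assumed
    # to be mutually comparable (e.g., integers).
    class MutexProcess:
        def __init__(self, pid, peers):
            self.pid = pid
            self.peers = peers                       # ids of all other processes
            self.clock = 0                           # Lamport logical clock
            self.queue = []                          # pending (timestamp, pid) requests
            self.last_seen = {p: 0 for p in peers}   # latest timestamp seen per peer

        def request(self):
            self.clock += 1
            req = (self.clock, self.pid)
            self.queue.append(req)
            return ("REQUEST", req)                  # to be broadcast to every peer

        def on_message(self, kind, ts, sender):
            self.clock = max(self.clock, ts) + 1
            self.last_seen[sender] = ts
            if kind == "REQUEST":
                self.queue.append((ts, sender))
                return ("ACK", self.clock)           # reply so the requester sees our clock
            if kind == "RELEASE":
                self.queue = [r for r in self.queue if r[1] != sender]
            return None

        def is_granted(self):
            # Granted when our request is first in (timestamp, pid) order and we
            # have heard something later-timestamped from every other process.
            mine = [r for r in self.queue if r[1] == self.pid]
            if not mine:
                return False
            my_req = min(mine)
            return (my_req == min(self.queue)
                    and all(self.last_seen[p] > my_req[0] for p in self.peers))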
Flaws:
- If there is any flaw I have to point out in this famous paper, it would probably be the assumption that communication channels are lossless and ordered. Although today we can use TCP to meet this requirement, I wonder how much harder the problem becomes when this assumption is relaxed.
Applicability:
- In current mission-critical environments, people often use hot backups in case of failure. Every host in a hot-backup setup is implemented as a state machine which accepts input events and outputs results. The key to each host making the same decisions is that they must reach consensus on the ordering of input events. I think this paper sets a good example for this problem.
- The synchronization and mutual exclusion algorithms are readily applicable to today’s multi-core environments, e.g., multiple cores sharing the same memory.
Posted by: Yizheng Chen | February 16, 2012 04:51 AM
This article is about how to synchronize clocks in a distributed system to get a total ordering of the events that occur in the system. The author opens the article by discussing how a precise ordering cannot be obtained in a distributed system without synchronized real clocks. The only ordering natively available in this type of system is a partial ordering, where you only know that certain events happened before a certain event (not the order in which all of them occurred). The general way the proposed algorithm for obtaining a rough total ordering works is by broadcasting requests for resources: a single process sends a request (with a timestamp) to the process holding the requested resource and to all other processes in the system. When the process finishes using the resource, it sends a release message to all processes to signal that it is done. This synchronization is done at the process level and requires the cooperation of all processes in the system to function correctly.
The method discussed in this paper is interesting. It's an interesting solution to a rather complex problem in distributed systems, but it does have some drawbacks. This specific algorithm seems like it might have scalability issues when timing events in very large systems: resource-request messages being sent to a very large number of hosts can kill the performance of the system. Having to send messages to every other process in the network to obtain an absolute ordering of events seems impractical. The other issue the author brought up is the fact that failure recovery is not discussed in this paper. I'm not sure how you would handle it without some very expensive synchronization operation between all hosts on the failure of one node.
If this is the best method for obtaining a total ordering of events, then it may not be practical in a distributed-systems environment without the use of some specialized hardware. There may be systems in which this ordering could be obtained practically without using this algorithm: systems where the tick counter of every processor is synchronized by an external source. I believe some very high-end supercomputing systems such as Blue Gene/P may use an external CPU clock, which may allow for total-order synchronization. However, even in these systems, if the number of events is large, ordering them may become impractical in and of itself (sorting time).
Posted by: Benjamin Welton | February 16, 2012 03:39 AM
1. one or two sentence summary of the paper
The paper mainly discusses the methods to determine/define the partial/total ordering of events based on logical/physical clock and how to use the ordering to solve synchronization problems in a distributed system. Here, a distributed system consists of a collection of distinct processes which communicate with one another by exchanging messages, and each process consists of a sequence of events.
2. A description of the problem they were trying to solve (and how the problem solved)
The paper first introduces the concept of one event happening before another in a distributed system and then defines a partial ordering of events based on this concept. The partial ordering only reflects the causal relationships of events within the same or different processes; it cannot describe the ordering of all events across processes in terms of time on some clock. To solve this, the paper introduces the concept of a logical clock (a function assigning a number to each event) and the Clock Condition. The logical clock is discrete and differs from a physical clock in that it is independent of real time. Based on the logical clock, the paper defines a total ordering of events, which extends the partial ordering and depends on both the system of events and the system of clocks. A distributed algorithm applying this total ordering is then proposed to solve a synchronization (mutual exclusion) problem. The basic idea is that processes send request/release messages with timestamps to all other processes, and each process itself can determine whether it has been granted the resource, or should release it, based on the totally ordered timestamps. However, since the ordering is based on a logical clock that is independent of physical time in the real world, certain anomalous behaviors may occur when the logical total ordering is applied to real-world synchronization problems. A natural way to solve this is to extend the ordering from logical clocks to physical clocks. Physical clocks are continuous, and the extension is a little complex, but the main ideas are that each clock should run at approximately the correct rate (the rate variation k should be much less than 1) and that the difference e between the values of any two clock functions at any time should be small enough. The paper proves that when these two variations (k and e) satisfy a certain inequality relative to the minimum message delay, the anomalous behavior will not occur.
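For reference, the two conditions the review calls the variations k and e, together with the inequality the paper derives against the minimum message transmission delay, are roughly the following (written here with κ, ε, and μ; this is paraphrased from my reading of the paper, so the exact form may differ slightly):

    \[
      \text{PC1:}\ \left|\frac{dC_i(t)}{dt} - 1\right| < \kappa,\ \kappa \ll 1
      \qquad
      \text{PC2:}\ |C_i(t) - C_j(t)| < \epsilon
    \]
    \[
      \text{No anomalous behavior:}\quad \frac{\epsilon}{1-\kappa} \le \mu
      \quad (\mu = \text{minimum interprocess message delay})
    \]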
3. A summary of the contributions of the paper
First, the paper conveys an important idea which is useful in understanding any distributed system, that is, in a distributed system, the order in which events occur is only a partial ordering. Second, it extends the concept of partial ordering to total ordering of events based on both logical clock and physical clock. Third, it explains by example (solving a mutual exclusion problem) why knowing the total order of events is useful for us in implementing a distributed system. In addition, I am not sure whether using “space-time diagram” to illustrate concepts is a novel method for this paper, but it is really helpful to understand the concept of partial ordering and logical clock.
4. The one or two largest flaws in the paper.
There are certain limitations in the assumptions of the distributed algorithm proposed in the paper for solving the mutual exclusion problem. The algorithm assumes that a process can send a message directly to every other process. In a large distributed system this assumption may not hold: one process may run on one machine and another process on a machine far away, and the two may not be able to communicate directly. Another assumption is that any process can receive timestamped messages from all other processes. This assumption is hard to satisfy, since machine failures and network-link disconnections occur frequently in the real world.
5. A discussion of how the ideas in the paper are applicable to real systems.
Being able to totally order events can be very useful in implementing a distributed system, especially for solving synchronization problems. Since the total ordering of events depends on the choice of the system's clock, we can choose a clock that meets the system's users' requirements. For real systems, a synchronized physical clock is typically preferred.
Posted by: Junyan Chen | February 16, 2012 02:59 AM
This paper discusses the partial ordering defined by the “happened before” relation and presents a distributed algorithm that extends it to a consistent total ordering of all events by introducing clock systems. The algorithm is used to solve synchronization problems in the paper.
The main problem the paper wants to solve is the unclear understanding of the ordering of events in a distributed system. Because it is sometimes impossible to say which of two events occurred first, “happened before” is only a partial ordering of the events in the system. Many problems in distributed systems cannot be solved based on a partial ordering alone. Understanding the total ordering of all events provides a useful mechanism for implementing a distributed system.
This paper introduces a distributed algorithm for extending the partial ordering to a consistent total ordering of all events. The algorithm is based on clock systems: by introducing logical clocks, we are able to totally order the events. Anomalous behavior is also discussed; to deal with it, the Strong Clock Condition and a solution that uses physical clocks to eliminate the anomalous behavior are presented.
The flaw I see in this paper is the lack of discussion of failures, which distances the paper from practice. Although it is a paper based on a high-level abstraction of the relations between events, it is still necessary to discuss the topic with real problems in mind. The paper mentions the problem of failure only once, when discussing the state machine. Although the author believes the topic is beyond the scope of this paper, the failure situation still deserves careful consideration: the whole system can halt if one process fails. Solutions might include adding timeout states and corresponding commands so that the rest of the system keeps working or recovers. It is interesting to think about whether this would affect the model presented in the paper.
The ideas discussed in this paper form a high-level abstraction of events in distributed systems. Nowadays, when people design distributed systems, they think more about reliability, scalability, and availability. Although this paper presents an interesting and fundamental theory of distributed systems, it is still doubtful how directly this theory applies to today's system design.
Posted by: Xiaoyang Gao | February 16, 2012 02:33 AM
This paper presents a method for synchronizing timing between distributed systems. Several methods are described, which can use a hardware clock or be entirely independent of one, while using different lengths of clock ticks.
It is generally hard to verify which of two events was first when they occurred on separate computers. Hardware clocks vary computer to computer and even if they were successfully matched up at a certain point they would still drift apart later on from variations in the clocks. Distributed systems need a way to see which event came first to give priority to certain events in a deterministic manner. A well defined timing system can accomplish this goal.
This paper presents a graph-like ordering of timing events that uses messages between different systems to synchronize their clocks at the time a message is received. These points of communication are generally where time synchronization matters most, and they also provide an easy way of keeping clocks in sync by tying the timing information to already-incoming messages. The paper discusses the flexibility available in choosing the timing interval, such as the possibility of using subprograms or single instructions as an interval. There are certain places where the ordering between events matters and many others where it is not that important, so a system-wide view can determine which events require incrementing the clock. If an error bound can be determined for a physical clock, this bound can be used to approximate how far out of sync the clock may have gotten after a certain interval; as long as the possible error stays well below the size of the clock tick used by the distributed system, the clock can still be used to order events reliably.
One potential problem with these methods arises when the system becomes especially large. As more computers are connected, the number of closely occurring events will increase, and the overhead of keeping them all sorted will grow, as will the communication needed to keep timing consistent.
Most distributed systems have a sense of time when dealing with outside communication, so these ideas can be used in most any system dealing with time. They will be especially useful in a design that relies on locking or similar methods where certain events can occur in one order and not another. These timing methods can be used to determine which is the correct order in a less arbitrary way. This would be more important in a system requiring strong consistency. The paper also mentions that these techniques could be used for some communication between multiple cores on a single computer.
Posted by: Daniel Crowell | February 16, 2012 02:20 AM
The paper talks about how to order a set of events happening in a distributed
system in which there is no global consensus on time. The events happening can
be events on a single node or a message or communication sent from one node to
another. Some distributed algorithms need a clear view of the ordering of such events in order to make decisions and function efficiently, and the paper derives a partial ordering relation called "happened before".
They further extend this to provide a total order using the logical clocks.
This however has some anomalous behaviour if there are external communications
happening that trigger some event. To tackle this, the paper proposes a way of
using physical clocks and maintaining them in synchronization to achieve a
total ordering of events. The paper provides some good theoretical arguments
on the amount of tolerance in the synchronization based on the connectivity
between the nodes, the delay in communication and the frequency of messages
sent across.
The partial ordering "a happened before b" holds if a and b are events in the same node and a occurs before b sequentially, or if a is the sending of a message and b is the receipt of that message at the recipient's end. These definitions are intuitive and fundamental, and Lamport applies logical clocks to them to derive a total order. A logical clock in a node ticks between two events, and the logical time at which a message is received is always higher than the logical time at which it was sent by the sender. Lamport assumes that messages sent will always be delivered and that messages sent between two nodes are delivered in the order in which they are sent. The paper demonstrates the application of logical clocks using a simple distributed resource allocation algorithm. It then extends the logical clock model to physical clocks, where each node has a clock ticking at almost the same rate and the clocks are kept in sync by advancing them, when necessary, on receipt of timestamped messages from other nodes.
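A small sketch (my own code; the MIN_DELAY value is illustrative, not from the paper) of the physical-clock adjustment just described: on receiving a message stamped T_m, the receiver never lets its clock read earlier than T_m plus the assumed minimum transmission delay.

    # Physical-clock receive rule, as I understand IR2' from the paper.
    MIN_DELAY = 0.001  # assumed lower bound on message transmission time (seconds)

    def on_receive(local_clock_reading, msg_timestamp):
        # Never let the local clock read earlier than the sender's timestamp
        # plus the minimum possible transmission delay.
        return max(local_clock_reading, msg_timestamp + MIN_DELAY)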
The paper provides some very fundamental results on ordering events where no
global time is available. Deriving time based on events happening is pretty
interesting. However, the assumptions made by the system sound very unrealistic for today's large-scale distributed systems, where messages sent from one node to another can fail and no ordering guarantees are made, since different packets can take different routes. I think this is generally for two reasons. One, the huge distributed systems we see today are typically failure-aware and must address the fact that machines or communication links can fail. Two, the paper talks about systems where the message transmission delay is significant compared to the time between events. This is not really true for systems like Dynamo, where the events correspond to, say, writes to the same key or to keys that require some causal ordering.
The resource allocation algorithm is not suitable for distributed systems that have message failures, since a process Pi can never be sure that it has received all requests with timestamps later than Tm from the other processes. Hence this resource allocation example strongly requires that the system never have a network partition, which does not sound reasonable for today's systems.
Posted by: Sathya Gunasekar | February 16, 2012 12:28 AM
The Lamport paper is a very theoretical body of work that discusses ordering events in a distributed system. The technique was labeled the 'happens before' relationship. Messages are used as a vehicle to synchronize processes. To be short, it's basically proposing a method for building distributed state machines. It does seem a very early work, and it would be interesting to read a state-of-the-art paper to see where advances have led. Could we read a paper on Paxos or maybe vector clocks? To paraphrase Lamport's own home page, I don't know if this paper is trivial or brilliant!
Flaws:
The resource sharing algorithm discussed in this paper is just maddening. As an illustrative example of the concepts in the paper it's acceptable, but it does not seem practical at all. In particular, it does not scale to a large number of machines, since to request the resource a single machine has to send messages to all other machines. Furthermore, adding a single client that is a large network distance away from the others limits the effective acquire/release rate for the shared resource, because that client must respond to all the other clients.
On page 563 they state that a discrete clock can be thought of as a continuous one in which there is an error of up to 1/2 'tick' in reading it. Is that really true? Say for instance that real time is less than but infinitesimally close to t' and that t' = t + TICK_SIZE. At that point reading the clock will give you a time of t not t' ( an error of a whole tick, not 1/2 a tick ). I could agree with an 'expected' error of 1/2 tick.
It seems very unrealistic that they require clocks never to travel backwards. For instance, what about daylight savings? Or leap seconds (which can be backwards)? Even using the logical clocks discussed in this paper, the counter can roll over from its maximum value back to zero (breaking the 'clock condition' specified on page 560).
The exact idea of the time at which a message is sent and received is never covered. Technically, if the time of transmission is put into a message and then it's put on the wire, transmission isn't complete until the last byte has been sent. So isn't there a message-size and bandwidth dependency? As a side question, could you measure the minimum bandwidth of a link by sending pairs of packets of varying sizes and then timing the delay between receipt of the two?
It seems like relaxing the requirement in the theorem that for any message m, u_m <= u, would be a good idea. For instance, the round-trip time from Madison to the South Pole is on the order of 750 ms, while the round-trip time from the author's laptop to another machine in the house is 0.7 ms. A distributed system that operates across wide geographical distances would have all clocks synchronized to some bound that depends on the area covered by the system! A practical extension might include altering PC1 so that, instead of placing a bound on the clock rate of ALL clocks, reference clocks are allowed. On either end of a long link, GPS clocks could be installed!
A time-related point that is just begging for coverage in regard to a distributed system: with an increasing number of clocks, you'd think that the error between any given clock and real physical time could be lowered. More statistics, better error estimates!
Another missing point in their time synchronization algorithm is adding or removing machines. Doesn't adding a machine with a fast clock to the middle of a graph of processes corrupt every other clock in the system? The correction algorithm favors the fast clock (see IR2' part b, which uses 'max'). How does a network partition affect this algorithm?
Posted by: Matt Newcomb | February 16, 2012 12:15 AM
Seth Pollen, David Capel, Victor Bittorf, and Igor Canadi
This paper explores how to create an ordering of events in a distributed system. In such a system, each node has its own clock, which is not synchronized to any other clock in the system. Even without globally synchronized clocks, this paper presents a means to create a total ordering.
The key insight Lamport makes is the relationship between message passing and synchronization. The primary contribution is constructing a theoretic framework for understanding that communication is, in a sense, in and of itself a form of synchronization. In this sense, two nodes that never communicate have no concept of synchronization with each other, since synchronization cannot exist without communication.
From this foundation, he demonstrates how a system can be constructed to produce a total ordering in a distributed system. This result is important because it establishes a rigorous theoretic understanding of how synchronization functions in a distributed system. The algorithm this paper presents can be directly applied to basic synchronization problems.
This formalization of order is somewhat limited, though, since the ‘happens before’ relation does not allow you to make definite statements about the causality relationship between events. Vector clocks, a more robust clocking mechanism, are based on this seminal work of Lamport clocks. Vector clocks can be used to solve more difficult problems, such as handling race conditions.
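Vector clocks are not part of this paper, but since the review leans on them, here is a rough sketch of the idea under my own naming: each process keeps one counter per process, and event a happened before event b exactly when a's vector is componentwise less than or equal to b's and the two differ.

    # Rough vector clock sketch (not from this paper).
    class VectorClock:
        def __init__(self, pid, all_pids):
            self.pid = pid
            self.v = {p: 0 for p in all_pids}

        def tick(self):
            # Local event: bump our own component.
            self.v[self.pid] += 1
            return dict(self.v)

        def receive(self, other_v):
            # Merge: componentwise maximum, then bump our own component.
            for p, t in other_v.items():
                self.v[p] = max(self.v.get(p, 0), t)
            self.v[self.pid] += 1
            return dict(self.v)

    def happened_before(a, b):
        # True iff a causally precedes b: a <= b componentwise and a != b.
        return all(a.get(p, 0) <= b.get(p, 0) for p in set(a) | set(b)) and a != b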
While Lamport Clocks form an important theoretic background, the algorithm in its original form has severely limited practicality. Since the algorithm waits to hear back from all nodes, it is not very fault tolerant. There is also the issue of scaling, since all nodes need to know about all other nodes. This presents a challenge to implementing the approach in a real system. While this is not a specific flaw in the formalization of the clocks, it is still relevant to the practicality of the approach.
The paper observes that, if clock error is low enough and a sufficient number of messages are passed, then your system is equivalent to one in which the physical ordering of the events is known. This is a particularly interesting result, but it is somewhat nebulous to apply. From our discussion, it was unclear how this result could be applied to a real system to realize a benefit. Are there rules or design principles related to this bounded error? Is there a practical concern here, or is it primarily for theoretic rigor?
Although we have indirectly discussed applications of Lamport Clocks, it is somewhat of a difficult topic, because Lamport Clocks in their formalized form are not suited for direct application. As already mentioned, the specific algorithms serve well as a proof-by-construction but do not directly apply to a real system where scalability and fault tolerance are important. Lamport Clocks do serve as a foundation for many real problems that come up in distributed systems, however.
Consider an example where two nodes need to access a shared resource. If these two nodes need to cooperate for access to this resource (for example, one node tells the other to read some data it has updated), then the two nodes need a means of synchronizing around this resource. While the verbatim algorithm from this paper may not suffice due to practical issues, the concept of ordering is pivotal to developing a solution to this problem. In this sense, Lamport Clocks form a framework for understanding and developing practical solutions (as opposed to serving as a stock or cookie-cutter solution).
This makes Lamport Clocks important in application, since they give a reasoned and proven starting point for approaching an entire class of practical problems related to the ordering of events and synchronization. Without such a theoretic grounding to work from, most attempts at synchronization would be justified by approaches such as, “we tried it on several examples and it seems to work pretty well.”
Finally, on somewhat of a side note, the usefulness of the reference to special relativity seemed somewhat dubious. It is unclear how necessary such a reference really is, and we were unable to determine whether there was an application for it. It would be helpful to have discussion in lecture as to the motivation for this, and whether it is necessary or relevant.
Posted by: Victor Bittorf | February 15, 2012 11:32 PM
This paper talks about how events occurring in distributed systems can be represented as a binary relation enforcing a partial ordering among the events and how a total ordering of events can be inferred from the partial orderings.
There are two main contributions of this paper:
- The mathematical treatment of events occurring in a distributed system - defining a binary "happened before" relation between them and showing that the relation enforces a partial ordering among events
- The proposal of the idea that a system of logical clocks represented by monotonic functions could be used to enforce an arbitrary total ordering among events
Zooming in on the contributions and other nice points in the paper:
- One nice point is the formalized notion of a distributed system as viewed in the paper: a system in which message transmission delay is not negligible compared to the time between events in a process. Though there are many definitions of a distributed system, I guess this one makes concrete sense in the context of the paper.
- The binary relation "happened before" being defined by messages and local events and not in terms of physical time is something unconventional and novel that this paper has presented.
- The biggest contribution is the idea that things as simple as monotonically increasing, unsynchronized counters in different processes, along with messages between processes, can be used to construct a system of logical clocks that ensures the correct ordering of events related by the "happened before" relation.
- The next biggest contribution is the observation that the ordering produced by the logical clocks is a way of imposing a partially correct total ordering among all events in a distributed system.
- The distributed mutual exclusion algorithm is an apt application of the total ordering produced by the system of logical clocks.
- Although it seems rather obvious, the rigorous mathematical model estimating the relationship between minimum transmission delay, clock skew, and inter-clock drift, to show that physical clocks with suitable values for said parameters can be used as the logical counters, was quite interesting.
The paper had a few small flaws:
- Their initial assumption that the receiving of a message does not coincide with the sending or receiving of any other message may be a bit unrealistic, depending on the mechanisms used by the underlying network to send and receive messages (for example, a group of messages may enter the software network stack as a chunk and may be assigned the same timestamp by the software).
- Although the total ordering algorithm is resilient to failures (failures are treated as long pauses between events), the algorithm for distributed mutual exclusion, in which a process waits for suitably timestamped messages from all other processes and each process must know every other process in the system, does not work out in the presence of failures or at scale.
- For the system running the mutual exclusion algorithm, the formalism of an FSM seems ill-fitting, primarily because a) in an FSM, events are external to and independent of the state, but in this system events seem to be internal to the system; b) an FSM halts when a transition is not present in the transition table, but this system seems to halt while waiting for messages (i.e., waiting for an input symbol); and c) the state of an FSM does not seem to model the queues cleanly; I'd guess a pushdown automaton with a stack (push a symbol on a request message, pop it on release) would be more relevant.
Though the paper presents some very innovative ideas at that time, backed up by concrete theory, in today's large scale systems I'm not sure:
a) How many systems require a total correct ordering of events to guarantee consistency or for other purposes (wouldn't a form of partial ordering suffice?)
b) Even if they did require a total correct ordering, how many large scale systems would use an idea which is not very well-defined in the presence of failures, nodes entering and leaving the system and humongous scale.
Posted by: Venkatesh Srinivasan | February 15, 2012 11:25 PM
This paper introduces the “happened before” partial-ordering relation between events in a distributed system. It then extends the partial ordering into a total ordering by two methods, logical clocks and physical clocks, and uses it to address synchronization problems in distributed systems.
Because there is no unique global clock in a distributed system, we cannot easily know the time on other machines, so when we have two operations we cannot decide which one happened first. Absolute time in a distributed system is not so important; on the other hand, the ordering of events determines the output of the system, and therefore a method of deciding the ordering of events is of significance.
The partial-ordering happened-before relation is the main contribution. It can accurately describe the relation between two events. If event a “happened before” event b, it means event a causally influences event b, and no matter how many times the whole setting is re-executed, event a will always happen before event b in physical time. If event a and event b are concurrent under the “happened before” relation, they have no causal relation, which means that even though in a certain execution event a may happen before event b in physical time, it may not in another execution. Even though the “happened before” relation has such beautiful properties, it is not very practical by itself, because there will be many concurrent events in a distributed system, and merely knowing that their ordering could differ in another execution is not very useful; after all, we often need to resolve conflicts based on the ordering of events. Therefore, based on the “happened before” relation, the logical clock is introduced and the physical clock is analyzed.
The main flaw of the paper is its lack of practicality. The distributed mutual exclusion algorithm is perfectly correct in theory, but it involves too much communication to be useful in practice. Logical clocks can impose a total ordering on events, but the rule used to break ties is arbitrary, which may often produce an ordering that differs from the physical-time order. Resolving conflicts in the wrong order doesn't seem much better than not being able to resolve them at all.
The “happened before” relation is the foundation for reasoning about the ordering of events not only in distributed systems but also in multiprocessor systems and multithreaded programs. Vector clocks were invented to implement the “happened before” relation precisely and are widely used in all kinds of systems; they are even used in other areas, including race detection.
Posted by: Xiaozhu Meng | February 15, 2012 10:08 PM
Summary:
In distributed systems, it is important to know the ordering of events in processes that are spatially separated. In this paper, Lamport proposes a logical-clock solution to obtain a partial ordering among events, which acts as the basis for a total ordering of events and for the synchronization of physical clocks.
Problem:
Distributed applications such as resource allocation require a precise ordering of the requests/events sent. But since the processes are spatially separated and communicate only via messages, it is not possible to know the absolute ordering of events unless there are exactly synchronized and precise physical clocks. Maintaining perfectly accurate clocks is, again, not possible, leading us back to our original problem.
Contributions:
Lamport introduces the “happens before ->” relation, by which (1) all the events that occur in the same process are ordered by “->” among themselves and (2) the sending of a message “->” the receipt of that message. Based on this, the paper introduces the concept of the logical clock, wherein each process i has a clock Ci which is (1) incremented between any two successive events and (2) set to max(Ci, Tm+1), where Tm is the timestamp of a received message. This satisfies the Clock Condition that if a->b then C(a) < C(b). Lamport makes an important observation here that the converse is true only in a totally ordered system, and introduces a really simple way of obtaining a total ordering: order the processes in some arbitrary total order and use that to order events that are concurrent. However, this might cause anomalous behavior as observed by an outside entity, due to the arbitrary ordering of concurrent events. Lamport cleverly solves this problem by going back to physical clocks and using the concept of estimated message delay to synchronize them within bounds.
A few notable points about the distributed algorithm for resource allocation based on total ordering of events are:
(1) The total ordering of events enables the algorithm to be completely distributed with no centralized point of control/failure. This is probably the most important of all the contributions of the paper.
(2) It is based on the assumption that messages between two processes are delivered in the same order they are sent. This is an important assumption, as it ensures that a process is aware of all requests that happened before its current request.
(3) Drawbacks include the large number of messages that need to be sent for a single resource allocation to occur (3(n-1) messages where n = # processes for each request generated). This is a serious limitation on the scalability.
(4) Another drawback is the intolerance of failures. As the algorithm depends on acknowledgements and release messages, the entire system would stall if a network failure or a system failure occurred.
In my opinion, the paper provides a couple of significant contributions related to the ordering of events in a distributed system. Though not widely used in their original form today, logical clocks neatly solve the problem of ordering events without the need for a physical clock. The idea of synchronizing physical clocks within bounds, also not directly used today, was probably the basis for modern clock synchronization protocols.
Posted by: Madhu Ramanathan | February 15, 2012 09:59 PM
The author analyzes the ordering of events in the context of the happens-before relation, without using physical clocks. He discusses the possible problems with a partial ordering and provides an algorithm to extend it to a consistent total ordering. The use of total ordering is motivated by a simple resource scheduling problem. Finally, he covers related topics such as anomalous behaviors and physical clocks.
This is a theoretical paper, and the author clears away many practical concerns with assumptions, concentrating on simple but challenging problems. The main problem is that different machines have different clocks, so it is hard to order the sequence of events. We may have a partial ordering with a simple clock condition, but it is important to have a total ordering to solve some problems (e.g., mutual exclusion).
The main contribution is that the author introduces an approach for obtaining an arbitrary total ordering using a clock which satisfies the clock condition. He motivates the use of total ordering with a simple resource scheduling problem (i.e., assigning a single resource to processes in order). Having a partial order isn't enough to solve this problem, but with the given total-ordering approach, he shows how it can be solved. Although the algorithm given for the resource scheduling problem exists mainly to illustrate the use of total ordering, I believe it is also an important contribution in its own right, since it clearly introduces an algorithm that uses total ordering for resource scheduling.
Although I believe it is a very important work, some parts are not clear to me. For example, I do not understand why failures are said to be meaningful only in the context of physical time; I believe even for the given simple mutual exclusion problem, failures can cause many problems and make it hard to solve. The author also makes some assumptions to simplify the problem, which I believe are very questionable. The first is that all the messages sent from Pi arrive at Pj in the same order. In a distributed system, this seems to be a very strong assumption: since we do not know in what order two messages sent from different processes arrive at the same process, similarly we should not assume that two messages sent from the same process arrive in the order sent. Another assumption is that a process can send messages directly to every other process. This assumption is also questionable because (i) in a distributed system we can never be sure that a process knows every other process, and (ii) we can never guarantee that messages arrive at their destination. It also causes a scalability issue, which is very significant for a distributed system.
Although the paper does not touch on some very important practical concerns, given the citations and awards this paper has received, I believe it is one of the most influential works in distributed systems in the context of clocks and synchronization. This theoretical approach led to other clock algorithms (e.g., vector clocks), which in turn led to modern approaches. The ideas in this work can be useful for any modern system that requires an ordering of events and cannot guarantee global timing.
Posted by: Halit Erdogan | February 15, 2012 09:05 PM