« The Part-Time Parliament | Main | Paxos Made Live – An Engineering Perspective »

Paxos Made Simple

Leslie Lamport. Paxos Made Simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) 51-58.

Review due Tuesday 3/22.

Comments

Summary:
This paper gives a brief theoretical overview of Paxos, an algorithm for achieving consensus in a distributed system, allowing for the consistent operation of a deterministic state machine replicated among several servers.

Problem:
Ideally, when operating a client-server application that relies on modification of data, one would like to have the central server replicated, in order to increase availability. To ensure that this operates in a deterministic way, however, all replicas must choose the same ordering for commands, which requires an algorithm for achieving consensus.

Contributions:
This paper provides a condensed overview of the basis of the Paxos algorithm, and explains how it can be used to implement a replicated deterministic state machine. The basic Paxos algorithm works in two phases. First, a distinguished proposer issues a numbered proposal to a set of acceptors (all of which are nodes in the distributed system). If an acceptor has not seen a higher numbered proposal, it responds with the last value that it has seen, if any. In the second phase, the proposer then sends out an accept request with the value of the highest numbered proposal accepted thus far or any value it chooses if no proposal has been accepted; acceptors then accept this proposal if they have not already accepted a higher numbered one. Once a majority of acceptors have accepted a proposal, the value for that proposal is considered chosen. The algorithm assumes a fail stop model with messages that may be lost or delayed, but not corrupted.

The paper then shows how to apply this to a state machine replicated across servers by running the Paxos algorithm once for each command issued from a client. In order to maintain tractability, the paper provides a number of optimizations, including the ability to effectively issue an unlimited number of proposals at once. Overall, though no thorough proof is given, it seems like a reasonable system to employ when one wants to ensure strong consistency in a system.

Flaws:
Given that this is a theoretical paper, there is no real sense of how this would perform in a real-world environment; the paper does present a number of optimizations, but it’s not clear how well these work to improve potential performance. The paper relies on the impossibility of byzantine failures and message corruption, and mechanisms to ensure that this is effectively the case may add further overhead. Performance is also heavily dependent on rarity of failure, which can be hard to ensure in large scale cases; more frequent failures, or even failures at inconvenient times, may lead to noticeable irregularities in overall performance as the first phase needs to be rerun.

Relevance:
Overall, the ideas in this paper seem most practical for small scale distributed systems where consistency is a necessity. While this would clearly work poorly for something on the scale of, say, Dynamo, it would likely work well for a system of fewer than ten nodes. It could also be used to achieve consensus in a critical subsystem of a larger scale distributed system.

Summary
Lamport describes the Paxos algorithm, used to determine consensus in a distributed system.

Problem
In a distributed system, there must be some way for nodes to come to a consensus about a proposed value. However, in the presence of failures or message loss, both possible and common in a distributed system, it can be difficult to correctly choose only a single value. Having a single site choose the value is simple, but that site becomes a single point-of-failure and may also be a bottleneck.

Contribution
There are three classes of agents (nodes) in the Paxos algorithm: proposers, acceptors, and learners. The Paxos algorithm allows for multiple acceptor agents, removing the single point-of-failure and bottleneck issues described above.

To guarantee progress in the Paxos algorithm, a distinguished proposer is selected as the only one trying to initiate proposals. To issue a proposal, a proposal node chooses a new proposal number n, and sends a “prepare request” to a set of acceptors, requesting that each acceptor (1) not accept any future proposals with number less than n, and (2) the proposal with the highest number less than n that it has accepted, if one exists. After this information has been retrieved from a majority of the acceptors, the proposer issues an “accept request” with number n and value v, where v is the value of the highest-numbered proposal (from the prepare request responses). If no responses included a proposal, v may be any value selected by the proposer.

The learners must determine which proposal has been accepted by the majority of the acceptors. Using a set of “distinguished learners,” acceptors respond to learn requests with their acceptances, which are then forwarded on to all learners.

In the general algorithm, nodes can play all three roles. However, a single leader, who acts as distinguished proposer and distinguished learner, is chosen.

Paxos can be used to implement a state machine, where a separate instance of the Paxos algorithm is used to pick each of the state machine commands. Here, a single server is elected leader and determines the order in which commands are entered into the sequence.

Flaws
None noticed. Like many of the other theoretical papers, there are some complicated implementation-specific details that are lacking, but they are beyond the scope of this paper.

Relevance
As we have seen in a variety of practical systems papers, using state machines and ensuring that commands are processed in the same order at all nodes are common necessities. Paxos provides an algorithm to ensure consensus is reached by a set of nodes in a system. By avoiding a single acceptor node, Paxos allows these algorithms to continue without needed a centralized coordinator node.

Summary
The paper describes Paxos algorithm, for achieving consensus in a distributed system, and its state machine implementation. Paxos algorithm ensures that a single one among the proposed values is chosen across the system and thus servers in the system perform consistent operations.

Problem
Distributed systems use multiple servers to improve reliability, availability; or multiple actors may be inherent in a system. Ensuring consistent operation across multiple actors is usually achieved by synchronous operation. Paxos algorithm allows us to totally order operations across servers to achieve desired universal behavior of the servers.

Contribution
The majority rule based consensus algorithm guarantees that no conflicts occur in the system even in the presence of multiple independent proposers.
The multiple independent agents namely proposers, acceptors and learners can operate concurrently, and hence achieve much better performance than synchronous operation of the system. Once proposals have been chosen by acceptors, the learners can execute the commands pertaining to accepted proposals at their own pace.
The persistent state maintained by proposers, acceptors and learners allows correct behavior of the system in presence of arbitrary faults. Routing all proposals through distinguished leader prevents the system from getting stuck into a livelock.
Paper also describes a state machine implementation of Paxos algorithm known as Multi-Paxos. Multi-Paxos runs sequence of separate instances of Paxos algorithm from a distinguished leader. This enables us to get rid of Phase1 of Paxos algorithm for the lifetime of a distinguished leader and thus reduces network traffic significantly. Multi-Paxos also allows numerous (alpha) outstanding proposal requests, thus increasing performance/throughput of the algorithm implementation.

Faults
Paxos is not a fool-proof algorithm. Multiple leaders (proposers) can lead to livelock in the system and impede the progress of the system.
Distinguished leader may become a bottleneck in a large system where a single machine cannot satisfactorily issue proposals on behalf of all clients.
Paxos algorithm involves lot of network traffic per client request (messages sent from distinguished leader to acceptors and from acceptors to learners). Each request involves disk i/o operations at proposer, acceptors and learners. This overhead can be intolerable for systems which aim for higher throughput.

Applications
Paxos algorithm can be useful in systems where determining total order of operations is important. It can be used to consistently operate on multiple replicas in a distributed file system or databases. It can also be used to implement distributed lock protocol.

Summary
The paper gives a simple description of the Paxos algorithm for safely reaching a consensus in a distributed system.

Problem
An algorithm is desired which causes nodes in the system to agree on a value. The distributed system is unreliable in that messages can be lost or duplicated, and nodes may fail and restart. In order for the algorithm to be safe, a value can only be learned once it is chosen, and only one value can be chosen.

Contributions
Lamport presents the the Paxos algorithm for solving the consensus problem. He also gives justification for the algorithm. Finally he gives shows how the Paxos algorithm can be used to implement a distributed state machine that is consistent.

Flaws
I found it strange that electing a leader node seems like a very similar problem to forming a consensus, and electing a leader, the distinguished proposer, was needed to guarantee progress in the Paxos algorithm.

There are of course some inefficiencies in the algorithm. However, Lamport addresses some of them and I believe that other papers propose efficiency improvements.

Discussion
I think that in a lot of cases today, distributed systems are not very concerned with the last two consensus requirements: that only a single value is chosen and that a process never learns a value before it is chosen. Most of the time speed is more important. For instances where safety is important, such as financial transactions, it may be worthwhile to use the Paxos algorithm or something similar.

Summary:
The paper is about Paxos, a consensus algorithm for state machine replication for distributed system.

Description of Problem:
Nodes in a distributed system can fail. The network in unreliable where messages can get delayed, lost, duplicated, or corrupted. Many clients can send messages to different nodes of a distributed network. If the ordering of the messages is important, there need to be a form of consensus in choosing a value in order to replicate a consistent state machine across the nodes.

Summary of contributions:
The authors explain the Paxos algorithm and its implementation on a state machine. The Paxos algorithm consists of 3 types of agents: proposers, acceptors, and learners. There are multiple of these agent to provide fault tolerance. A leader is chosen from the proposers to propose values to the acceptors. The algorithm uses a majority to pick a value and consists of two phase: a prepare phase and an accept phase. Once a value has been accepted, it is learned by the learners.

Flaws:
The paper is a theoretical paper so I couldn't find much flaws in the algorithm. The algorithm uses multiple agents to suppose failures. There's a lot of messages being sent but there's no mention how how big the messages are or how many agents it is meant to support.

Application to real systems:
Paxos seems to be useful in choosing an ordering of choices that could be used in replicated state machine system. The idea seems pretty simple and it provides safety, progress, and some fault tolerance which might be ideal for some type of distributed systems.

Summary
This paper provides a simple explanation of the Paxos algorithm for implementing fault-tolerant distributed systems.

Problem
In a distributed system where multiple nodes can update data values, it is important that nodes come to a consensus on what value a given data item should have. However, in the presence of failures such as nodes crashing or messages being lost, this can be difficult to achieve.

Contributions
In the model the paper presents, a system consists of three types of nodes: proposers, acceptors, and learners. A single node can perform multiple roles, and most roles will be played by multiple nodes, in order to mitigate the damage caused by a single node failing (with the exception of proposer, which is sometimes performed by only one node – the distinguished proposer). The Paxos algorithm provides a protocol for how these nodes can interact in order to agree upon a value. The algorithm consists of rounds where proposer(s) propose new values to acceptors, who respond based on the values they have seen so far. Once a proposer's value has been accepted by a majority of the acceptors, the value is considered current, and the acceptors can then inform learners of this new value. To ensure progress and to avoid deadlock, the paper discusses the idea of a single proposer, the distinguished proposer; in this model, only the distinguished proposer can propose updates, so there are no degenerate cases where the algorithm repeats endlessly.
The paper also presents an implementation of this algorithm using state machines.

Flaws
As with other Lamport papers we've read, I felt that some aspects of this algorithm might be difficult or costly to implement in a real system. However, this paper is much more concerned with the theory than the application. I also felt that the paper glossed over the problem of having a distinguished proposer. It claims that progress can be guaranteed if we use a distinguished proposer approach, but the problem of coming to a consensus on who should be the distinguished proposer is a difficult problem in and of itself, similar to the original problem they are trying to solve.

Application
It seems that many systems today are willing to sacrifice a little strong consistency for better performance or availability. Given that Paxos spends so much effort achieving strong consistency, it might be less useful in these scenarios. However, the algorithm it describes does seem like a reasonable thing to implement in a system which is willing to sacrifice a little performance or availability in order to provide guarantees about consistency.

Summary:

This paper is a simplistic explanation of the Paxos algorithm which is a consensus algorithm for a distributed system. The algorithm aims at having a value stored across a set of nodes based on a majority consensus between the nodes.

Problem:

The paper aims at solving the problem of choosing a single value from a bunch of proposed values in a distributed system. Its aim is to ensure that all known nodes eventually get to know about the accepted value and therefore is flexible enough to incorporate eventual consistency.

Contributions:

One of the important contributions of this paper is a two phase algorithm where in the first phase a proposer sends a prepare message to a bunch of acceptors. An acceptor responds to the proposal if the proposal number is greater than any of the proposals it has accepted so far. In the second phase the proposer sends an accept message if had received a majority consensus from the acceptor in the first phase. This two phase approach is analogous to any of the quorum approaches we have read so far and is an important design decision.

Another important contribution of this approach is the use of a distinguished proposer, this helps in having progress in the system. The idea of electing a proposer is also important so as make sure a proposer is always chosen and so as to get a new proposer incase the current one fails.

Similarly the implementation of a set of distinguished learners is an important optimization that can help in propagating a bunch of updates to a larger set of learners. This provides for greater reliability over having a single distinguished learner.

Applications:

1. The idea proposed in this paper could be used in the implementation of a distributed commit protocol in distributed databases.

2. It could also be used in any key value store to maintain consensus among nodes

Flaws:

1. The assumption that messages cannot be corrupted is a bit too strong

Summary:

This paper is a simplistic explanation of the Paxos algorithm which is a consensus algorithm for a distributed system. The algorithm aims at having a value stored across a set of nodes based on a majority consensus between the nodes.

Problem:

The paper aims at solving the problem of choosing a single value from a bunch of proposed values in a distributed system. Its aim is to ensure that all known nodes eventually get to know about the accepted value and therefore is flexible enough to incorporate eventual consistency.

Contributions:

One of the important contributions of this paper is a two phase algorithm where in the first phase a proposer sends a prepare message to a bunch of acceptors. An acceptor responds to the proposal if the proposal number is greater than any of the proposals it has accepted so far. In the second phase the proposer sends an accept message if had received a majority consensus from the acceptor in the first phase. This two phase approach is analogous to any of the quorum approaches we have read so far and is an important design decision.

Another important contribution of this approach is the use of a distinguished proposer, this helps in having progress in the system. The idea of electing a proposer is also important so as make sure a proposer is always chosen and so as to get a new proposer incase the current one fails.

Similarly the implementation of a set of distinguished learners is an important optimization that can help in propagating a bunch of updates to a larger set of learners. This provides for greater reliability over having a single distinguished learner.

Applications:

1. The idea proposed in this paper could be used in the implementation of a distributed commit protocol in distributed databases.

2. It could also be used in any key value store to maintain consensus among nodes

Flaws:

1. The assumption that messages cannot be corrupted is a bit too strong

Summary:

This paper is a simplistic explanation of the Paxos algorithm which is a consensus algorithm for a distributed system. The algorithm aims at having a value stored across a set of nodes based on a majority consensus between the nodes.

Problem:

The paper aims at solving the problem of choosing a single value from a bunch of proposed values in a distributed system. Its aim is to ensure that all known nodes eventually get to know about the accepted value and therefore is flexible enough to incorporate eventual consistency.

Contributions:

One of the important contributions of this paper is a two phase algorithm where in the first phase a proposer sends a prepare message to a bunch of acceptors. An acceptor responds to the proposal if the proposal number is greater than any of the proposals it has accepted so far. In the second phase the proposer sends an accept message if had received a majority consensus from the acceptor in the first phase. This two phase approach is analogous to any of the quorum approaches we have read so far and is an important design decision.

Another important contribution of this approach is the use of a distinguished proposer, this helps in having progress in the system. The idea of electing a proposer is also important so as make sure a proposer is always chosen and so as to get a new proposer incase the current one fails.

Similarly the implementation of a set of distinguished learners is an important optimization that can help in propagating a bunch of updates to a larger set of learners. This provides for greater reliability over having a single distinguished learner.

Applications:

1. The idea proposed in this paper could be used in the implementation of a distributed commit protocol in distributed databases.

2. It could also be used in any key value store to maintain consensus among nodes

Flaws:

1. The assumption that messages cannot be corrupted is a bit too strong

Summary
The Paxos algorithm is re-described by Lamport in simpler terms than the original, using less references to Greek history, less mathematics, and less alliteration. The consensus algorithm is shown to be useful for creating a replicated state machine.

Problem
For many distributed applications, agents in the system need to agree on some facts in order to be useful or consistent. This is non-trivial considering that nodes and the network may be faulty. Additionally, other agents, who may not be participating in the decision, may need to be informed of what is agreed upon. This paper assumes a non-byzantine model, where the network can have temporary errors, and nodes may fail stop and restart.

Contributions
What makes the Paxos algorithm for consensus an important innovation, is that it's expression is rather simple, but the consistency guarantees are very strong.

The essential algorithm involves a proposer, a set of acceptors, and a set of learners. In the first phase, a proposer issues a prepare request to the acceptors with sequence number n. The acceptors will either reply with their current value and either a promise not to accept any new prepare requests with sequence number lower than n, or a NACK saying that the prepare is ignored. In phase 2, a proposer may now propose a value if a majority of acceptors have agreed to be prepared. This value has to either be the latest of all the responses for that proposal, or a new value if no responses indicate that a value has been set. If the proposer attains enough accept messages, then the proposal is accepted, and this information is forwarded to the learners.

Also, Lamport shows how the algorithm is naturally suited for implementing a replicated state machine, by have a distinguished node issue instances of Paxos based upon requests from others.

Limitations
It seems like a nefarious proposer could prevent progress by proposing things with large values of n. Perhaps this complaint isn’t relevant since we are using a non-byzantine model.

Applicability
This algorithm is definitely applicable as evidenced by its use in Google’s Chubby distributed lock service. It lends itself naturally to the implementation of a replicated log. It will be useful when strong consistency is important, and when the issue of security is handled through other means.

Summary:
This paper describes Paxos, a distributed and decentralized consensus algorithm. The algorithm uses a Proposer-Acceptor-Learner setup, with a 3 phase operation.

Problem:
We have a distributed system with multiple nodes. How do we make all nodes agree on the same things? This is important because if we can accomplish such a primitive, it will come of use in many places - sequencing of commands, database updates etc. The most obvious way is to have a centralized server, which sees all operations, but this setup is not optimal and is faced with the risk of this central server failing. So, what we really need is a decentalized consensus algorithm.

Contributions of the paper:
Primarily the algorithm itself, its proof and an example of a state machine implementation. The new way of looking at the problem will be split the set of nodes into three sets - Proposers, Acceptors and Learners. Proposers first let the acceptors know they would wish to propose, and hear back from Acceptors whether they can go ahead or not. The acceptors let the proposers know whether they can or not, letting the proposes know what is the latest accepted value, in the latter case. If the acceptor accedes, the proposer sends the proposed value. The acceptors at this point decide whether to finally accept the value or not. If the value is accepted, the accepted value is propogated to othe r learner nodes. If not, we repeat the same process again after a time out. Lamport also proposes a few optimizations to the algorithm.

The leader based operation of this algorithm is an interseting aspect.

As with his other papers, Lamport builds up the describing the problem itself, elaborating it step by step, making the algorithm and the proof easy to understand.

Limitations:
The algorithm requires a 'distinguished' proposer, and requries re-election of new proposer upon failure of a current one. Also, how 'decentralized' the algorithm really is is questionable because of the leader based setup.

The paper does not cover Byzantine failures, which could affect the system seriously. The algorithm gives more control to proposers - and two malfunctioning proposers can disrupt any progress in the system by choosing successive proposal numbers.

Applications:
Thinking about it, the algorithm seems to have relevance in any decentralized distributed system - many applications come to my mind like distributed locking, resolution of conflicts between nodes, leader selection etc. But, the algorithm presented as such, hinders its deployment in 'open' systems where the risk of an attack could be more.

Post a comment