« The Byzantine Generals Problem | Main | Paxos Made Simple »

The Part-Time Parliament

The Part-Time Parliament. Leslie Lamport; ACM Transactions on Computer Systems, Vol. 16, No. 2, May 1998

Review for this or other Paxos paper due Tuesday, 2/28.

Comments

'The Part Time Parliament'

Leslie Lamport introduces the paxos algorithm in 'The Part Time Parliament'. It trades off some fault tolerance for greater performance. If the beginBallot and Success message for two neighboring decrees is combined paxos can function in 2N ( N – number of priests ) messages. Compare this to the byzantine fault tolerance algorithms that can send up to (n-1)(n-2)...(n-m-1) messages.

Faults:

Only deals with benign failures. With the amount of money generated by google every hour in services a single act of industrial espionage could cost millions. If one service is down, it's going to drive users to another service. It seems that any service should be resistant to nefarious intentions and that paxos cannot provide this by itself. It does have 'self stabilization', but that seems to come from revoting on decree within some period. Wouldn't there then be a tradeoff to total number of possible decrees and the time required for 'self stabilization'.

Further the paxos parliament did not admit the possiblity that say a legislator had gone senile and made mistakes commonly. The paper mentions that a majority in the parliament was calculated based on the total bodyweight of the senators in session. What happens if there is a four hundred pound Strom Thrumond? The majority set could be additionally weighted by the trustworthiness of the senator.

The idea of specialists in a particular part of law seems like it could be implemented with a seperate paxos instance ( say a subcommittee ). If a decree cannot effect more than one area of the law you're good. In order to pass a decree that effects more than one area of law you'd have to be able to join two subcommittees and then split them again. Isn't this equivalent to network partitioning? Only having one president would seem like a bottleneck in the system as it could not handle more data than a single host could handle.

The paxos paper did not propose a good algorithm for recovering from a presidential failure. A simple timeout was required. What happens if a legislator sees the president walk out of the chambers? Can someone force an election without waiting for a timeout? Further no algorithm was proposed to select a president that did not have to update his/her ledger extensively.

If a senator is not in session it is unclear how he/she brings their ledger up-to-date. Do they have to content themselves with self stabilization? Ie.. Waiting through a whole round of voting before they can know that they have all the latest information?

Given all of these issues, they must be minor. If they where not I would not expect to see paxos in use in the real world ( for example the chubby lock service ).

Victor Bittorf, David Capel, Igor Canadi, and Seth Pollen

This paper sketches, by dint of an elaborate analogy, an algorithm for reaching consistent, global decisions in a distributed system where any node may fail at any time and any message sent between nodes may be lost or indefinitely delayed. Solving this problem allows the implementation of an arbitrary distributed state machine, which is useful for solving a wide variety of problems in distributed systems. The problem of the possibility of failure at any node or communication channel is a real problem faced by system designers today.

The paper provides an algorithm called the “Synod protocol” which solves the problem of distributed agreement for the case of picking a single value (called a decree in the analogy) from a set of candidate values. The paper sketches three versions of this protocol (called preliminary, basic, and complete). The preliminary protocol is the most general; it ensures only consistency (not progress) and requires each node to store a large amount of state. The basic protocol imposes additional ordering on the algorithm, sacrificing generality in exchange for reduced space requirements. The complete protocol introduces knowledge of timing in the system as well as the selection of a president node to force activity to take place; this ensures that the system will make progress, as long as a majority of the nodes remain on-line for a long enough period.

In a similar technique to the Byzantine generals solution, Lamport uses multiple parallel instances of the Synod protocol to solve the more general problem, where a series of decrees (rather than a single decree) must be decided upon. The first two steps of each instance of the Synod protocol can be combined once a president has been selected. Additionally, a null decree (related to the national olive day in the analogy) is sometimes necessary to ensure that the numbering of decrees is monotonically increasing.

We appreciate Lamport’s presentation of several techniques for consistently reading the record of decrees that have been passed. The problem is that the ledgers of different nodes may be more or less outdated, thus presenting an inconsistent picture of the global system state to any client who queries nodes in the system. One solution is for a client who wishes to read the global state to pass a decree to that effect. The client then waits to see the decree appear in the ledger, and it can then consistently examine the preceding decrees in the ledger of any node. A second and more efficient solution is for clients to employ Lamport clocks to track their causal interactions, allowing each client to know how up-to-date its reads from the system must be. A third solution is to partition the set of decrees among all the nodes so that each class of decree is served by a single node. Any query regarding a decree of that class must be directed to that node, thus ensuring a consistent view of that class of decrees.

It is unclear from the paper how decrees, once passed, are propagated to the ledgers of all legislators. It is possible for the president to receive enough votes for a decree and to write it down in his ledger, only to have all of the Success messages that he subsequently sends be lost. This would leave him as the only legislator with the decree written in his ledger. While the protocol may support consistency even in this case (and without any kind of propagation scheme), it would seem practical to provide a facility for this decree to eventually be copied into the ledgers of the other nodes.

We are also curious as to what kind of algorithm should be used to select the quorum for each ballot proposed. We know that any quorum must consist of a majority (however that is defined) but should be no larger than necessary (since it is harder to get more nodes to respond with affirmative votes). Beyond that, how are quora chosen? Perhaps each node keeps track of the other nodes it believes to be alive, and attempts to form quora entirely out of nodes believed to be alive.

We also do not understand the use of the magic number 3 in section 3.3.6 on page 157. Lamport requires that a newly appointed legislator not be included in the parliament until two additional decrees have passed since his appointment. He then says that every decree to appoint a new legislator should be immediately followed by two olive-day decrees. Why is this necessary? Why cannot changes to the makeup of parliament take effect immediately?

The major flaw we found with the paper is that the alleged history of the island of Paxos is totally false. In the very first sentence, Lamport contends that the island of Paxos lies in the Aegean Sea, but Wikipedia states that the island is situated on the other side of Greece, in the Ionian Sea. We doubt whether anyone who could make such an elementary geographical mistake can have anything meaningful to contribute to the field of computer science.

Post a comment