
Chord: A scalable peer-to-peer lookup service for internet applications.

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001. ACM.

Review due Thursday, 3/15.

Comments

This paper describes Chord, a P2P lookup system. The goal of Chord is to provide a way to look up the location of nodes based on keys (identifiers). This is meant to be used to locate the nodes on which data items are stored. It aims to do this without using any centralization and with good load balancing, scalability, and availability.

Chord works by using consistent hashing to map keys onto nodes. However, the innovative part of Chord is that each node need only keep a small part of the routing information for the system in order to perform lookups. Each node n keeps a "finger table", whose i-th entry gives the successor of n + 2^(i-1) on the identifier circle. A node performs a lookup on a key by contacting the closest node in its finger table whose identifier precedes that key. This is repeated recursively using the contacted node's finger table. Because of the way the finger table is constructed, it can be shown that a key can be located in O(log N) hops, since the distance to the target key is at least halved at each step.
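
To make the lookup concrete, here is a rough Python sketch of the finger-based lookup just described, loosely following the paper's find_successor / closest_preceding_finger pseudocode; the tiny Node class, the small identifier size, and the helper names are illustrative, not the paper's actual implementation.

```python
# Rough sketch of the finger-based lookup described above (illustrative only).

M = 6                      # bits in the identifier space (small, for illustration)
RING = 2 ** M

def in_interval(x, a, b):
    """True if x lies in the half-open interval (a, b] on the identifier circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # the interval wraps past zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self          # kept correct by stabilization
        self.finger = [self] * M       # finger[i] ~ successor(id + 2^i), 0-based

    def closest_preceding_finger(self, key):
        # Scan fingers from farthest to nearest for a node strictly
        # between this node and the key on the circle.
        for f in reversed(self.finger):
            if in_interval(f.id, self.id, (key - 1) % RING):
                return f
        return self

    def find_successor(self, key):
        # Hop around the ring; each hop should at least halve the remaining
        # distance to the key, giving O(log N) hops with high probability.
        node = self
        while not in_interval(key, node.id, node.successor.id):
            node = node.closest_preceding_finger(key)
        return node.successor
```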

One flaw in this paper is that security in the system seems to be largely an afterthought. Although the authors have a few paragraphs about malicious nodes in section 7, they don't adequately explain how the various types of attacks they envision could be prevented. It seems to me, especially with the recursive model of lookups, that malicious nodes could easily prevent certain lookups from succeeding, simply by returning "key not found" errors. If an attacker had control of a botnet, it would be easy to bring the system down by slowly inserting malicious nodes that exhibited Byzantine behavior.

Another flaw with the paper is that the authors do not adequately explain how their system can survive partitions. It is clear that, if a node is unreachable, at some point the other nodes must assume that it has left the network. Suppose the network were partitioned into two equal sized partitions. Eventually, due to the stabilization algorithm, this will lead to two separate Chord systems. The authors do not really present a way for the partitions to be reunited after the network is healed without user intervention. In the main body of the paper, they allude to intervention from some human administrator, but that would mean the system relies on a central authority. In the future work section, they allude to some partition detection mechanisms, but it seems to me impossible in principle to distinguish between a partition forming and a large number of nodes going down.

In this paper, the authors propose the Chord protocol, a peer-to-peer lookup service. The lookup service is one of the most important operations in a P2P system, and Chord is designed for this purpose. Given a key, it maps the key to a node within a guaranteed number of steps, even in the presence of node joins and failures. Its advantage is that no node needs a view of all other nodes, so it is easier to scale. All lookups can be resolved with O(log N) messages, where N is the number of nodes.

Chord uses consistent hashing to find a node, which provides a natural way to balance load. In consistent hashing, the nodes and the keys are hashed onto a circle of m-bit identifiers. A difference from the consistent hashing in Dynamo is that when a node fails, its successor takes over all of that node's data, rather than spreading it across many other nodes. I suspect this is because the Chord experiments did not use much data.

To resolve lookups within O(log N) hops, Chord maintains additional routing information at each node. This is the finger table, which consists of at most m entries. The i-th entry for node n contains the identity of the first node that succeeds n by at least 2^(i-1), along with the corresponding interval. This information does not affect correctness, but it accelerates the lookup process.
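
As a rough illustration (my own sketch, not the paper's code), the starting point of each finger table entry can be computed directly from the node's identifier:

```python
# Sketch: computing the "start" of each finger table entry for a node with
# identifier n, where entry i (1-based) begins at (n + 2^(i-1)) mod 2^m,
# as described above.
def finger_starts(n, m):
    return [(n + 2 ** (i - 1)) % (2 ** m) for i in range(1, m + 1)]

# For m = 6 and node id n = 8:
#   finger_starts(8, 6) -> [9, 10, 12, 16, 24, 40]
# Each entry then stores successor(start), the first node whose identifier
# is equal to or follows that start point on the circle.
```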

All nodes in the system are equivalent, so there is no single point of failure, and the joining of new nodes and failure of old nodes do not affect the availability of the system. When a new node joins, it learns its fingers and predecessor by asking other nodes, and the fingers of existing nodes must also be updated; the overall complexity is O(log^2 N) messages. When concurrent joins occur, it would be difficult to keep the fingers of a large network correct, so Chord first uses a stabilization scheme to keep nodes' successor pointers up to date, and only later verifies and corrects finger table entries. This verification may be slow, but it makes subsequent lookups much faster. A consequence of this separation of correctness and performance is that the O(log N) lookup bound may not hold for lookups performed before the fingers are updated, or, worse, a lookup may fail if some affected region has incorrect successor pointers.
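
A minimal, self-contained sketch of what such a stabilization scheme could look like, loosely following the paper's stabilize/notify/fix_fingers pseudocode; the Node class, identifier size, and names here are my own illustrations:

```python
# Illustrative sketch of periodic stabilization: successor pointers are
# kept correct eagerly, while finger entries are refreshed lazily.
import random

M = 6                       # identifier bits (illustrative)

def between(x, a, b):
    """True if x lies strictly between a and b, clockwise on the circle."""
    if a < b:
        return a < x < b
    return x > a or x < b   # the interval wraps past zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None
        self.finger = [self] * M

    def stabilize(self):
        # Ask our successor for its predecessor; if that node now sits
        # between us and our successor, adopt it as our new successor.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # `candidate` believes it may be our predecessor.
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def fix_fingers(self, find_successor):
        # Lazily refresh one random finger entry; only lookup speed, not
        # correctness, depends on the fingers being fresh.
        i = random.randrange(M)
        self.finger[i] = find_successor((self.id + 2 ** i) % (2 ** M))
```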

The authors tested the Chord protocol using both simulation and a prototype deployment. In the simulation, the protocol shows good load balance and path length. As for failures, the lookup failure rate increases roughly linearly with the fraction of failed nodes, and almost all lookup failures are caused by the loss of the data on the failed nodes, so the system itself does not introduce extra failures in the presence of node failures. The authors also measured lookup latency in an experimental Chord system deployed on the Internet. The results show that latency grows slowly with the number of nodes, which is consistent with the simulation results.

This paper describes a system for solving a specific case of the general distributed key-value storage problem, where the values being stored are the addresses of nodes in the system. The Chord system provides a robust, decentralized, peer-to-peer consistent hashing system that allows a set of data to be sharded across multiple nodes while still being quickly accessible from any node. Adding a local key-value store to each node would then be sufficient to implement a (non-replicated) distributed key-value store. Chord responds gracefully to single-node failures and joins, but it may have trouble handling complex failures such as network partitions or malevolent nodes.
The problem being solved is the consistent hashing problem. Unlike other consistent hashing systems we have seen, however, this system seeks to solve the problem without any centralized control while still providing strong guarantees that the correct node is chosen every time (in the absence of complex failures). This differs from the previous application of consistent hashing that we have seen (web caches), where picking the wrong cache has performance penalties but does not jeopardize correctness. Furthermore, the Chord authors prove and demonstrate logarithmic upper bounds on both the lookup latency (logarithmic in the number of servers) and the space required on each server (logarithmic in the size of the key space), which means that this system can be easily scaled to very large clusters.
The paper notes that a trivial solution to the consistent hashing problem is to perform a linear search of the set of all known nodes each time a lookup is performed. This requires only that all nodes be maintained in some kind of linked list. But to achieve logarithmic time bounds, a more complex index must be maintained at each node. Therefore, each node keeps several "fingers" pointing at other nodes spaced around the consistent hashing ring. These permit bigger jumps when traversing the ring in search of a particular node, with the effect that each jump should halve the remaining space to be searched. These fingers are more difficult to keep up-to-date than simple successor/predecessor pointers, but they can be updated lazily without compromising correctness.
The paper notes a few of its flaws near the end. One is that the system, as presented, cannot heal itself once its nodes are partitioned into disjoint rings (where no nodes in one ring have fingers to any nodes in the other ring, and vice versa). A handful of epidemic protocols are proposed for detecting and resolving such a condition. We believe that these protocols could probably handle most partition situations.
The bigger flaw noted is that Chord has no tolerance for malicious (Byzantine) nodes. This is especially troubling because Chord markets itself as a peer-to-peer system, and peer-to-peer systems are often open to the public and thus vulnerable to attack from malicious nodes. A few solutions are hypothesized for detecting Byzantine failures, but they are vague and do not provide strong protections.
The authors propose that a possible use for their system would be as a block store underlying a distributed file system. We question whether any distributed file system would be better off using Chord than a more traditional centralized sharding approach. Chord provides nice decentralization and with it, perhaps, some load balancing, but it comes at a cost of logarithmic time for lookups. In general, it may be that Chord's decentralization is only useful in situations that exhibit decentralized administration, in which case its inability to cope with malevolent nodes disqualifies it.
The evaluation of the system also lacks completeness. The simulation and real-world measurements both focus on lookup latency, but there is no attention given to system throughput. Perhaps this metric is not as theoretically interesting, but it seems that Chord will develop throughput bottlenecks under a real workload, where some keys receive far more traffic than others. They do, however, provide a hook for applications to set up replication in Chord (using a model similar to Dynamo’s), and this could be used to address the problem.


This paper introduces Chord, a decentralized lookup service, i.e., a distributed hash table. Chord is based on consistent hashing, improves the lookup performance of consistent hashing by using a finger table, and gives an O(log N) upper bound (with high probability, in the steady state) on the number of messages needed to resolve a lookup.

A scalable lookup service is difficult to implement. There are many factors that need consideration, including load balancing and fault tolerance. For load balancing, consistent hashing is a good choice, but a scalable and robust implementation is needed. For fault tolerance, we must be able to handle node joins and departures, which may even be concurrent.

I think there are two main contributions. One is the finger table, whose i-th entry holds the first node that succeeds the current node by at least 2^(i-1). When a node forwards a lookup via its i-th finger, the remaining distance to the key falls below 2^(i-1), so the next node only needs to use fingers of index smaller than i; each finger level is therefore used at most once along the path, which gives the O(log N) upper bound on messages. The other contribution is the observation that only each node's immediate successor must be correctly maintained. This follows from a property of the algorithm: only the direct successor is used to decide whether a key belongs to a node, while the other finger table entries are used only to approach the key's destination and never to cross it. Therefore, keeping the direct successors correct is enough to guarantee that the destination is found.
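
As a small illustration of that correctness argument (my own sketch, with illustrative names; it assumes a Node object with id and successor fields), a lookup that ignores fingers entirely and walks only successor pointers is slow but still correct:

```python
# O(N)-hop lookup using only successor pointers: fingers are purely an
# optimization, so stale fingers cannot make a lookup return the wrong node.

def find_successor_slow(start, key):
    node = start
    while True:
        nxt = node.successor
        # Is `key` in the half-open interval (node.id, nxt.id] on the ring?
        if node.id < nxt.id:
            hit = node.id < key <= nxt.id
        else:  # the interval wraps around zero (or it is a one-node ring)
            hit = key > node.id or key <= nxt.id
        if hit:
            return nxt        # nxt is the node responsible for `key`
        node = nxt
```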

One flaw of the paper is that it tries to give an upper bound so that users can know the limit of the cost, yet the given bound holds only "with high probability". There is still no hard upper bound on the cost in extreme cases, such as many nodes joining or failing at the same time. So for users who need a guaranteed worst-case cost, this paper does not give a sufficient result, even though Chord should work very well in most cases.

In all, this paper gives a very successful method for implementing a distributed hash table. It makes consistent hashing much more practical to implement, thanks to the finger table idea, and it has inspired several open-source projects. The implementation itself is interesting and is very instructive for learning the ideas.


This paper describes Chord, a peer-to-peer (completely non-centralized) lookup service developed on top of the idea of consistent hashing. The niche the designers were searching for was a distributed lookup service that can handle multiple failures, has no hierarchy, location-dependent naming, or central master of any kind, offers provably correct lookups, and, of course, scales.

I would have to say that the largest contributions of this paper are the proofs of correctness and performance. The strongest notion they used for the proofs was "with high probability under standard hardness assumptions", which I thought was interesting, but also good enough in most cases. These types of guarantees made me want to see a system using Chord over a long period of time and not just simulation results. A feature I found particularly nice was the O(log N) hops required for a lookup, and the ability to degrade this performance gracefully when data gets out of date. Also, looking at the related work section, it seemed this system offered the most flexibility while providing decently strong guarantees. It seemed Chord would work well in a very large setting, as the simulations suggest.

A small flaw is that while the paper touted Chord's flexibility and simplicity, I feel that large amounts of flexibility could be a curse to a project, since application developers may not know how to use such a system. Indeed, a key-value store is the hallmark project for Chord (it seems), but the project would have to be adapted to a certain service level agreement. Chord seems more like a good academic system that proves that correctness and performance can mostly be preserved while achieving other good things like load balance, decentralization, and availability. However, despite Chord's touted scaling, once a system gets large enough, more stringent service levels need to be upheld, and I am not convinced Chord is up for the task of modification without losing its provable qualities. Another thing I noticed was that for most of the paper, the authors very deliberately tried to keep the routing information small, but found that to achieve more reasonable load balancing, more routing information would need to be kept, and the decision to not use virtual nodes reversed. These were key design choices made for the purpose of scalability but have undesirable side effects.

Almost any distributed system paper we read is a big list of tradeoffs tailored to a certain use case. This of course has value in current system design in many forms, such as seeing good and bad qualities you did not think of, learning other peoples' lessons, tailoring your use cases, and discovering possible limits to service guarantees. I will be honest in my opinion that this paper blends in with many of the other papers we read and has me viewing this field as a big, pessimistic, organization problem that has people thinking of pathological failure cases that may happen less or more often than one thinks. Frankly, these papers kind of stress me out and make me lose hope in a computer system ever working the way I actually want it to. All told though, distributed systems are necessary for the types of tasks they are used for, and each design requires much thought, which is the take home point I seem to put in this paragraph each week.

1. summary of the paper
The paper presents Chord, a scalable peer-to-peer lookup protocol that addresses the problem of determining, in an efficient and decentralized manner, the location of the node that stores a particular data item. Chord provides one powerful operation: efficiently mapping keys (associated with data items) onto nodes. Chord works correctly and achieves provable performance as nodes join and leave concurrently, and it remains correct, with gracefully degraded performance, even when nodes' routing information is only partially correct.
2. A description of the problem they were trying to solve (and how it was solved)
The main problem this paper tries to solve is how to locate the node storing a given data item in peer-to-peer applications. The problem can be generalized as how to map keys onto values; for Chord, keys are associated with data items and values are nodes (IP addresses). The challenges include ensuring load balance, scalability, and high efficiency, as well as good support for concurrent arrivals and departures of nodes.
3. A summary of the contributions of the paper
Finger Table for each Chord node
Unlike previous work on consistent hashing, which rested on the impractical assumption that nodes are aware of most other nodes in the system, in an N-node network in the steady state each Chord node needs to maintain routing information for only about O(log N) other nodes, and resolving a lookup needs O(log N) messages to other nodes. Chord considers the routing (lookup resolution) issue from two aspects, correctness and efficiency. The former is guaranteed as long as each node has correct information about its successor node on the hashing circle. The latter is achieved by maintaining a finger table with at most m entries at each node. Each node's finger table stores information about a small number of other nodes, and a node knows more about nodes close to it on the circle than about nodes farther away. In this way, a node can ask another node that knows more about the identifier of the lookup key.
Efficient adaptation to concurrent node arrivals and departures
The paper first introduces the single-node-join case, in which three steps are performed: first, initializing the new node's fingers and predecessor; second, updating the fingers of existing nodes; and third, transferring keys. Concurrent operations are then considered from two aspects, correctness and performance. A "stabilization" protocol keeps nodes' successor pointers correct, which guarantees the correctness of lookups. To achieve efficiency, those successor pointers are then used to verify and correct finger table entries.
4. The one or two largest flaws in the paper.
First, in the base Chord protocol section, the paper cites two properties of consistent hashing: bounds on the maximum number of keys mapped to each node and on the maximum number of keys that must be transferred when a node joins or leaves. As mentioned in the paper, these two properties hold with high probability only under certain conditions (such as using k-universal hash functions to provide guarantees in the case of nonrandom keys). However, those conditions may not be met by the design in this paper, so in theory the consistent hashing in Chord may not behave well in some cases. Second, the finger table allows good support for node arrivals and departures, but lookups still take O(log N) messages, which may cause significant latency for certain applications.
5. A discussion of how the ideas in the paper are applicable to real systems.
The problem in this paper can be generalized as how to map keys onto values. The ideas in Chord can be applied to many other applications by giving different meanings to the keys and values. For example, in the distributed indexing application mentioned in the paper, a key could be derived from the desired keywords and values could be lists of machines offering documents with those keywords.

Chord is a distributed key-value store system. It solves two problems: how to assign keys/data items to nodes, and how to route among the nodes. Compared with other current solutions for distributed storage, Chord supports large numbers of nodes and has smaller routing tables.

In distributed systems such as P2P applications, there are many nodes for data storage. How to distribute data across nodes, how to locate data, and how to route among the nodes are problems that need to be solved. In existing solutions based on consistent hashing, each node needs to know about all other nodes, which makes them unsuitable for large-scale systems. Chord uses smaller routing tables, so it supports large systems; the cost is that Chord needs more hops to forward messages.

Chord is composed of two parts. One part is data location. All nodes and keys are hashed to m-bit IDs (made effectively unique by the SHA-1 hash). The node IDs are ordered on a circle, and each ID (whether a node ID or a key ID) has a successor node: the first node whose ID is equal to or follows it on the circle. A key/data item is stored at its successor node. This kind of data location has several benefits: with the SHA-1 hash, storage load is balanced among the nodes, and the scheme scales by changing m.
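
As a rough sketch of this ID scheme (my own illustration: the addresses, the key name, and the small m are made up; the paper uses SHA-1 with m large enough that collisions are negligible):

```python
# SHA-1 hash truncated to m bits for both node addresses and keys, with
# each key stored at its successor (the first node ID at or after the key
# ID on the circle).
import hashlib
from bisect import bisect_left

M = 16                                   # identifier bits, small for illustration

def chord_id(name):
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor_node(key_id, node_ids):
    """Given a sorted list of node IDs, return the node ID responsible for key_id."""
    i = bisect_left(node_ids, key_id)    # first node ID >= key_id
    return node_ids[i % len(node_ids)]   # wrap around the top of the circle

# Hypothetical example:
nodes = sorted(chord_id(addr) for addr in ["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"])
print(successor_node(chord_id("my-file.txt"), nodes))
```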

The other part of Chord is routing between nodes. When a key/data item with ID x is stored or requested, the request can start at any node y. By the consistent hash (SHA-1), x's successor is some node z, so y must know how to forward the request to z. Each node keeps a routing table (the finger table) that stores, for each i, the successor of the point 2^(i-1) past its own ID, which amounts to O(log n) distinct nodes in total. Using this table, y forwards the request for x toward z.

Chord also has other mechanisms, such as node joins, departures, and replication. When a node joins, it first finds its successor, then the routing tables of related nodes are changed, and finally data is copied over. A node's k successors can store replicas of its data to support stabilization and recovery from departures and failures.
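
A tiny sketch of that replication idea (my own illustration; Chord itself leaves data replication to the application, which can use the successor list for this purpose):

```python
# Store each key on its responsible node plus the next k-1 nodes on the
# ring, so a single node failure does not lose the data.

def replicate(key, value, responsible_node, k=3):
    node = responsible_node            # the successor node for `key`
    for _ in range(k):
        node.store[key] = value        # assumes each node has a `store` dict
        node = node.successor
```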

These two parts of Chord are its contributions. Other distributed storage systems usually use a consistent hash to map data to one of the nodes; in Chord, additionally, all IDs are ordered on a circle. This ordering benefits the system when nodes join or leave, because it is clear exactly which range of keys/data must be moved. On the routing side, Chord's mechanism reduces the routing table size, which makes it suitable for very large systems.

Comments: (1) In fact, data location is independent of node routing. The network layer could hide the routing from the storage layer: in the example where y wants to forward x to z, this could be done simply by calling the socket API, and nodes would not need to maintain a routing structure themselves. Within the network, different routing mechanisms could be deployed to pass data from y to z, such as OSPF or Chord's own routing.
(2) The routing protocol trades routing table size against route length. For n nodes, the routing table has O(log n) entries, but to deliver a message the request must pass through several logical successors in Chord, which increases the route length.

The Chord system presented in this paper provides a single, simple service:
given a key, it determines the node responsible for that key. It addresses
this problem in a robust and scalable way; the paper shows this both
theoretically and empirically.

The core of Chord combines a consistent-hashing system with a small amount of
routing information at each node to achieve lookups with logarithmic time and
communication complexity. The routing information consists of a "finger
table", which stores pointers to a set of nodes at exponentially increasing
distances along the identifier circle (onto which nodes and keys are mapped by
a number of bits taken from a SHA-1 hash). The data structure that results
from combining the identifier ring and the finger tables is essentially a
distributed, circular skip list. These "fingers" are what allow the
logarithmic lookup performance -- a node that receives a query examines its
finger table to find the furthest known node not beyond the key on the
identifier ring; the process then repeats with that node's finger table
(either iteratively by the first node or recursively by forwarding to that
node) and so on until the immediate predecessor (and thus trivially the
immediate successor) of the key can be determined.

The remaining details of Chord concern the management of changes in
the system's membership; when this occurs the structure of the ring must be
preserved, and the finger table is updated (though the former is necessary for
correctness, while the latter is merely a performance optimization). The
necessary predecessor/successor relationships are updated by a periodic
stabilization procedure carried out by each node, in which it verifies its
position with respect to its immediate neighbors.

While it makes no explicit aim, the paper seems to present Chord as building
block for use in Internet-wide peer-to-peer style systems (the paper compares
aspects of it to Gnutella and Napster, for example). If Chord were to
actually be used in such a situation, however, its lack of resilience to
Byzantine failures (which could easily take the form of actively malicious
nodes joining the system) seems to be its biggest flaw. While the authors
acknowledged this and mentioned some legitimate countermeasures in Section 7,
I did not find these measures convincing enough for widespread deployment --
their proposal allows a node to merely *detect* that it is not being presented
a correct view of the system by its peers, not to actually do anything about
it.

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Summary:

This paper presents 'Chord', a distributed lookup protocol that maps a key onto a node in the network. The amount of state each node keeps is logarithmic in the number of nodes. The main contribution of this paper is describing a working algorithm. It's an algorithm with many flaws and security issues, but it's hard to criticize something that is in use.

Flaws:

Section 4.1 → in an N-node network, each node maintains information only about O(log N) other nodes.

Problem! The small amount of state involved is a selling point for this algorithm, but if a node is behind a NAT firewall, how can that firewall be traversed without either punching a hole in the firewall or remembering the identity of all N nodes? This is an example of a frustration with many distributed systems papers: they describe an algorithm that works in some idealized world, but leave out and do not consider even common cases.

Section 4.2 → a node's identifier is chosen by hashing the node's IP address.

Problem! Network address translation practically guarantees that IP addresses are not unique. For instance, at home I use an AT&T DSL modem. It doles out IP addresses from 192.168.0.0/16 in a linear fashion. The chances are high that there are many machines out there with the same IP address.

Problem! IP addresses are often assigned to machines via some variety of DHCP. DHCP assigns addresses as a temporary 'lease' which must be renewed periodically. If the host does not renew its lease, it can lose its IP address and eventually be assigned a new one.

Problem! One machine may have more than one IP address, either because of multiple interfaces or something more exotic like Mobile IP. Which one should be used?

Section 4.3 → the number of nodes that must be contacted to find a successor in an N-node network is O(log N) (Theorem 2).

This is very similar to the rumor-mongering paper, where the delay of propagation was also expected to be O(log N). Chord fails to respond to a problem that was raised there: if there were two large concentrations of nodes (say, North America and Europe) connected by a thin transatlantic network link, Chord would place a high burden on that link. This is a real weakness in peer-to-peer protocols.

Section 4.4 → We assume the new node learns the identity of a Chord node by some external mechanism.

Problem! One of the selling points of Chord is that it has no central point of failure. However, how can a new Chord node locate any others without a central registry? What kinds of algorithms exist for performing this function? The Chord source available at the URL mentioned in the paper has a central, well-known server called a 'bootstrap' node. This centralized control (the bootstrap node could make centralized decisions about whether or not to admit a new node to a Chord network) seems to imply that Chord is not a peer-to-peer system by the very definition given in Section 1.

General issue → A single malicious chord node has the potential to cause great havoc. In a distributed system potentially spanning the Internet, there would appear to be real trust issues here. A malicious node could return incorrect results for the closest_preceding_finger call.

Possible Optimization → There was no mechanism mentioned whereby a Chord node could voluntarily leave a network. Potentially the departing node could move all of its keys to its successor, and send messages to its predecessor and successor to correct their successor/predecessor pointers and their lists of nearest servers. Eventually the stabilization algorithm would fix the finger tables.
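
A minimal sketch of that voluntary-leave idea (this is not part of the paper's protocol; the node fields and the `store` dict here are illustrative):

```python
# The departing node hands its keys to its successor and patches its
# neighbors' pointers; periodic stabilization would later repair any
# fingers that still point at the departed node.

def leave(node):
    succ, pred = node.successor, node.predecessor

    # 1. Transfer every key this node is responsible for to its successor.
    succ.store.update(node.store)
    node.store.clear()

    # 2. Splice this node out of the ring.
    if pred is not None:
        pred.successor = succ
    succ.predecessor = pred
```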
