Chord: A scalable peer-to-peer lookup service for internet applications
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001. ACM.
Reviews due Thursday, 3/25.
Comments
Summary and description:
The main problem that this paper tries to solve is locating a node in the datastore of a peer-to-peer system. In a peer-to-peer system there is no centralized responsibility, and hence locating a particular item is harder. Though consistent hashing solves the problem, it is not highly scalable. This paper tries to solve the scalability issue of the consistent hashing mechanism by dividing the system into smaller subsystems.
Contributions:
The main contribution of this paper is reducing the amount of information each node has to store. In the case of consistent hashing, every node must know the set of data stored on every other node. But this system divides the ring into smaller subsystems, with each node maintaining information about only one node in each subsystem.
Comments:
Since Chord solves the scalability issue of the consistent hashing algorithm, which is otherwise a very good option for locating data items, it is highly applicable for real systems whose primary requirement is efficient global data lookup. Proposals such as i3 and DOA use DHTs like Chord.
Posted by: Raja Ram Yadhav Ramakrishnan | March 25, 2010 04:47 PM
An important operation in P2P systems is finding out where a piece of data is stored. This paper presents Chord, a distributed way of mapping a key to the node where the data is stored. Chord does this in a scalable manner, even when there is a churn of nodes.
Chord uses consistent hashing to map every node and data item to an m-bit identifier using SHA-1. The identifier of a node is the SHA-1 hash of its IP address (this would probably prevent spoofing attacks), and that of a data item is a SHA-1 hash of its content. Assuming identifiers are arranged in a circle, a data item maps onto the node whose id is equal to, or appears next after, the data item's id. Each node maintains routing information about nodes 1, 2, 4, 8, ... positions away from it on the circle; this is called its finger table. Given a key to look up, a node forwards it to another node close to the key's id. Nodes follow a stabilization protocol to keep their finger tables correct under failures. Nodes also maintain a 'successor list' containing information about the next r successors in the circle. When a node's successor fails, it replaces it with the next live node in its successor list.
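As a rough illustration of this identifier assignment (a sketch only, not the paper's code: the identifier width and the inputs below are made up, and the paper uses m = 160 with SHA-1):

import hashlib

M = 8  # identifier bits; kept tiny here for readability, the paper uses m = 160

def chord_id(value: str) -> int:
    """Hash a string (a node's IP address or a data item) onto the 2^M identifier circle."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

# Hypothetical inputs, purely for illustration.
print(chord_id("192.0.2.7"))        # a node's position on the circle
print(chord_id("some-data-item"))   # a key's position; it is stored at the first node at or after it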
The main contributions of this paper are the ring ordering of nodes and keys based on consistent hashing, and the routing mechanism. The authors show positive theoretical results for the average-case scenarios and also present a simulation of Chord. They show that Chord is indeed scalable and robust to failures.
Overall, I think it was an interesting read, and the ideas presented were pretty strong. There is one way in which classic routing differs from Chord routing: classic routing protocols forward packets so that they get closer to the destination, whereas Chord does the same thing on an abstract plane of ids instead of the physical distance between locations. It makes me wonder if this is the best strategy for routing. It's not clear how latency would be affected in realistic settings, and how this routing mechanism compares to others.
Posted by: Chitra | March 25, 2010 02:46 PM
Problem Description:-
---------------------
Consistent hashing as previously studied requires each node to know every other node in the system, which is not possible in peer-to-peer networks, which are dynamically formed and may grow to many nodes.
Summary:-
----------
The paper presents Chord, a distributed key lookup protocol that returns the node responsible for handling a particular key. Chord does so efficiently (requiring only O(log N) messages) with minimal space requirements at each node (O(log N) entries in the routing table).
Each node maintains a finger table. The ith finger of a node 'n' is the first node that is at least 2^(i-1) positions farther along the hash circle. The first finger of a node is its immediate successor.
To look up a key, a node finds in its finger table the node that most closely precedes the key and forwards the query there; that node in turn forwards the query based on its own finger table. Theoretically, it can be proven that the lookup completes in O(log N) hops. However, the efficiency of the lookup is affected by the freshness of the routing information at each node. The worst case is O(N), traversing successor by successor.
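A minimal Python sketch of this lookup on a static, already-stabilized ring (node ids and the identifier size are invented for illustration; this is not the paper's pseudocode, which runs as remote calls between nodes):

M = 6                                          # identifier bits (small for readability)
RING = 2 ** M
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])    # hypothetical node ids

def in_half_open(x, a, b):                     # x in the circular interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def in_open(x, a, b):                          # x in the circular interval (a, b)
    return (a < x < b) if a < b else (x > a or x < b)

def successor(ident):                          # first node at or after ident on the circle
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def node_successor(n):                         # n's immediate successor on the ring
    return nodes[(nodes.index(n) + 1) % len(nodes)]

# The ith finger of n is successor(n + 2^(i-1)); the first finger is the immediate successor.
fingers = {n: [successor((n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)] for n in nodes}

def lookup(start, key):
    """Forward toward the key via the closest preceding finger; return (owner, hops)."""
    n, hops = start, 0
    while not in_half_open(key, n, node_successor(n)):
        preceding = [f for f in fingers[n] if in_open(f, n, key)]
        n = preceding[-1] if preceding else node_successor(n)
        hops += 1
    return node_successor(n), hops

print(lookup(8, 54))                           # (56, 2): node 56 owns key 54, reached in 2 hops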
The JOIN protocol handles individual node additions and completes in O(log^2 N) messages. To maintain performance and availability in the face of concurrent changes, a stabilization protocol is designed that tries to keep each node's successor pointers as up to date as possible.
Contributions:-
---------------
1) Adding dynamic adaptability to the consistent hashing method.
Relevance:-
------------
I believe some of the P2P systems in use today should be using the DHT protocol of Chord. It presents a truly distributed system in the P2P design space, compared to other systems that have some centralized components (a tracker in the case of BitTorrent).
However, in a cluster-like scenario with limited node churn and limited scale, a centralized location service may be more useful from a performance standpoint.
Posted by: Laxman Visampalli | March 25, 2010 02:03 PM
This paper presents Chord, a non-hierarchical lookup service for a peer-to-peer distributed system. Chord modifies consistent hashing so that only log N machines need to know about a given node.
Chord addresses the problem of efficiently locating a node in a peer-to-peer system.
The contributions of this paper include:
1) A variation on consistent hashing that requires nodes to know the routing information of only log N other nodes.
2) Finger tables, a list of pointers to progressively farther nodes (each successive pointer doubles the distance around the circle).
3) Stabilization: Chord nodes periodically update their routing tables by querying neighbors and figuring out what changes need to be made in their own routing tables.
4) Replication can be supported by the successor lists.
Since the IP addresses are hashed to determine a node's location in the ring, physical locality doesn't seem to be used at all.
Also, security could be a problem: as this is a peer-to-peer system, there could potentially be rogue nodes. Chord requires that every node's successor is correctly maintained and that each node is responsible for the data it holds. Replication has to be maintained by the application running on Chord.
Posted by: Sean Crosby | March 25, 2010 02:02 PM
The idea of fast lookups with consistent hashing is new, but Chord brings a bit more than just that. Nodes get to keep far less state about their surroundings, so the system can scale far better than a system that must know where everybody else is, like Dynamo (right?). Still along the lines of scalability, it provides a very simple way of adding and removing nodes from the system. Due to the loose coupling of nodes and its unique structure, nodes can efficiently join or leave.
As is so often the case, content distribution networks seem to be the primary beneficiary. Since Chord provides for replication, if the nodes are spread about geographically, clients could be given a choice about whom they go to for data. There is one primary node and some number of successors that would be geographically random. The client can choose (or a server acting on its behalf can choose) an appropriate Chord node to connect to.
It can also serve as a more responsive name service. It shouldn't be hard to implement security on top of Chord, and it would be far more consistent than DNS. It would also be a large improvement over crucial services like DNS because it is decentralized. There are no root servers, and you cannot target any portion of the name space in any attack. Perhaps at worst you might be able to compromise one of the servers and DoS the system. More generally, any system that has backup servers that run simultaneously could benefit from Chord, provided having at least ten nodes or so is of some benefit.
Posted by: Markus Peloquin | March 25, 2010 01:55 PM
Problem:
Consistent hashing for a distributed key-value store has limited scalability under the assumption that every node is "aware" of all the other nodes in the system. Chord is an attempt to achieve scalability, with both the communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
Chord is a P2P distributed system where nodes collectively maintain a key-value store. It uses a variant of consistent hashing and hence can adapt easily to a continuously changing environment where nodes may leave and join the system. This models mobile clients, and Chord can hence be used to maintain key values as mobile clients come and go. Chord suits loosely organized peer-to-peer applications well. Chord does not restrict the application's choices for authentication, caching, replication, or user-friendly naming of data.
It uses consistent hashing to distribute the load among nodes, and it can retrieve the value for a given key by looking up a consistent hash ring. The variant is that it includes additional information at each node called a finger table, which contains routing information about O(log N) nodes. The paper proves that the number of messages for maintaining the finger tables in a continuously changing environment is O(log^2 N).
Contributions
1. Figuring out the additional information needed at each node to scale the consistent hashing approach.
2. Finger tables: maintaining sufficient routing information about predecessors and successors in a continuously changing network configuration.
3. Finding the cost, in terms of messages, of maintaining routing information in a continuously changing environment.
Applicability
The paper mentions a few interesting applications where the Chord service can be employed. Distributed indexes and cooperative mirroring are common applications. Time-shared storage, to keep data available during the downtime of some other node, is interesting. Chord can also be used to identify a machine responsible for a part of the work in a large combinatorial search.
Posted by: ramya@cs.wisc.edu | March 25, 2010 01:55 PM
The paper presents Chord, a distributed system for access to a partitioned key-value store. Like other methods based on consistent hashing, Chord handles a dynamically changing set of nodes participating in the distributed storage system.
The problem addressed by the paper is storing a distributed hash table in a scalable and consistent way. Consistent hashing minimizes lookup time by requiring global state to be known either by the client or by each of the servers, which increases the number of messages and reduces scalability. Chord instead selects an approach where each request is guaranteed to find the data within O(log n) hops over intermediate servers, but requires that each server know of only log n other servers.
One attribute of Chord is that when a server becomes 'hot', manual intervention is probably needed to add more servers to share the load. But for an ad-hoc P2P system with very high churn, it is far more practical to use a system that requires a minimal amount of information about the global set of servers; log n is an acceptable number.
While the approach has been proven in many popular P2P systems, it is difficult to guess the effects of heterogeneous bandwidth resources at each node, since 'hot' can mean different things to different nodes.
Posted by: Sanjay Bharadwaj | March 25, 2010 01:54 PM
Summary:
Chord is a distributed hash table algorithm that uses consistent hashing. Its biggest distinguishing feature is that it maintains routing information for only a small number of other nodes, making lookups and scaling under node additions/losses very efficient.
Problem:
Chord looks to solve the problem of easily scaling a distributed hash table. Earlier distributed consistent hashing algorithms required that each node maintain a list of all other active nodes on the identifier circle in order to route requests. While maintaining a list of all the nodes makes lookup very quick (O(1)), updating the list across all nodes every time a single node joins or leaves the system quickly becomes quite costly as the number of nodes increases.
Contributions:
Chord has all of the benefits of the consistent hashing algorithms it is built upon, including availability and the flexibility of nodes entering and leaving the system arbitrarily. Additionally, Chord attempts to solve the scalability issue by having nodes store only a small amount of routing information. At minimum, a node needs to store information about its successor on the identifier circle. While this allows the system to function correctly, lookups would be quite slow. As an optimization, each node maintains a finger table. The finger table is set up so that the ith entry corresponds to the succeeding node at a distance of 2^(i-1) on the identifier circle. Given this table, looking up a node can be accomplished with O(log N) lookups. The finger tables must be updated every time a node is added or removed from the system. The reassigning of keys after such an event is left up to the application. Additional considerations must be made for concurrent joins and for node failures.
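A loose sketch of that key reassignment at the identifier level (illustrative only: the node ids and keys are made up, and the actual data transfer is left to the application):

def in_half_open(x, a, b):                     # x in the circular interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def join(nodes, store, new_node):
    """nodes: sorted node ids; store: dict mapping node id -> set of key ids it holds."""
    nodes = sorted(nodes + [new_node])
    idx = nodes.index(new_node)
    succ = nodes[(idx + 1) % len(nodes)]
    pred = nodes[idx - 1]                      # Python's -1 index handles the wrap-around
    moved = {k for k in store[succ] if in_half_open(k, pred, new_node)}
    store[succ] -= moved                       # keys in (pred, new_node] change hands
    store[new_node] = moved
    return nodes

nodes = [1, 14, 32, 48]
store = {1: {60, 63}, 14: {3, 9}, 32: {20, 30}, 48: {40, 45}}
nodes = join(nodes, store, 25)                 # node 25 joins: key 20 moves from node 32 to node 25
print(store[25], store[32])                    # {20} {30}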
Response:
The paper mentions that load balancing is one of the problems solved by Chord and that each node maintains roughly K/N keys (where K is the number of keys and N is the number of nodes). However, it does not take into account the fact that the values associated with particular keys may vary widely in size and popularity. The number of keys is balanced across the system, but the amount of traffic and storage is not necessarily balanced.
Also, it seems like a centralized list of nodes stored client-side leads to a much simpler algorithm (with O(1) lookups to boot). While a centralized list of nodes means there is a central point of failure, maintaining a replica is not difficult. Updates to the list would be much simpler given node failures and joins, only requiring a few messages instead of the O(log^2 N) messages required by Chord. It seems like the four possible uses of Chord mentioned in the paper could just as easily use a centralized node list.
Posted by: Ryan Johnson | March 25, 2010 01:51 PM
Chord provides a distributed lookup service for peer-to-peer systems. The key problem in peer-to-peer systems is mapping a key to a particular node. Scalability is an issue when the lookup cost is proportional to the number of nodes in the system. The system also needs to work well with nodes dynamically joining and leaving the network. Besides this, the lookup also needs to be fast.
Chord is a distributed lookup service that satisfies all of the above requirements. It does not put any restrictions on the key space. It uses a consistent hashing mechanism to map keys to nodes. This is inherently load balanced, as the keys are distributed to each node in roughly equal proportions. It also ensures reduced key movement when nodes come up and go down in the system. The key contribution of Chord is the scalability offered by the system. This is done by reducing the amount of state stored in each node. Naive methods require each node to be aware of the entire system, or require a large number of hops/messages before the destination node is identified. Chord requires a node to be aware of only a small subset of nodes and guarantees node lookup in O(log N) hops. It achieves this by maintaining small "finger tables". To keep in sync with nodes entering and leaving the system, each node periodically runs a stabilization module which keeps the finger tables in a consistent state.
Key-value stores can be built on top of the Chord lookup system. The authors have also built CFS (the Cooperative File System), a peer-to-peer read-only storage system that uses Chord as the underlying block location protocol. Naming methods in networking such as DOA and i3, which use a flat namespace, can benefit from the Chord lookup protocol.
Posted by: Deepak Ramamurthi | March 25, 2010 01:42 PM
Summary:
Chord is a distributed hash table, which uses the concepts of binary search to reduce the state stored at each node, enabling us to find the node for a given key is O(logN) hops instead of the usual O(N) hops. It is scalable to a large number of nodes, and works in the face of nodes going down and coming back up in the system. The correctness and hop guarantees of the system are theoretically proved and experimentally verified.
Problem description:
Is it possible to have a distributed hash table that works even as nodes come and go in the system? Will such a hash table scale? What is the minimum number of network hops required to find a key in such a system?
Contributions:
Chord's major contribution is reducing the amount of state required at each node while also finding the key in O(log N) hops. It does both these things with a neat trick: it divides the system into subsystems of increasing powers of 2, and one node in each subsystem is all that is required for finding keys. Of course, lists of other possible nodes are kept so that in the event of a failure, each subsystem still has a node.
I am not sure about this, but if they were the first guys to view the distributed hash table as a possible peer to peer system, then that is an interesting innovation in its own right.
Applicability to real systems:
I think Chord would be very useful in any system where distributed lookup is required, which I would guess is every distributed system. Given that Chord reduces both state and number of hops, why was it not as successful as the authors intended? The usual inertia (no adoption of anything new)?
Posted by: Vijay Chidambaram Velayudhan Pillai | March 25, 2010 01:37 PM
This paper presents Chord, a distributed lookup protocol that addresses the problem of finding where a particular piece of data lives in a system. Chord is scalable in a dynamic system and uses a modified version of consistent hashing, but it does not require that all nodes be aware of all other nodes in the system; it can thus cut down on the data a node is required to store and on how many messages must be sent to locate data. Chord is also decentralized, and it is provably correct and provably efficient. Chord differs from many other key-based systems because a key is usually used to directly access a value, but in Chord the key tells you which node a value is stored on.
Chord tries to provide load balancing by using a distributed hash function and spreading the keys over many nodes. Chord is decentralized because there is no "leader" node, which makes it desirable for loosely organized applications. Chord is scalable because the cost of a lookup grows as the log of the number of nodes. Chord provides availability by adjusting its internal state to compensate for nodes that have failed and to add new nodes to the network. Chord also provides flexible naming because there are no constraints on the values of the keys.
I really like the notion of a completely decentralized distributed system. I think it comes closer to achieving what a distributed system should actually be. If there is a "leader" node, it limits the size of the system because network traffic can become too much. It also introduces a single point of failure. And of course the leader can be replicated but having a truly decentralized system is the way to go. One problem (as the authors state) with such a system is that it is more difficult to locate where a particular piece of information is stored because no one knows where everything is. Chord's implementation appears to provide an efficient, not overly burdening method of finding where data is in the system. I think this is obviously useful because without the ability to locate data efficiently in the system, the system is somewhat useless.
Posted by: Elizabeth Soechting | March 25, 2010 01:34 PM
Summary:
This paper describes Chord, a distributed protocol that efficiently locates the node that stores the data item for a particular key in peer-to-peer applications. Analysis and experimental results show that the scalability of Chord is good, with an O(log N) scaling complexity in the number of nodes N.
Problem Description:
System model: a peer-to-peer system with frequent node arrivals and departures. Chord supports only one operation: given a key, it maps the key onto a node.
Chord is designed to address the following problems:
- Load balance: distribute keys evenly among the nodes.
- Decentralization: a fully distributed protocol.
- Scalability
- Availability
- Flexible naming: no constraints on key structures.
Goal: simplicity, correctness, and performance.
Contributions:
Keys are assigned to nodes with consistent hashing. Consistent hashing spreads keys among nodes and achieves load balance. It is also very well suited to a system with frequent node arrivals and departures.
The lookup algorithm looks for data at the successor node and uses finger tables. On node joins and departures/failures, Chord runs a stabilization protocol, updating the successor pointers and finger tables; the stabilization protocol is also run periodically. Correctness of the successor pointers is essential for lookups.
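A loose Python sketch of the stabilize/notify idea (the structure and names are illustrative, not the paper's implementation; failure handling and finger repair are omitted):

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self                  # a fresh node starts as its own successor
        self.predecessor = None

def between(x, left, right):                   # x in the open circular interval (left, right)
    return (left < x < right) if left < right else (x > left or x < right)

def stabilize(n):
    """Ask the successor for its predecessor, adopt it if it sits in between, then notify."""
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)

def notify(n, candidate):
    """candidate believes it might be n's predecessor."""
    if n.predecessor is None or between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate

a, b = Node(8), Node(42)
b.successor = a                                # node 42 joins via node 8
stabilize(b); stabilize(a)                     # one periodic round closes this tiny ring
print(a.successor.id, b.successor.id)          # 42 8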
The paper proves that the algorithms are efficient and scalable: O(log N) messages per lookup, O(log N) time per lookup, and O(log N) state per node. Updating routing information requires O(log^2 N) messages when nodes join or leave the system.
Applications:
Chord provides a useful and efficient primitive that can determine the node responsible for storing a given key. It is easy to build large-scale peer-to-peer systems on top of it.
Posted by: chong | March 25, 2010 12:56 PM
Description:
This paper describes Chord, a scalable distributed lookup protocol that is capable of efficiently locating the node which stores a particular piece of data. Chord builds a distributed hash table based on consistent hashing and stores only O(log n) routing data on each node. The way Chord works is similar to a linked list with multiple layers of forward-jumping entries (a skip list), and it is guaranteed with high probability to finish the routing within O(log n) hops.
Problem Description:
The problem this paper tries to solve is building a scalable distributed hash table that can locate a key rapidly. Although consistent hashing had been proposed before this paper, and is capable of load balancing in the face of nodes joining or leaving (failures), every node needs to know the virtual position of all other nodes; therefore, its scalability is limited. Chord only needs to store O(log n) data on each node while still guaranteeing, with high probability, to locate a particular node in O(log n) time. Therefore, this work is unique and important.
Contributions:
First, Chord found a good balance between the amount of data stored and the time needed to locate data. Ideally speaking, these two factors form a tradeoff, and the linked list and full-knowledge consistent hashing are the two extremes. On one hand, we can locate the data in O(1) time while storing all the information on each node (storing O(n) info); this is consistent hashing with global knowledge. On the other hand, we can store only the successor node (storing O(1) info) and locate the desired node by traversing the list (O(n) in the worst case); this is the linked list. Chord makes each of the two factors O(log n), hence acceptable in both scalability and lookup speed.
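A back-of-the-envelope sketch of that tradeoff (the node counts are arbitrary, and the figures are the asymptotic estimates, not measurements):

import math

for n in (1_000, 100_000, 10_000_000):
    entries = math.ceil(math.log2(n))
    print(f"N = {n:>10,}: full view = {n:,} entries per node, 1 hop; "
          f"successor only = 1 entry, up to {n:,} hops; "
          f"Chord = about {entries} entries and about {entries} hops")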
Second, Chord discovered a practical and easy way to achieve both correctness and performance. For correctness, a node only needs to be guaranteed to link correctly with its predecessor and successor (exactly as in a linked list), and this is easy to implement. For performance, the finger tables of the node itself and of other affected nodes should be updated, and this is somewhat difficult to complete as an atomic operation. Chord enforces correctness whenever a node joins or leaves, and runs the "stabilization" process periodically. In this way both correctness and performance can be achieved.
Real Applications/Problems:
Since the purpose of Chord is scalability, it has actually been applied in several large distributed systems. If I remember correctly, many P2P programs implement something like Chord at least in part, like the Kad service in eMule. The successful application in these P2P programs reflects the success of the system. In addition, Chord could also be used to build a distributed DNS system, since DNS is also a key lookup from domain name to IP address.
Nevertheless, I still have one question about Chord. Chord considers time and storage, but does not consider bandwidth (or latency); that is, it assumes a homogeneous network. Is it possible to optimize the lookup queries for a heterogeneous network?
Posted by: Shengqi Zhu | March 25, 2010 12:19 PM
Summary:
Chord presents a distributed store for keys usable by a P2P network. The keys can be named in any way, as it is a flat key space. The values can similarly represent any necessary piece of data for the application. A special consistent hash is used to provide scalability and efficiency.
Contributions:
Mainly this paper takes ideas from popular distributed systems and applies them to P2P networks. Consistent hashing is a tricky thing, so replication and adding and removing nodes to/from the ring are important. Another point is that scalability and decentralization are crucial for P2P applications. After identifying the main concerns, the paper does a good job of measuring the performance and the load required on each machine. As a result, further work can build on these ideas.
Comment:
I am personally a fan of P2P applications; the idea is just appealing to me. This method allows P2P systems to use the distributed keys provided by Chord as a useful part of the system as a whole. With BitTorrent on the rise, and DHTs being used increasingly, the contributions of Chord are really starting to take hold on the Internet.
Posted by: Jordan Walker | March 25, 2010 11:22 AM
Summary:
This paper presents a system for maintaining key-value pairs in a distributed environment. This system has numerous applications and can act as a look-up service for a wide variety of services.
Chord is a distributed lookup service guaranteed to be scalable, efficient, and fault tolerant. All the entities in the system, including the nodes and the keys they store, are identified by a hash. Each node stores the mappings for a certain contiguous range of these hashes. Scalability is achieved by each node maintaining a limited amount of information and using an efficient routing mechanism to route a request to its owner. This architecture achieves load balancing while also minimizing the fluctuations caused by nodes joining and leaving the system, since only a part of the hashes is redistributed.
Contributions:
1. The routing state maintained at each node. It achieves a balance between following the ring and maintaining information about all nodes.
2. Scalability, with each node maintaining only a limited amount of lookup and routing information.
3. A flat namespace giving a great amount of flexibility.
Applications:
Chord has been used in the design of networking systems that decouple a resource from its host, such as i3 and DOA, because of its efficient routing algorithms.
Posted by: Neel Kamal M | March 25, 2010 10:42 AM
Goal:
Chord provides a distributed hash table service that maps keys to addresses. All nodes are equally responsible for maintaining the data. The system is based on consistent hashing but also uses finger tables to limit the number of nodes that each node has to keep track of.
Problem:
Locating objects in a P2P network requires some kind of distributed directory service. However, P2P networks tend to be very dynamic, so it is hard to maintain this directory efficiently.
Contribution:
By using consistent hashing as the basis of the protocol, Chord is able to handle the load balancing problem and the effects of nodes joining/leaving the system. The main contribution is the finger table, which serves as a shortcut for locating nodes on other parts of the hash circle. This also allows Chord to limit the amount of information that each node needs to maintain.
Other P2P applications achieve scalability by using hierarchical P2P networks and designating certain nodes as "super nodes". This is done to limit the number of nodes that must be contacted during a query. Chord, however, adopts a different mechanism that achieves a similar goal without requiring some nodes to have more responsibility than others.
Applications:
Chord provides name resolution, which is a core piece of functionality for any P2P application. Programs such as instant messengers can use this protocol to maintain user online status instead of using a centralized server.
Comments:
As the paper points out, there is a list of features that are crucial for building P2P applications. One of the major concerns is that the Chord protocol might not be extensible to support other features such as selection of nearby peers or authentication. This means that application designers have to rely on a variety of protocols to get the full set of functionality, which may increase application overhead.
Posted by: Thawan Kooburat | March 25, 2010 09:58 AM
Problem
The paper presents a key-value store that has a flat namespace and can be distributed over a large number of nodes. The paper also deals with participant nodes dynamically joining and leaving.
Summary
The paper presents Chord, a key-value store that is based on consistent hashing. In an N-node network, a lookup for a key requires O(log N) messages. A node joining or leaving only affects a small set of hosts, requiring only O(log^2 N) messages. Each node stores only a contiguous fraction of the identifier space. When some nodes get 'hot', meaning many lookups target a specific region of identifiers, administrators can put another node near the 'hot' region so that the region is split between the existing node and the new one. One caveat is that a client cannot directly find the node that holds a specific key; it takes a few routing hops to locate the desired node.
Contribution
One of the benefits is that it provides a flat namespace while providing scalability. Thus, one does not need to care about any hierarchical organization of data. In a hierarchical organization, scalability is determined by the way the hierarchy is organized, and skew is hard to avoid, both in how data is organized and in access patterns. In Chord, skew is easy to address: just putting another node into the 'skewed' region suffices.
Applicability
Since it does not require a hierarchy of data, Chord is referenced by many Internet naming papers such as DOA (Delegation Oriented Architecture) and i3 (Internet Indirection Infrastructure). All of these are based on flat namespaces, so without Chord they would not exist. Amazon's Dynamo extended this idea to fit large Internet services.
Posted by: MinJae Hwang | March 25, 2010 09:21 AM
Problem :-
To design a peer-to-peer distributed key-value lookup service that can be used as a building block for large-scale distributed services.
Summary :-
The paper presents Chord, a large-scale decentralised key-value lookup service that builds upon the consistent hashing proposal to create a system that is more scalable by storing a limited amount of state at each node. Its simplicity of design and high performance make it an ideal candidate for building a system for the specific task of global key-value lookup. It provides load balancing by distributing keys amongst different nodes, and tolerates node joins/departures in a decentralised manner.
Contribution :-
Chord is a highly scalable system that provides an efficient key-value resolution service based on a flat namespace. This is very important, as it frees the key from any naming semantics while still providing an efficient lookup service. The finger table allows each node to maintain a small amount of routing state even if the system grows to a very large number of nodes. It provides algorithms for efficient routing of key requests to nodes and for node joins/departures, and maintains a set of invariants to stabilise the system against the churn created by node joins and departures.
Applicability to real systems :-
Chord can be used to design systems meant for providing global lookup services. In networking, there have been proposals such as LNA (Layered Naming Architecture), DOA (Delegation Oriented Architecture), and i3 (Internet Indirection Infrastructure) that use DHTs such as Chord to design key-value lookup mechanisms for name resolution and packet forwarding. Chord also provides tolerance to failures and transparent remapping of keys to nodes, which is important for such services.
Posted by: Ashish Patro | March 25, 2010 08:37 AM
Summary and description:
The paper describes a distributed lookup mechanism that can map keys to a dynamically changing set of nodes. The method is designed for large-scale peer-to-peer systems, and has a high degree of scalability, efficiency, and fault tolerance. Effectively, in an internet-scale system in which data items are distributed among the nodes, the method provides a solution for locating a data item.
Contributions:
The paper modifies consistent hashing so that it can be efficiently used when the set of nodes is both large and dynamically changing. Although this is an important contribution, I believe that the main contribution of the paper is rather proving that, with a very high degree of probability, the method incurs little overhead for lookups and churn. Specifically, the idea that routing can take place in O(log N) time by maintaining O(log N) routing entries at each node is an interesting contribution of the paper.
Comments:
Chord has been a very influential paper, and its real-world applicability is doubtless (not many research ideas have Wikipedia pages). What I liked about the paper: the authors have done a lot of simulation to find the effectiveness of the system under various parameters. They have also proved theoretically that the system incurs very little overhead with very high probability.
Posted by: Thanumalayan Sankaranarayana Pillai | March 25, 2010 08:25 AM
Summary:
The authors introduce the design principles and implementation of Chord, a distributed peer-to-peer protocol which can look up the node mapped to a specific key and shows great scalability.
Problem:
Chord tries to address the problem of how to efficiently find the node for a specific key, considering that the environment changes dynamically and nodes continuously join and leave the system.
Contributions:
Chord uses consistent hashing to allocate keys to nodes. The hash function is used to achieve load balance among all the nodes. The system also redistributes keys to reflect nodes leaving and joining.
Each node in Chord maintains information about only a small number of nodes (its successor and predecessor). This greatly reduces the work done during a lookup. A finger table is also used to improve search performance.
Questions/Comments:
1. The load-balance method in Chord assumes all the nodes are homogeneous. Therefore, it may not work well in all P2P environments.
2. I’m not very sure whether it’s a good idea to reduce the routing information stored at each node while increasing the number of communication messages. I feel network resources should be more precious than storage in P2P environments.
Posted by: Ao Ma | March 25, 2010 08:00 AM
Summary:
Consistent hashing is a good way of mapping keys to nodes effectively. It is easy to identify ownership and load balance with this technique. But as the number of nodes in the system increases, the amount of information stored at each node becomes hard to manage, especially with dynamic membership. Given that servers only need to know successor information to be correct, the paper suggests that the amount of information stored on each node can be reduced, and that with a little communication the right owner of a key can still be identified. Each server stores a finger table and a successor list of r servers in case the immediate successor fails. Even in the case of multiple failures, reachability of the working nodes is guaranteed using the stabilization algorithm. It is also proved that any change to the membership results in no more than O(log^2 N) messages.
Contributions:
Providing a scalable decentralized framework for identifying the server responsible for a given key.
Identifying an upper bound on the number of messages required for identifying the owner of a key and for handling membership changes.
Applicability:
Lots of things, like replicating data and authentication, have been left to the user of the system, but it provides a very good basic framework for large-scale systems to rely on. I have a question though: with virtual nodes present, is it possible that the messages used to identify the right server cross the same physical node twice? For example, if nodes are in the order 1a, 2a, 3a, 1b, 4a, then to reach 4a we cross physical node 1 twice. In this case it is probably more efficient to look up all virtual nodes of a physical node and pick the successor that is closest to the key's hash value. Also, the system does not have a good solution for tolerating partitions.
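A rough sketch of the virtual-node mapping under discussion (hypothetical addresses and parameters; it only shows how a key resolves to a physical node, not how lookup messages are routed):

import hashlib

M, VNODES = 16, 4
RING = 2 ** M

def h(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % RING

physical = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# Each physical node is hashed onto the circle VNODES times.
ring = sorted((h(f"{ip}#{v}"), ip) for ip in physical for v in range(VNODES))

def owner(key):
    kid = h(key)
    for vid, ip in ring:                       # first virtual node at or after the key's id
        if vid >= kid:
            return ip
    return ring[0][1]                          # wrap around the circle

print(owner("some-key"))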
Posted by: Satish Kotha | March 25, 2010 07:53 AM
Summary
The authors develop a specialized peer-to-peer service to look up the node mapped to a key in a scalable manner. The service maintains the necessary lookup and routing information and, during joins or failures, fixes up the routing information.
Problem
Peer-to-peer services provide a variety of services, but all ultimately rely on mapping a piece of data to a node. The Chord protocol aims to provide this service in a scalable and efficient manner (in terms of routing storage and lookup time) for a higher level peer-to-peer program.
Contributions
I’m not sure if the authors came up with the idea, but a key design point of Chord is that a node stores partial information about the entire set of nodes (in this case, ~log(N) entries) to allow it to efficiently find any given key (in ~log(N) hops). A practical contribution is the Chord prototype, which allows a peer-to-peer service to be built on top of the lookup service.
Applicability
I think the idea of not requiring a node to know about all others in a peer-to-peer system is somewhat useful. Logarithmic scaling is a good property (in general) for a distributed system to have. However, a lot of focus appears to be on storing a small amount of routing information per node. Is this necessary? With storage being effectively free while messages are costly, wouldn’t it be a better idea to store more routing information and send fewer network messages? The tables could be updated more lazily, since there is less of a chance for all the information to be invalid.
Posted by: Jesse Benson | March 25, 2010 06:16 AM
Chord is a distributed lookup protocol for assigning keys to nodes in a fault-tolerant, distributed, and balanced manner. The paper limits itself to discussing the method of assigning keys to nodes and retrieving them.
The problem that the paper discusses is that if you want to have a keystore on a group of nodes, where nodes can join and leave the network at any time, and without any central authority, it is necessary for every node to be able to somehow find the node storing a particular key. When a node joins the system, it is given the address of at least one other node, but after that must determine what information to store and make itself known to the other nodes. Another aspect of the problem is that for a large system, maintaining information about all of the nodes is impractical, especially when they are constantly changing. It is also desirable for the number of hops to locate the storing node to not grow too quickly.
The paper contributes an efficient way to store information that allows the nodes in a distributed system to find the node storing a specified key in a limited number of hops. Both the storage and the number of hops are O(log n). The paper accomplishes this using consistent hashing to assign keys to nodes. To locate the target node, each node stores a finger table of nodes at exponentially increasing clockwise distances from itself. To locate a node, it talks to the closest preceding node it knows about, which in turn asks the closest it knows about. In cases where the finger table is reasonably accurate, this leads to a complexity of O(log n).
The techniques in the paper could be useful for a number of things, although the paper does not discuss them since it focused only on distributing and retrieving the keys. Values could be stored with the keys, allowing it to be used for distributed file storage, for example. Replication could be added (with writing to multiple nodes, for example) to attain a degree of fault-tolerance and prevent data from being lost. There would probably be some security implications to some of the ideas; for example, a malicious entity could try to gather large quantities of the data by creating many nodes, or if the system was used as a data store, they could try to read the data being stored on their machine (or destroy it, or not update their successors or finger table correctly). For some applications this would not matter, but for others it would be important.
Posted by: Lena Olson | March 25, 2010 05:46 AM
Summary:
This paper describes a distributed lookup protocol that can efficiently locate the node that stores a particular data item in peer-to-peer applications. Given that nodes can dynamically join and leave, Chord develops a variant of consistent hashing which ensures the simplicity, provable correctness, and provable performance of the P2P lookup protocol.
Problem:
Chord aims to address only one problem: given a key, it maps the key onto a node, even as nodes constantly join or leave the system.
Contribution:
- Load Balance: To some extent, Chord plays the role of dynamic consistent hashing, spreading keys evenly over the nodes. By distributing the routing table among machines, a Chord node needs "routing" information about only a few other nodes. This greatly improves the scalability of the storage system, because lookup cost grows only logarithmically as nodes join. More specifically, the joining or removal of the Nth node causes only 1/N of the keys to be redistributed.
- Decentralization: No node is more important than any other.
- Availability: Chord automatically adjusts its internal tables to reflect newly joined nodes as well as node failures, ensuring that, the node responsible for a key can always be found.
- Flexible Naming: Chord places no constraints on the structure of the keys it looks up
Applicability:
Behind the design of Chord is an assumption that the nodes are homogeneous, such that distributing keys evenly among them achieves good load balance. Since heterogeneity seems to be the trend in modern distributed systems, I don’t think the idea in Chord will be applied widely.
Questions:
What are the costs or flaws of decentralization? Why do so many production systems use a centralized architecture?
Posted by: Deng Liu | March 25, 2010 05:16 AM
Concisely speaking, Chord is a peer-to-peer lookup system. It delivers on the promises of the P2P philosophy, such as reliability, high capacity through parallelism, and automatic configuration.
More precisely, Chord addresses the issue of efficiently locating the node that stores a particular data item by providing a distributed lookup protocol. The design team believed that if the issue of distributed lookup is resolved, then the major goals of the distributed paradigm, such as scalability and reliability, can be addressed.
The Chord protocol works on the assumption that the underlying network is both symmetric and transitive. Consistent hashing is used to assign keys to nodes, and the hash function acts as a load balancer in the system. When the Nth node joins or leaves, only an O(1/N) fraction of the keys is moved. The consistent hashing function assigns each node and key an m-bit identifier using an SHA-1-based hash function; the IP address of a node is hashed. Identifiers are ordered on an identifier circle, better known as the Chord ring, from which successors are obtained. The system addresses load balancing via the distributed hash function, by spreading keys evenly over the nodes. The system is fully distributed, and scalability comes from the lookup cost growing as a logarithmic function of the number of nodes. Internal tables are automatically adjusted to reflect changes, which addresses availability. Naming is flexible, as key structures have no limiting constraints.
The system successfully provides mechanisms for dynamic operation and fault tolerance, as it deals with node joins and stabilization, replication on failures, and voluntary node departures. The stabilization protocol in Chord guarantees that nodes are added in a fashion that preserves reachability.
The system might have some problems with distributed file sharing, as it may require modifications to add a mapping from keywords to DHT keys, because Chord's lookup is exact-key only. Distributed load balancing may also have its own limitations in Chord, since web cache data is usually non-uniform in size.
Chord successfully solves the problem of locating a data item in the face of frequent node arrivals and departures. The system provides simplicity, provable correctness, and good performance. Chord could be a substitute for DNS's functionality: it maps key-value pairs, is independent of special servers, imposes no naming structure, and can find data objects as well as hostnames and services. Chord's lookup mechanism operates in a predictable time span and is certain to yield success or definitive failure. Though latency grows with the total number of nodes, the system is, importantly, robust when there are multiple node failures.
Posted by: y,kv | March 25, 2010 04:59 AM
Chord is a P2P lookup service for internet applications. The paper does a thorough job of putting the system through its paces with tests, comparisons to existing systems, and explanations of the protocol through pseudo-code.
We all know how hard it is to get that latest movie or video game from existing P2P systems. We start up our favorite client and expect it to run for a couple of days before it compiles and delivers our ill-gotten goods. One factor slowing the process down is the difficulty inherent in efficiently locating nodes in a large, ever-changing system, if the files are found at all.
Chord wants to improve the performance of finding nodes with particular data items by using consistent hashing. As an augmentation, each node is only responsible for routing information about a subset of nodes, so you can no longer contact the node you desire directly; it takes a few hops along the circle to arrive at a node with the desired data. This trades more network traffic for better scalability, which is fine in my mind because the system produces correct responses while in a continuous state of change.
An optimization for p2p systems has value in the real world. It's fun to imagine other ways to utilize p2p systems to accomplish meaningful tasks, but hard to ignore their most pervasive use. So in that sense, bring on an update to utorrent.
Posted by: Jeff Roller | March 25, 2010 04:50 AM
This paper describes Chord, which is a scalable peer-to-peer lookup system. Chord maps each given key to a node and uses consistent hashing to assign keys to nodes. It solves the problem of locating data items in a large collection of nodes in a distributed system. It also maintains routing information as nodes join and leave the system.
In peer-to-peer systems, data is spread out and the efficiency of finding the locations of data items is very important. Chord has to consider load balance, decentralization, scalability, availability, and flexible naming. Every key gets a node that takes responsibility for it. Chord is designed to make the node storing the data items users are searching for highly available and fast to look up.
Chord uses a distributed hash function which spreads keys evenly over the nodes. Lookup costs grow logarithmically with the number of nodes in the network, which is good enough for Chord to scale to large systems. The nodes in Chord are designed to have equal importance and are fully distributed. Chord watches for nodes joining and leaving and adjusts its internal tables to ensure that the node responsible for a key can always be found. In the steady state with N nodes, each node maintains routing information for O(log N) nodes and sends O(log N) messages to other nodes per lookup. Chord sends O(log^2 N) messages for nodes leaving and joining. It scales well and answers lookups efficiently.
The design of Chord focuses on scalability and load balance for public, shared data and relaxes validation of the identity of the uploaders and downloaders of data. Peer-to-peer systems are used for sharing and don't provide many guarantees, so Chord works pretty well for entertainment such as video sharing. Such applications handle large amounts of data but don't require strong protection of the data in transit.
Posted by: Lizhu | March 25, 2010 04:34 AM
Summary:
This paper describes the Chord peer-to-peer distributed lookup service. The system achieves a steady-state performance of O(log N) messages per query, with a space requirement of only O(log N) per node. The paper describes the basic Chord protocol, followed by a modified protocol that considers concurrent operations and failures. The evaluation section examines the scalability, fault tolerance, and performance properties of the system. The mathematical formulations from earlier sections are validated by experiment.
Problem description:
The paper does well at placing the work in context. The problem of implementing a distributed hash table is a simple problem that has obvious applications. A variety of prior approaches have been explored, including DNS, Freenet, PAST, and CAN, of which CAN is the most similar to Chord. However, in contrast to other work, Chord allows the system to scale as the number of nodes changes, using a fully peer-to-peer protocol, and giving provable bounds on correctness and performance.
Contributions:
The protocols described in the paper are simple, provably correct, and have provably good performance bounds (although I admittedly could not understand some of the optimizations that were described). Basically Chord is consistent hashing extended to use a binary search for scalability, and with some supporting fluff. The strength of the paper is in the strong positioning of the work, the mathematical and statistical rigor, and their success in validating their theorems.
Applicability:
Distributed hashing seems like a very relevant and timely problem. Whether a peer-to-peer environment is a realistic implementation scenario is an open question. A hierarchical structure can give the same performance guarantees with a simpler protocol implementation and better consistency guarantees, but it relies on the feasibility of a more static communication topology. In more dynamic cases, a peer-to-peer system provides more flexibility, but how many systems are necessarily peer-to-peer on a large scale? Not many.
Posted by: Marc de Kruijf | March 25, 2010 04:01 AM
This paper discusses the lookup service in peer-to-peer systems as handled by Chord.
The main problem, or feature, of a peer-to-peer system is locating the node hosting the required data. The challenge in providing this service stems from the dynamic nature of the system, wherein nodes can join or leave at any time. Also, it is important to identify the destination node with the minimal number of hops possible. This provides a scalable service, allowing the system to grow to a large number of nodes.
The intuition used to reduce the search space is binary search rather than linear search. From a node's perspective, the entire system of nodes is divided into subsystems whose sizes are increasing powers of 2. Each node maintains info about one node in each of the subsystems. If the hash of the key falls into a subsystem, then the request is forwarded to a node in that particular subsystem (the request is, however, forwarded to the closest predecessor known in the original system). This brings query time to O(log N) rather than the O(N) of linear search.
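A small self-contained simulation sketch of that intuition (illustrative only, not the paper's simulator): it builds a random ring, fills in finger tables, and checks that observed hop counts stay around log2(N).

import math, random

M = 16
RING = 2 ** M
random.seed(0)

def interval(x, a, b, closed):                 # circular interval (a, b] if closed, else (a, b)
    hi = x <= b if closed else x < b
    return (a < x and hi) if a < b else (x > a or hi)

def run(n_nodes, n_lookups=200):
    nodes = sorted(random.sample(range(RING), n_nodes))
    succ = lambda i: next((x for x in nodes if x >= i), nodes[0])
    nxt = {n: nodes[(k + 1) % n_nodes] for k, n in enumerate(nodes)}
    fingers = {n: [succ((n + 2 ** j) % RING) for j in range(M)] for n in nodes}

    def hops(start, key):
        n, h = start, 0
        while not interval(key, n, nxt[n], True):
            preceding = [f for f in fingers[n] if interval(f, n, key, False)]
            n = preceding[-1] if preceding else nxt[n]
            h += 1
        return h

    counts = [hops(random.choice(nodes), random.randrange(RING)) for _ in range(n_lookups)]
    print(f"N={n_nodes:5d}  log2(N)={math.log2(n_nodes):4.1f}  "
          f"avg hops={sum(counts) / n_lookups:4.1f}  max hops={max(counts)}")

for n in (64, 256, 1024):
    run(n)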
Consistent hashing is employed to distribute the nodes and keys over the identifier space. This method has implicit characteristics like load balancing and a non-skewed distribution of keys, even in the presence of failures or new nodes joining. The system follows a kind of reconciliation protocol (called stabilization) to maintain the consistency of the finger tables. Dynamo's write protocol writes to a majority of the nodes following the actual coordinator of the key; this idea looks similar to the successor list maintained to overcome failures in Chord.
The paper clearly explains the behaviour of the system under many scenarios, with correctness proofs and execution time analysis. Also, the system has been designed as a generic framework, so many different applications can run over it.
Posted by: Sankaralingam Panneerselvam | March 25, 2010 03:57 AM
Summary:
This paper describes the theory and implementation of Chord, a distributed peer-to-peer protocol designed to implement one of the fundamental operations common to most peer-to-peer applications.
Problem Description:
Chord attempts to tackle one common component of peer-to-peer applications. Specifically, given a key, we would like to locate the node responsible for said key efficiently. We would like the lookup process to be fast, but we would also like to keep the amount of state stored at each node to a minimum.
Contributions:
Chord improves on previous work by focusing on a narrow but important component of peer-to-peer networks. By keeping a narrow focus, Chord is able to avoid most higher- and lower-level implementation details and provide formal guarantees about Chord's performance in the presence of failures. The Chord paper also provides empirical results supporting the formal performance guarantees laid out along with the algorithm.
I feel that the most important insight in this paper is that reducing the amount of routing state stored at each node makes maintaining consistency easier. This is a direct result of each piece of state being stored in fewer places. This benefit is in addition to the obvious scalability benefits for networks whose routing information exceeds the resources of a single node.
Questions/Comments:
I am surprised that the author's Chord implementation uses an iterative lookup approach. It seems to me that a recursive approach has the potential to reduce the number and length of round trips in the common case.
Section 6.4 seems to contain an inconsistent statement: "For example, if the Chord network had partitioned in two equal-sized halves, we expect one-half of the requests to fail because the querier and targets would be in different partitions half the time. Our results do not show this" I don't see how a network can be both partitioned and not-partitioned at the same time.
Posted by: Joel Scherpelz | March 25, 2010 03:28 AM
CHORD
-Summary:
This paper describes Chord, an efficient consistent-hashing-based lookup service for a dynamic P2P environment where nodes frequently join and depart.
- Problem:
Chord addresses the problem of locating the node for a given key in a distributed environment. There are a lot of challenges in this problem, such as how to maintain good balancing and how to provide availability and scalability in the presence of failures.
- Contribution:
I think the main contribution of this paper is the Chord protocol, which specifies how to find the node for a particular key given that nodes can join and leave the system dynamically.
Based on the consistent hashing idea, Chord adds some optimizations to make things work better. Each node in the system stores information about only a small number of other nodes. Specifically, each node knows about its successor and maintains a pointer to its predecessor. It also maintains a finger table to speed up the search. Using this table, the lookup does not have to go around the circle linearly; with the help of the finger table, the number of nodes that must be contacted to find a key is bounded by O(log N). To deal with node failures, each Chord node maintains a successor list, so that find_predecessor can still work in the worst case.
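A minimal sketch of that successor-list fallback (the function name and liveness check are invented for illustration):

def first_live_successor(successor_list, is_alive):
    """successor_list holds the r nearest successors, nearest first."""
    for s in successor_list:
        if is_alive(s):
            return s
    return None                                # all r successors failed; the ring is locally broken

# Hypothetical example: nodes 14 and 21 have failed, so node 8 falls back to node 32.
alive = {32, 38, 42}
print(first_live_successor([14, 21, 32, 38], lambda n: n in alive))   # 32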
- Comment
I think this is a nice paper overall. However, it seems like Chord does not deal with network partitions. Also, I am not sure what consistency model is provided by Chord.
Posted by: Thanh Do | March 25, 2010 02:53 AM
Chord:
Summary:
This paper introduces Chord, a scalable peer-to-peer lookup service for internet applications. The basic premise is that there is a set of data items distributed across several peers. The challenge is to map a particular data item to the corresponding peer in the presence of peers leaving and joining the network. Chord uses a variant of consistent hashing to achieve this goal. The first-order contribution of this paper is removing the requirement that every node must know about every other node. This results in a load-balanced, decentralized, scalable, and available system.
Problem:
The problem is that storing and maintaining routes from every peer to every other peer is tedious and costly. At the cost of a few extra hops, the routing information stored and maintained at every peer can be significantly reduced. The goal is to provably guarantee performance and correctness.
Contributions:
The most significant contribution of this paper is a relaxed version of consistent hashing that can be used to provably guarantee performance and correctness. The basic idea is that instead of storing routing information for all other peers, each peer stores a "next" pointer and a subset of routing information called a finger table. It may require a few extra hops to reach the destination, but the number of hops can be kept small. To simplify node joins, each peer also stores a link to the previous node, so basically this becomes a doubly linked list. Node joins affect finger tables and may require a number of messages to stabilize all the tables. However, stabilization can be eventual rather than strict, so that instead of guaranteeing that all data will always be found, there is a guarantee that all data is found eventually. In order to survive failures, each node maintains a successor list of its r nearest successors, so that it knows who comes next in case the current next node fails.
Comments:
Overall I found this to be an interesting application of relaxed consistency and relaxed consistent hashing. I don't think that this idea is particularly clever, but it's attractive in that latency (number of hops) can be traded off for maintaining state flexibly.
Posted by: Polina | March 24, 2010 09:36 PM