
Dynamo: Amazon's Highly Available Key-Value Store

Dynamo: Amazon's Highly Available Key-Value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels 
Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Reviews due Thursday, 3/11

Comments

Dynamo is a highly scalable and highly available key-value store that is constructed from many well-known techniques.

Amazon.com built Dynamo to meet its need for a system that addresses the following problems: partitioning, high availability for writes, temporary failure handling, permanent failure recovery, membership management, and failure detection.

The primary contribution of this work was to show how many of the techniques from the literature could be used to create a highly available system. Dynamo glued together the techniques of consistent hashing, object versioning, sloppy quorum, anti-entropy using Merkle trees, decentralized replica synchronization, gossip based failure detection and membership management, and vector clocks.

Dynamo also frames its Service Level Agreements not in terms of the average response time but in terms of the response time at the 99.9th percentile.
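
As a rough illustration of why the tail matters, here is a small sketch of measuring latency at the 99.9th percentile instead of at the mean (the latency numbers and the helper function are made up for illustration, not taken from the paper):

    def percentile(samples, p):
        # Return the value at (approximately) the p-th percentile of a list of samples.
        ordered = sorted(samples)
        index = min(len(ordered) - 1, round(p / 100.0 * (len(ordered) - 1)))
        return ordered[index]

    latencies_ms = [12, 15, 11, 300, 14, 13, 16, 12, 18, 250]
    print("mean:", sum(latencies_ms) / len(latencies_ms))   # ~66 ms, looks acceptable
    print("p99.9:", percentile(latencies_ms, 99.9))         # 300 ms, the number the SLA is written against

A mean of about 66 ms hides the fact that a few requests take 250-300 ms; the 99.9th-percentile figure is exactly those slow requests, which is why Amazon writes its SLAs against it.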

Dynamo is an "always writable" data store and lets the conflict resolution take place during the reads.

Conflict resolution mechanisms can be selected or developed for each application; the default mechanism is "last write wins".
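
Two hedged sketches of what read-time reconciliation can look like (my own toy illustrations, not Dynamo's code): the default policy picks the version with the newest timestamp, while an application-specific resolver, such as the shopping cart's, merges the divergent versions instead.

    def last_write_wins(versions):
        # versions: list of (timestamp, value) pairs returned by a read
        return max(versions, key=lambda v: v[0])[1]

    def merge_carts(versions):
        merged = set()
        for _, items in versions:
            merged |= set(items)      # a union never loses an "add to cart"
        return merged

    print(last_write_wins([(5, "blue"), (9, "red")]))               # red
    print(merge_carts([(5, {"book"}), (9, {"book", "lamp"})]))      # {'book', 'lamp'}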

It is interesting to compare Dynamo with small distributed systems (e.g., 4 nodes) that have less membership variability and a smaller data size. If the data fits on one machine, partitioning may no longer be needed. Sloppy quorums are no longer needed if there are no partitions. Membership can be handled by a very simple mechanism (a list of IP addresses). Failure detection is also simplified because each node can potentially communicate with all other nodes. It is really the large data size and the incremental scalability requirements of Dynamo that make it complicated.

Dynamo is only fit for certain applications. It only stores simple key-value pairs and it doesn't provide all of the ACID properties. Surely there are applications that wouldn't work on Dynamo because they require the values of multiple keys to be updated simultaneously. Then again, perhaps it would be acceptable because Dynamo is "always writable": all of the keys would most likely get written to storage, as requests are rarely rejected.

Problem
Design and implementation of a tunable distributed key-value store to suit the heterogeneous applications on Amazon’s platform. Design considerations: the focus is on high availability and fault tolerance, and eventual consistency is adopted.

Summary
The system uses consistent hashing over virtual nodes (to support uniform load distribution) to partition the key space. The data is replicated among the N nodes in a preference list. Objects are versioned and conflicts are resolved by the user. Causality between different versions is captured using vector clocks. A sloppy quorum is used to write to or read from the first W or R nodes, where W+R > N. Hinted handoff is used to keep replicas well distributed in the presence of transient failures.
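
A minimal sketch of consistent hashing with virtual nodes and a preference list (class and parameter names are my own, not Dynamo's API; the vnode layout here is purely illustrative):

    import hashlib
    from bisect import bisect_right

    class ConsistentHashRing:
        def __init__(self, nodes, vnodes_per_node=8):
            # Each physical node appears at several positions ("virtual nodes") on the ring.
            self.ring = sorted(
                (int(hashlib.md5(f"{node}#{v}".encode()).hexdigest(), 16), node)
                for node in nodes for v in range(vnodes_per_node)
            )

        def preference_list(self, key, n=3):
            # Walk clockwise from the key's position, collecting N distinct physical nodes.
            pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
            i = bisect_right(self.ring, (pos, ""))
            result = []
            while len(result) < n:
                node = self.ring[i % len(self.ring)][1]
                if node not in result:
                    result.append(node)
                i += 1
            return result

    ring = ConsistentHashRing(["A", "B", "C", "D"])
    print(ring.preference_list("cart:12345"))   # e.g. ['C', 'A', 'D']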

Permanent failures are handled with anti-entropy using Merkle trees. A gossip-based scheme is used to announce the addition or removal of nodes in the system. An in-memory write buffer is used to trade some durability for performance. Three strategies are discussed to ensure uniform load distribution, even in the presence of a few popular keys.
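
A tiny sketch of how Merkle trees cut synchronization traffic (my own illustration, not the paper's implementation): each replica hashes its key range into a tree, and two replicas only descend into subtrees whose hashes disagree, so identical ranges cost a single hash comparison.

    import hashlib

    def _h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_levels(leaves):
        # Build the tree bottom-up; returns all levels, root level last.
        level = [_h(x) for x in leaves]
        levels = [level]
        while len(level) > 1:
            if len(level) % 2:
                level = level + [level[-1]]   # pad odd levels by repeating the last hash
            level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
    replica_b = [b"k1=v1", b"k2=v2", b"k3=STALE", b"k4=v4"]
    same = merkle_levels(replica_a)[-1] == merkle_levels(replica_b)[-1]
    print("roots match:", same)   # False, so the replicas walk down only into the differing subtree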

Contributions
1. Integration of a slew of techniques, such as consistent hashing, replication, Merkle trees, anti-entropy algorithms, sloppy quorum, and object versioning, in a production environment
2. A system with tunable parameters (R, W, N) to adapt to the needs of heterogeneous applications
3. A hands-on account of how to balance conflicting needs: performance vs. durability, background vs. foreground tasks
4. A partition-aware client library to route requests to the coordinator directly
5. A demonstration that these techniques scale to Amazon’s environment

Relevance to distributed systems
Having been deployed and run in an application environment as challenging and varied as Amazon's, it serves as a model for an eventually consistent data store. The ability to tune parameters is incorporated to suit a variety of applications, techniques to tolerate transient and permanent faults are implemented, and scalability challenges are addressed.

Summary:
This paper presents Dynamo, a highly scalable system that provides a key/value storage and retrieval service and serves as the backend for Amazon's online shopping services. Dynamo achieves over 99.9% availability.

Problem Description:
Online shopping is an important part of the IT industry. Achieving a good user experience (low latency) and making sure the service is alive most of the time (high availability) are the keys to success. The question is how to implement a distributed system that achieves both low latency and high availability while maintaining high scalability.

Contribution:
Technically:
(1) Consistent hashing is used to achieve high scalability. In particular, a modified version using virtual nodes is used to achieve better load balancing. Consistent hashing, in my opinion, is the most important part of their service, as it is the foundation of the put/get service.
(2) A modified version of quorum is used, with backup nodes making up for an insufficient number of W or R nodes when necessary. This strategy achieves better availability.
(3) They use timeouts as their failure-detection technique.
(4) To achieve low latency, they use a zero-hop DHT, with each node storing enough routing information locally.
(5) Data versioning and timestamps are used to resolve consistency conflicts.

Philosophically:
(1) Simple ideas work better in the real world. When possible, always choose the simple alternative to achieve a given goal.
(2) 99.9% availability may become the de facto industry standard.
(3) The eventual consistency model is sufficient for an online shopping system, even for a service that supposedly requires transactions.

Applicability:
(1) It is applied to online shopping, which is arguably the biggest part of e-commerce and supposedly a heavy user of DBMSs. The claim that a DBMS won't work in their case is quite interesting, since Dynamo's put/get interface looks a bit like Google's MapReduce. It would be interesting to hear how Prof Dewitt would compare Dynamo and a DBMS. While there are many papers in the database literature on how to join and how to achieve serializability, Dynamo seems to say that for one of the world's largest businesses, a distributed hash table and last-write-wins are enough.

Amazon's Dynamo system is presented in this paper; Dynamo focuses on high availability at the cost of consistency. Designing a massive system is a huge challenge, and it cannot be done successfully without some sacrifices to ensure that the system is reliable and scalable. In addition, Dynamo uses object versioning and application-assisted conflict resolution (on reads). The system was designed for Amazon's internal use and supports many of their services, including Amazon S3. Because it is only used internally, the system can make certain assumptions about the data it is dealing with. I think that assuming the data is not malicious is necessary for maintaining the high availability and quick response time that Amazon requires, but it still leaves the system open to problems from a malicious developer or from bad input that somehow gets through from external data.

Amazon created this technology because they realized that the traditional way of storing data, a relational database, was considerably more complicated than many of their applications required for data storage. They realized to access much of the data they had, all that was required was to have a key that could be used to retrieve a value. I don't think this was mentioned, but databases require consistency and they often sacrifice other things such as availability and response time in order to have that consistency. Since availability was of primary concern for some Amazon services, Dynamo seems like a much better option than a traditional database for those situations. Amazon still utilizes traditional databases for parts of their application that require consistency.

I think that many organizations could benefit from a system like Dynamo. Databases are not always the best way to store information: they limit how quickly it can be retrieved, and they require specialized people who know how to administer them. The system provides high availability without a terrible cost to consistency, even in failure situations. I also think it's very interesting that this system truly is a distributed system. They did their best to decentralize and decouple everything, which I find really impressive. Theoretically, that is how you want a distributed system to be designed, but as we have seen in a number of papers, a system is often forced to rely on some sort of centralized node for some portion of its operation, generally to help with speed and consistency. I am impressed that Amazon was able to create a decentralized system that is still efficient.

Summary and Description:

The paper describes a distributed key-value store which can be configured to provide different degrees of availability, consistency and performance. The system is basically an aggregate of different ideas found in the distributed systems literature, engineered with optimizations to support high configurability in the environment of Amazon's data centers. The system is designed to run as part of Amazon's services infrastructure (with the data store itself being a separate service), where each service corresponds to a set of service level agreements, and the data store needs to be optimized for each set of SLAs.

The main contribution of the paper is an evaluation of how various ideas which have been proposed in the literature can be put together to build a real-world distributed system. Specifically, the ideas that have been used are:
1. A variation of consistent hashing with the concept of virtual nodes
2. Replication with eventual consistency
3. Vector clock for data versioning with application specific conflict resolution
4. "Hinted Handoff" and "Sloppy Quorum"
5. Merkle trees for anti-entropy
Each of these ideas has been optimized and adapted specifically for Amazon's data store. In addition, the paper also uses some innovative ideas, such as choosing the coordinator for a write based on the latencies of a preceding read operation.
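
A minimal vector-clock comparison, written as my own illustration of the data-versioning idea (function names and example clocks are invented): one version supersedes another only if every per-node counter is at least as large; otherwise the versions are concurrent and must be reconciled at read time.

    def descends(a, b):
        # True if clock `a` is causally at or after clock `b`.
        return all(a.get(node, 0) >= count for node, count in b.items())

    def compare(a, b):
        if descends(a, b):
            return "a supersedes b"
        if descends(b, a):
            return "b supersedes a"
        return "concurrent: reconcile on read"

    print(compare({"Sx": 2, "Sy": 1}, {"Sx": 1}))             # a supersedes b
    print(compare({"Sx": 2, "Sz": 1}, {"Sx": 1, "Sy": 1}))    # concurrent: reconcile on read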

Since the system is in use at Amazon, there can be no doubt about its real-world applicability, and the paper provides an excellent discussion of the ideas, which might be useful for designing such systems in the future. However, it seemed to me that the paper could have been better organized so as to provide a clearer list of the ideas being employed and the optimizations used.

Summary:
Amazon’s Dynamo is a distributed key-value store focused on providing high availability, sometimes at the cost of consistency. They use a combination of techniques such as consistent hashing, gossip protocols, Merkle trees, and sloppy quorums to achieve this.

Problem:
Amazon’s business relies solely on its website and backend systems functioning correctly. Even under the failure of entire data centers, they need the user experience to appear seamless. Additionally, instead of just focusing on the average user experience, they need to optimize for the users that require the most computing resources (due to large wishlists, histories, etc.) because those are their most valuable customers. This makes availability and guaranteed execution time their highest priorities.

Contributions:
This paper shows how many of the principles that we have discussed so far in this class (consistent hashing, quorums, gossip protocols, vector clocks, etc.) can be put together into a functioning system to achieve certain goals. For example, Dynamo uses consistent hashing to allow for scalability and repartitioning of the data set when nodes fail. An interesting twist on the idea is that of virtual nodes on the hashing ring, where one physical node may be represented in multiple non-consecutive locations. This allows for load balancing and exploits possible heterogeneity in the system. For high availability, Dynamo uses replication executed by the coordinator node and then uses vector clocks to track the versions and history of each replica as it is updated. Most setups of the system are tuned with writes as a high priority, with reconciliation occurring at the read. With the vector clocks, when a read occurs, the system can track the origin of updates to a particular key and either merge divergent paths or present multiple possible values to the client and leave it up to the client to decide which is most correct. Another aspect of Dynamo that favors availability over consistency is the use of sloppy quorums. If not enough nodes are available for a quorum, the command is still executed on the available nodes, which maintain a list of updates to pass on to the unavailable nodes; this is called hinted handoff. Merkle trees are used for reconciling the complete data sets of two nodes, and by comparing only the hashes of the parent nodes in the tree, the system saves on the amount of data it must transfer between nodes.
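
A small sketch of the hinted-handoff idea under my own simplifying assumptions (the function, node names, and is_up callback are invented for illustration): a write intended for a failed node lands on the next healthy node, tagged with a hint naming the intended owner so the data can be handed back later.

    def write_with_hinted_handoff(key, value, preference_list, fallbacks, is_up):
        placements = []                                   # (node that stored it, hinted owner or None)
        spares = (n for n in fallbacks if is_up(n))
        for owner in preference_list:
            if is_up(owner):
                placements.append((owner, None))
            else:
                placements.append((next(spares), owner))  # stand-in keeps a hint naming the owner
        return placements

    # Node B is down, so its copy goes to D with a hint pointing back at B.
    print(write_with_hinted_handoff("cart:42", {"book"}, ["A", "B", "C"], ["D", "E"],
                                    is_up=lambda n: n != "B"))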

Application:
I read this paper mostly in search of ideas we could use to improve the design of our distributed key-value store class project. Some ideas we already had, such as consistent hashing, though our reason for using it was the easy transfer of a partition of data rather than scalability as in Dynamo. New ideas that I may consider for our design include Merkle trees as part of a reconciliation process and the sloppy quorum method, since it allows our system to remain available even with a small number of nodes. Less helpful for our design is the vector clock versioning that allows multiple values to be returned to the client, since we are hoping to keep the client as design-agnostic as possible. Additionally, the external failure detector is not something usable for our project.

Summary
--------
The paper describes the design and techniques used in the development of an "always-on" key-value system that maintains a weaker, eventual consistency model while tolerating failures and network partitions. The design of Dynamo provides flexibility in balancing performance, availability, consistency and fault tolerance by providing parameters that the application can configure to suit its own use.

Availability and consistency do not go together. It is interesting to see the multitude of cooperating techniques used in the overall system. Dynamo uses a sloppy-quorum technique to provide an always-writable solution. Object versioning is used, and conflict resolution is pushed to the reads. Vector timestamps are used to preserve causality between versions of the same object. They also enable system-assisted reconciliation, though reconciliation is better handled by the application.

Scalability is achieved by using consistent hashing with virtual nodes. Fault tolerance is achieved by replication onto N physical nodes. Temporary failures are handled by the hinted handoff technique, and an anti-entropy mechanism runs in the background to minimize the impact of permanent failures on durability. Gossip-based membership and failure detection is used to update each node's global view.

Contributions:-
----------------
1) The addition of virtual nodes to consistent hashing
2) The use of Merkle trees to reduce anti-entropy overhead
3) The hinted handoff technique
4) An emphasis on the increasing importance of availability and performance over consistency in today's systems
5) An emphasis on the tunability of the system to suit each application

Relevance:-
------------
Being a successfully deployed system, the design and ideas in this paper provide a good reference for anyone considering building a distributed storage system.

The paper presents Dynamo, Amazon's distributed data storage service. In particular, it takes great pains to highlight all the state-of-the-art techniques from academia that were used in its design, making it a useful case study of the real-world effectiveness of these techniques.

The problem addressed by this paper is the practical implementation of academic ideas, and an attempt is made to share the real-world effects of the various parameters used in the design.

In summary, the Dynamo system uses consistent hashing as its primary layer of interaction with clients. The design also provides for dumb clients, who are allowed to ask any node for get()s and put()s of a key (and optional object); the request is then redirected to one of the N possible replicas storing that information. On the other hand, a client is also free to figure out for itself which nodes store the required key and contact one directly. Writes are optimized by not requiring all the replicas to reconcile the update during a put(). An asynchronous anti-entropy process runs in the background to brace for multiple failures and to ensure eventual consistency, which also keeps the vector clocks from growing so long that they require truncation. The membership of the nodes in the service is propagated through a gossip algorithm, and each node stores in persistent storage its latest view of the other nodes and their positions on the ring. Another very interesting innovation is the use of Merkle trees during anti-entropy to pinpoint the replicas that differ from this node.

The entire design seems surprisingly similar to what we had planned for the first project: consistent hashing backed up by anti-entropy with at least limited support for versioning. What was interesting to note is that Amazon has chosen to speed up writes rather than reads, with the parameter W kept smaller than R. Another interesting design point was generalizing the number of virtual nodes for each physical node based on its available resources, reliability history, and so on. The system does a great job of providing all the attributes it set out to achieve, such as fine-grained control of performance vs. durability (by tweaking N) and availability vs. consistency (by relaxing reconciliation requirements on reads). Overall it was an excellent paper, giving proof of the viability of the various techniques we learn about in theory.

Dynamo is a replicated storage system for Amazon that emphasizes availability and performance above everything else. The paper describes the grab bag of various distributed concepts that go into meeting their specifications.

The problem they reference is that any amount of downtime costs the company a huge sum of money. Customers need access to their carts all the time to buy things. It's pretty clear that access and great performance lend themselves to maximizing the potential for sales: keep the system up and you earn more money.

The contribution of this paper comes from each idea being shown to be practical in a real-world system. No theorizing or guessing; these are the distributed ideas put to work and proven effective. The extensive overlap of ideas also shows how one can custom-build a system to meet a unique set of requirements.

Application to real world systems... it is a real world system.

Dynamo is a distributed key-value store that is highly available and scalable. Dynamo is a data store built and used by Amazon for various internal services that it runs. The prime goal is high availability, and hence it sacrifices consistency under certain failure scenarios. The key challenges faced in building such a system are node failures, network outages, network partitions, the need for fast request-response times, and a very high request rate.

The key contribution of Dynamo is combining into one huge production system many common techniques for building distributed systems: replication, data partitioning, consistent hashing, quorum systems, fault tolerance and data consistency. It uses replication for reliability and availability. The services using Dynamo need an always-writable data store, and hence conflict resolution is pushed to the reads, keeping the write logic simple. Some other features inherent to Dynamo are incremental scalability (nodes can be added one by one to the ring), a decentralized architecture (no central coordinator; each node can act as coordinator for the data it handles), and support for heterogeneous components (the virtual node concept allows a single node to act as multiple virtual nodes based on its capacity). It uses a consistent hashing scheme wherein each key is mapped onto a set of nodes decided by the "positions" of the keys and nodes on a ring. It also uses data versioning to maintain consistency in the wake of failures. It implements a read/write quorum system for consistency maintenance, but allows the services to tune the read and write quorums to suit the level of consistency, performance and availability they need. It is also very flexible in terms of version reconciliation, as this logic is pushed to the service end. The paper also describes an interesting way of comparing two replicas, namely Merkle trees, which require only hash values and parts of the tree to be transferred to determine whether there is an inconsistency, thereby reducing network traffic.

While hashing nodes onto the ring, is it necessary to encode the geographical location into the ring? Since the nodes are distributed across data centers, is it possible that a key is replicated on nodes that are "far away" and hence incurs additional network latency?

It is clearly applicable to the real world, as it has been successfully developed and deployed for use by various Amazon services. I wonder how security would be designed for such a system in case the environment it operates in is hostile. It would also be interesting to compare Dynamo with other distributed key-value stores like memcached, which has been widely and successfully deployed by Facebook.

Summary:
This paper discusses the design choices and considerations behind Dynamo, a highly available key-value distributed system successfully deployed at Amazon. In order to provide reliability as well as availability, Dynamo adopts an eventual consistency model and leverages several techniques: consistent hashing for load balancing, preference lists for data replication, vector clocks for determining causal relations, a quorum system for consistency, hinted handoff for availability, and a Merkle tree algorithm for replica synchronization. In addition, the paper includes a detailed analysis of real application performance, covering load balancing and other issues.

Problem Description:
The problem this paper tries to solve is building a production-level, highly available system for storing key-value pairs (exactly as the title suggests). The work is important because of its specific focus: a single-purpose key-value storage system aimed fully at availability. Most existing approaches are either too complex and hence costly (such as database systems) or focus on strict consistency and hence sacrifice availability. Therefore, this paper provides many valuable experiences regarding the design choices involved in building such a system.

Contributions:
Although most of the algorithms or techniques did not originate in this paper, it discusses many design details and considerations for applying them in a real application, and I consider these one of its largest contributions. Such design details include: 1) providing an interface for client applications to reconcile conflicting versions; 2) putting a timestamp on each entry and an upper bound on the length of a vector clock; 3) using a partition-aware client library to route directly to a storage node without re-routing; 4) using externally discovered seed nodes to avoid logical network partitions; 5) letting the application specify the persistent storage engine; etc.

Another contribution is the "sloppy quorum" approach, aimed at maximizing availability. A traditional quorum protocol enforces W+R > N, where N is the number of designated nodes; this approach fails if either the number of read responses or the number of write responses falls below the required count, reducing availability. This paper uses backup nodes (nodes further down the preference list) to meet the W and R criteria when some nodes are down, and asks these backup nodes to transfer the data back to the originally designated nodes, hence increasing availability.
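
A sketch of the write path under my own simplifying assumptions (candidates is the preference list extended clockwise around the ring; send and is_up are stand-ins for the real RPCs): the coordinator contacts the first N reachable nodes and succeeds once W of them acknowledge, rather than failing just because some designated nodes are down.

    def sloppy_quorum_write(key, value, candidates, is_up, send, n=3, w=2):
        acks = 0
        contacted = 0
        for node in candidates:
            if contacted == n:
                break
            if not is_up(node):
                continue                 # skip failed nodes and keep walking the ring
            contacted += 1
            if send(node, key, value):
                acks += 1
            if acks >= w:
                return True              # write acknowledged by W healthy replicas
        return False                     # fewer than W healthy replicas responded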

A third contribution is the detailed discussion of how to ensure load balancing. Even with consistent hashing, achieving balance is not trivial. Adding or removing a node requires transferring a large amount of data without disrupting other services. The paper tries three different strategies and works toward decoupling data partitioning from data placement.

Real Applications:
Dynamo is successfully deployed at Amazon, indicating that it is a very practical system. It is an important example showing that an “eventual consistency” system is still applicable in some supposedly “transaction-oriented” environments. Online shopping websites are traditionally viewed as transactional (for example, adding something to the shopping cart is a single transaction), so it seems that strict consistency is required. However, Amazon’s example shows that with a carefully designed conflict-resolution policy (here, merging the products in the shopping cart), an eventually consistent system can still survive in such an application, and even provide better performance and availability.

Summary:
Dynamo is Amazon's key-value retrieval service. Its main goal is availability, so consistency is relaxed a little bit. The system uses replication as the main mechanism of fault tolerance, and by keeping it decentralized, they are able to avoid complications with single points of failure. The service is used widely among Amazon's products, and there are several optimizations that can be made for different workloads.

Problem:
The traditional way of accessing data in a large service is to use a DBMS, which can maintain consistent data more easily. When the system grows too large and needs to be distributed, this becomes much harder and cannot handle queries quickly. A simple key-value store can provide the answer, but maintaining availability and consistency then becomes the new problem. This paper suggests techniques to achieve 99.9% availability and suggests where consistency is required and where relaxing it can help.

Contribution:
This paper's main contribution is the case study of a real system using the techniques described. Because it has been used for real workloads, the lessons learned from what has worked well and what can be improved are given. Mixing many techniques, such as consistent hashing, gossip, and quorums, yields great benefit in such a system. The object versioning model is also something new that we haven't seen before.

Applicability:
Clearly this paper applies to real systems, since it describes a real system in use today. Because Amazon is such a big player in the web services game, the assertions they make about their workload are legitimate. Disappointing customers can mean losing money, so measuring the availability of their services is a crucial part of running their business. Applying these methods to other systems could result in similar gains.

Summary:
The authors describe key problems and solutions in building 'Dynamo', a highly available, scalable and reliable distributed system. Replication is used to achieve reliability. Sloppy quorum and anti-entropy with Merkle trees are used for handling temporary and permanent failures, respectively. Incremental scalability is achieved using consistent hashing to partition the keys. In case the system has multiple versions, vector clocks help determine the ordering of events, and in case of conflicts, users can choose a conflict-resolution technique such as merging or last-writer-wins. Gossip-based failure detection and recovery is used. A library is provided to the client that keeps the list of active servers up to date, so the client can access the 'preferred' server for better response time. Different partitioning techniques for better load balancing are also discussed and their efficiency evaluated. Using virtual nodes in consistent hashing to account for heterogeneity and improve load balancing is another key idea. A feedback loop is used to check whether background tasks like persisting data or synchronization affect latency, and background tasks can be postponed if the latency goes beyond the desired threshold. Instead of using the median and variance to evaluate the system, SLAs are measured at the 99.9th percentile of the distribution, which guarantees a better experience to nearly all customers.

Contributions:

Acknowledging that perfect consistency and availability cannot both be achieved, and giving users the power to configure R, W and the reconciliation mechanism according to the application's requirements.

Sloppy quorum with hinted handoff – if the owner of a certain set of keys fails, another machine takes responsibility and maintains a list of keys to be transferred back when the owner returns.

The summary of techniques discussed gives a good list of the design problems to consider when designing a distributed system.

Applicability to real systems:
Dynamo's success shows the techniques discussed are very much applicable to real systems. Comparing performance with other real systems would have been interesting. Another minor concern is that users must be knowledgeable enough to understand the various trade-offs, such as performance vs. durability or consistency vs. availability, and configure the parameters correctly to take advantage of the flexibility the system provides. Eventually users will learn, but if initial misconfigurations could lead to bad load balancing or failures, the system should have mechanisms to defend itself. Also, if the environment is not trusted, unlike Dynamo's, the design has to be tweaked to take care of privacy and security issues.

Problem Description:
The main problem solved by the paper is how to design a highly available, scalable and reliable storage system with very stringent latency requirements that guarantee SLAs, so that all the services built on top of it have an "always on" experience. The Dynamo system also focuses on the principle of being "always writable".

Summary:
The paper describes the various techniques used by the Dynamo data store to achieve a highly available, scalable and reliable system. Dynamo provides a simple key-value store model, relaxes full ACID guarantees, and keeps latency compliant with the SLA. Dynamo also focuses on an "always writable" model to avoid losing customer updates. For partitioning, Dynamo uses consistent hashing, as it is highly scalable. To achieve high availability and durability, it uses replication. It uses vector clocks, allows the co-existence of multiple versions of the same object, and performs reconciliation during reads so that writes stay highly available. It uses sloppy quorum for availability during server failures, collecting votes from the first N healthy nodes instead of waiting for all of the designated nodes, and uses hinted handoff to tolerate temporary failures. It uses an anti-entropy algorithm to minimize the amount of data transferred while recovering from permanent failures. It also uses a gossip-style protocol to let each node learn about the arrival and departure of other nodes.

Contribution:
The paper has various techniques which I found pretty interesting like
1) Using a quorum drawn from the first N healthy nodes to ensure high availability
2) Most systems focus on reconciliation during writes and keep reads simple, but Dynamo focuses on reconciliation during reads.
3) Versioning of data using vector clocks, so that multiple versions of data can co-exist.
4) The Merkle-tree-based approach to minimize the amount of data transferred during synchronization

Applicability:
The Dynamo system is highly applicable to the real world, as users of Amazon have a very good experience with all the services provided on top of it. Though it is only a simple key-value store, many of these concepts can be used in building more complex data storage applications too. The various techniques used in the system will be very helpful in building other systems focused on availability, reliability and scalability.

After a sequence of eight assigned readings that merely presented ideas, this is the first to actually implement them. Amazon has some of the largest infrastructure in the world, so the techniques they used and the lessons they learned are significant. Unsurprisingly, most of the Lamport papers we discussed were not used, and they were not necessary. Amazon is still able to serve 99.9995% of requests without timing out, with a worst case of 99.9% of requests within a 300 ms response time, and has incurred no data loss to date (as of the paper). Using heavyweight protocols like Paxos agreement or Byzantine fault detection would have seriously hurt performance.

Some of what they used we have talked about in class. Lamport's least theoretical paper, on time and ordering, was used. Of the remaining papers, consistent hashing and epidemic propagation were used. They also used quorum agreement, which we at least talked about.

Dynamo runs on clusters of machines using consistent hashing, where all nodes communicate directly. Distributed hash tables could have been used, but Amazon is able to simplify the communication and decrease the latency. A DHT is of no benefit when the nodes are all in close proximity to one another; they can talk to the source immediately. The authors say that a DHT better solves the scalability problem up to tens of thousands of nodes, but this is of little benefit when nodes can simply be divided into clusters based on their use.

They do two things with consistent hashing that are particularly interesting (the second is described in the next paragraph). One is that each node is assigned many short ranges on the hash circle, achieving a much more even load distribution. If nodes each pick a single random location on the circle, there can be a high degree of variance between the assigned ranges. This matters when there are few nodes, or when some nodes want more responsibility. If a node wants twice as much responsibility in the single-range approach, it must get lucky or know the layout ahead of time; with the many-range approach, it can just claim twice as many ranges at random and probably get what it wants. It is also beneficial to their replication scheme, where data is replicated at the next k hosts: it is far less likely for a node A to be replicating a big mess of data onto another node B, such that when A goes down, B would have to replicate all of A's data onto C.

They also give three ways of distributing these short ranges about the hash circle. One selects locations at random and is easy to implement. Another is random but with a far more even distribution of responsibility; it is still easy to implement, but is static. The final one is similar to the second approach except that it is dynamic: as nodes join, they steal partitions of the circle from the others. The equal-sized-partition methods allow another small benefit not mentioned: nodes can place their ranges so as not to allocate contiguous regions to the same node.
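
A rough illustration of the equal-sized-partition idea (the constants and the round-robin assignment are invented for the example, and the paper's own placement differs): the partition boundaries are fixed, so membership changes only move whole partitions between nodes rather than re-splitting ranges.

    import hashlib

    Q = 32                                   # number of fixed, equal-sized partitions on the ring

    def partition_of(key: str) -> int:
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % Q                    # equal slices of the hash space

    def assign_partitions(nodes):
        # Deal the Q partitions out round-robin (Dynamo assigns them randomly but evenly).
        return {p: nodes[p % len(nodes)] for p in range(Q)}

    owners = assign_partitions(["A", "B", "C", "D"])
    p = partition_of("cart:42")
    print(p, owners[p])                      # which fixed partition the key falls in, and its owner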

Summary
The authors developed a highly competitive distributed system using a variety of techniques discussed in the literature. The system, Dynamo, is configurable to balance the needs of its users (read/write speeds, consistency guarantees, etc.).

Problem
The authors aimed to develop a key-value service that would be used by a number of Amazon’s core services. The service was to be a distributed system to provide nearly always-on service, performance guarantees, and consistency guarantees despite hardware/software failures.

Contributions
One of the core contributions is making the system configurable. In particular, the parameters R and W specify the minimum number of nodes participating in a successful read or write operation, respectively. Users of Dynamo (e.g., other Amazon services) can choose values for R and W depending on the consistency, latency, and other needs of the particular application instead of being forced into one model. Another major contribution is the discussion of the experience of implementing a large number of distributed techniques in one system. Many of the techniques complement one another, so extra optimizations can often be applied to improve one aspect (e.g., performance) with little penalty to another (e.g., durability).
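
A tiny illustration of the knob being described (the helper function is mine; the (3, 2, 2) configuration is the common one mentioned in the paper): the condition R + W > N is what makes read and write sets overlap, and lowering R or W trades that overlap for latency and availability.

    def quorums_overlap(n, r, w):
        # True when any read set must intersect any write set for the same key.
        return r + w > n

    assert quorums_overlap(n=3, r=2, w=2)        # the common Dynamo configuration
    assert not quorums_overlap(n=3, r=1, w=1)    # favors latency/availability over consistency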

Applicability
I think this is one of the most applicable papers we’ve read. The authors take many of the theoretical methods to provide consistency, availability, performance, etc. and discuss the experiences with building a high end distributed system (a key-value store in this case). Overall, I think this was an enlightening paper to see all the techniques come together in a highly successful distributed system.

Goal:
Dynamo is a storage backend for various Amazon services. In order to achieve high availability and throughput, consistency is relaxed and a simple data-store approach is adopted. Additionally, a wide range of protocols is used in conjunction to solve the various problems of the entire system.

Problem:
High availability and consistency come with a performance penalty. The hard part of using consensus protocols in a production system is selecting protocols that make the right tradeoffs for the application's requirements.

Contribution:
Dynamo is a backend store that provides high availability for writes. This means that a write will never be rejected, at the cost of inconsistent reads. It therefore lets the application resolve conflicts, because the application understands the data's semantics and may be able to merge versions in a sensible way.

Dynamo utilizes various mechanisms to provide fault tolerance in each part of the system. The core mechanism combines consistent hashing with quorum consensus. The global hash space is divided into partitions and assigned to the nodes in the preference list. This provides incremental scalability and less disruption when node membership changes. Additionally, quorum consensus provides flexibility in the degree of consistency and performance, so it can support a wide range of application requirements.

Since the core mechanism does not provide consistency guarantees in the case of failures, Dynamo addresses the problem with two more protocols. Temporary failures are handled by hinted handoff, which asks one of the available nodes to store the missing data on behalf of a failed node. The message exchange needed to repair the system is thereby kept to a minimum, but it is itself prone to failure. As a result, anti-entropy using a hash (Merkle) tree is used to fully compare replica differences and restore consistency after permanent failures.

Membership information is distributed using a gossip-style protocol with no central registry of node membership. Failure detection is done implicitly by observing the responses from other nodes. This is possible because the request rate in the system is fairly constant.
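
A minimal sketch of gossip-style membership propagation (my own toy model, not Amazon's implementation): each round, a node reconciles its membership view with one randomly chosen peer, and views converge without any central registry.

    import random

    class GossipNode:
        def __init__(self, name):
            self.name = name
            self.view = {name: 0}            # node -> version of its membership record

        def gossip_with(self, peer):
            # Exchange views and keep the newer version of every record either side has seen.
            merged = dict(self.view)
            for node, version in peer.view.items():
                merged[node] = max(merged.get(node, -1), version)
            self.view = dict(merged)
            peer.view = dict(merged)

    nodes = [GossipNode(f"n{i}") for i in range(4)]
    for _ in range(10):                      # a handful of random pairwise rounds
        a, b = random.sample(nodes, 2)
        a.gossip_with(b)
    print(all(len(n.view) == len(nodes) for n in nodes))   # views have (very likely) converged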

Dynamo offers a design philosophy in which various protocols are chained together to offer consistency. In normal operation, the system can provide consistency without much overhead, and it only falls back to higher-cost protocols when failures occur. As a result, the overall performance of the system is close to that of quorum consensus, while anti-entropy provides the eventual consistency guarantee.

Application:
Dynamo's design philosophy is on the opposite end of the spectrum from Google's Chubby. Instead of using a highly fault-tolerant protocol with high overhead, it uses a collection of protocols that have low overhead on the normal execution path. This kind of design also offers incremental improvement and less complexity, because each protocol can be replaced or adjusted without affecting the correctness of the entire system. Any distributed system can adopt this design philosophy when tackling fault-tolerance issues.

This paper explains the design of a distributed key-value store, Dynamo. The various trade-offs made in the system to achieve high availability are explained. It also presents the techniques employed to solve various distributed systems problems, and their advantages.

Amazon's services predominantly operate on key-value stores, and this paper builds a scalable and reliable distributed system infrastructure to cater to those web services. These systems operate on weaker consistency models and require high availability, specifically for updates. The hardware infrastructure is assumed to be built on top of commodity systems.

An important contribution of this paper is the partitioning algorithm, a variant of consistent hashing. The system uses virtual nodes to partition load so as to keep the load distribution uniform in case of outages. Replication is coordinated by means of a preference list, which is the list of nodes responsible for a particular key; Dynamo replicates each update among the nodes in this list. The Dynamo design provides eventual consistency, and updates are treated as immutable, versioned objects. The versions are maintained using vector clocks, and the client is presented with all the versions available in the store. Hinted handoff is used to handle failures, where a request to a failed node is redirected to another replica. An anti-entropy mechanism is also employed to provide asynchronous synchronization for nodes that come back after long outages. A Merkle tree data structure is used to compare the key ranges of two replicas for inconsistencies.

The Dynamo system ensures availability at the cost of consistency. It may well be the case that data is under-replicated during node failures. Although this seems to suit the shopping-cart use case explained in the paper, there must still be some level of guarantee about the number of active nodes in the system. The recovery protocol may also become more complex when nodes return after failures into such an under-populated system, and the existing nodes might suffer a sporadic increase in load while servicing the failed replicas. Hence it is not clear why the paper allows the write-availability level (W as low as 1) to be chosen per application; it seems like it should rather be an inherent property of the system.

Given that the system has a preference list of N replicas for a given key, it is not clear why the design does not leverage synchronization from multiple replicas. When a node comes back, it could synchronize by pulling data simultaneously from the N-1 other replicas that belong to the same preference list. This would let the read requests made during replica synchronization be distributed among the nodes in the system, rather than fetched from a single node.

Problem
The problem is to provide a key-value store that is highly available, highly scalable and best-effort consistent. For strong consistency, there were already Paxos implementations and RDBMSs that replicate to multiple nodes. However, the authors observed that Amazon does not need strong consistency, but rather a sweet spot between availability, scalability and consistency. By loosening the degree of consistency, the problem becomes challenging and novel: a single-image RDBMS consisting of thousands of nodes was impossible, or at least impractical. Here, the authors describe a key-value store that scales to thousands of nodes.

Summary
The paper is an ensemble of many ideas. The authors decomposed the problem into partitioning, high availability for writes, handling temporary failures, recovering from permanent failures, and membership and failure detection. Among these, the key idea is the sloppy quorum: data is partitioned across a subset of nodes that replicate the same data, and unlike Paxos, only part of the nodes are required to write a value to storage. This enables tunable consistency defined by R, W and N. Combined with consistent hashing, it scales out to thousands of nodes. The key paradigm in Dynamo is that conflicts are resolved by the client: the store can remain in a conflicted state, and the client has to deal with the conflicts. Their experience shows that divergent versions rarely occur even with concurrent writes, and they were not hard to resolve. Beyond that, the paper explains various optimizations such as read repair.

Contribution
Their experience at Amazon, one of the biggest web services in the world, is unique and rarely shared. Their engineering goal was to develop a toolset for Amazon, but it can be applied to many large-scale web services, because it can be considered a fundamental utility for such services.
They showed that a carefully considered mixture of techniques can be very useful. They mixed almost every algorithm in the field of distributed systems, and it does not look like dirty spaghetti; it looks like a well-laid-out buffet. The mix of sloppy quorum and consistent hashing was particularly insightful.
They also, once again, showed the difficulties of real-world applications, much as Google's Paxos experience did. They had to deal with unusual, unexpected failures, and with boundary cases such as limiting the size of divergent version sets. For theory, these cases might not carry much significance, but for a real-world application all of them must be dealt with.

Applicability
Dynamo is one implementation of a DHT. Any data that does not require a strict schema can use Dynamo. For example, most of the session information maintained in a web server can be stored in Dynamo; the shopping cart at Amazon is one example of such session information.
Much software uses an RDBMS because there was no other choice when some degree of availability and consistency was required, even if the application does not require transactional processing or strong consistency. Dynamo, and column databases such as Bigtable, can be nice alternatives to an RDBMS in these cases.

Summary:
This paper presents Amazon’s experience building a highly available, reliable and scalable distributed system, Dynamo. Though Dynamo provides only a simple user interface with “get” and “put” operations, it enables users to customize the storage system for different performance requirements by tuning the N, R and W parameters.

Problem Description:
While traditional designs emphasize all of the ACID properties, the designers of Dynamo pay more attention to enhancing the system's availability, reliability and scalability, which results in weaker consistency. They believe this principle works well for Amazon’s applications.

Contributions:
Since Amazon’s ultimate goal is to provide its customers a good experience, the principal goal of Dynamo is to improve system performance (here, mainly response time) and maintain high availability. To attain this goal, they adopt several important mechanisms:
1. Simplify the user interface, providing just two operations: get() and put().
2. Adopt a variant of consistent hashing that uses virtual nodes. This separation of responsibility from physical placement makes it much easier to handle nodes that fail and soon come back.
3. Guarantee only the eventual consistency of the whole system, and let the applications handle the remaining inconsistencies. This idea works well for Amazon’s applications.
4. Use hinted handoff to keep the system available.
Apart from this, Dynamo also borrows mechanisms from other distributed systems and database work, like Merkle trees for replica synchronization, its failure detection method, etc.

Applicable to real system
I think it’s a very nice paper that provides valuable experience in the design and implementation of a practical storage system with high performance and availability. Its most impressive point is that it successfully grasps the principal requirements of Amazon’s applications and lowers the priority of other, comparatively unimportant demands, like strict consistency and perfect load balancing. Meanwhile, the designers’ effort to keep the system as simple as possible is also a nice idea. I like this paper very much.

Summary:
This paper describes Amazon’s high-performance key-value storage architecture. The architecture ensures high availability while still providing eventual data consistency.

Problem:
- Difficult to scale: the growth of the business requires transparency while scaling the storage system. When a new server node is added to the system, the data must be re-partitioned across the server nodes, which inevitably involves a large amount of data movement. Such data movement should not impact service to the clients.
- Reliability: a production system should provide highly reliable service. From the clients' perspective, the system should run 24x7x365.
- Single point of failure: a production system should avoid any single point of failure; no individual failure should impact the whole system.

Contribution:
- This paper emphasizes the importance of availability, fault tolerance and high performance in the design of a production storage infrastructure. There is basically a tradeoff among these three: fault tolerance requires extra work (replicas), which lowers performance, and the extra work also adds complexity to the system, which impacts availability. Dynamo uses (N, R, W) to express the relation between these metrics, and through the configuration of (N, R, W) the upper-layer application can easily achieve a good balance.
- Dynamo adapts consistent hashing to distribute the data across server nodes. It partitions the data at fine granularity across a circular ring of virtual nodes; each physical node (a physical server) can host one or several virtual nodes, depending on the capacity of that server. Since all the nodes are arranged on a ring, when a node is added or removed, the amount of data that needs to move equals the data residing on the affected node, so only a small part of the data needs to be re-hashed.
- Dynamo uses vector clocks to address the problem of multiple versions in data consistency. It maintains a per-node counter in each object's version history and determines whether two versions lie on parallel branches or in a serial order, so as to decide whether a conflict must be resolved. Conflicts can be resolved either on the client side or on the server side.

This paper presents Dynamo, the highly available key-value store that was designed by Amazon and is used by many of Amazon's services.

A relational database could have solved this problem, but it neither scales well nor satisfies Dynamo's stringent constraints. The main target requirement for Dynamo is very high availability, providing an always-writable data store in which all updates are accepted even in the presence of node failures. The system trades off consistency to ensure availability. Its design decisions follow from its requirements, such as conflict resolution during reads rather than during writes (to keep the store always writable) and application-level conflict resolution to avoid inconsistency in the data. The focus on the 99.9th percentile of the distribution was a real motivation for building a system like this.

Dynamo is designed by putting together different research ideas: (a) increasing availability and performance through partitioning and replication using consistent hashing; (b) data versioning with vector clocks as the conflict-resolution technique; (c) handling temporary and permanent failures efficiently, avoiding wasted time synchronizing data with other nodes by employing sloppy quorum, hinted handoff, and anti-entropy using Merkle trees; (d) membership and failure detection through a gossip-based mechanism, where the membership information each node collects lets the system act as a zero-hop DHT. The methods have been synthesized and tuned as the requirements demand.

Queries: (1) How do they test these kinds of distributed systems, which involve interactions between different modules, and is debugging done mainly through collected logs? (2) They don't consider the worst possible failure cases and present the system behaviour there. Is that because the worst cases do not occur in real systems, or because the system's behaviour is controlled by the protocols they employ? (For example, to avoid long vector clocks they truncate the oldest pair from the clock, the reasoning being that such a scenario has not been faced in the production system; but theoretically this means writes could be lost.)

Problem addressed:
This paper describes the methodology and engineering required to build a highly available, large key-value storage system. It shows how many different ideas from large-scale fault-tolerant systems can be put together to build a practical system at large scale.

Summary:
This paper describes the design and engineering of Amazon's highly available, eventually consistent key-value store, called Dynamo. A few of the interesting aspects of Dynamo are as follows:
1. Dynamo is targeted at write availability rather than read availability. So even in the presence of network partitions and server faults, Dynamo strives to accept updates from clients. This forces the designers to push the complexity of conflict resolution to the reader, so that no write is rejected.
2. Dynamo provides eventual consistency and thus does not guarantee strong consistency of the data returned by client operations. It uses vector-clock-based timestamping of the data objects (data versioning). If it finds multiple versions of the same object, it tries to resolve the conflict using the causality information in the vector timestamps. If that does not succeed, the client needs to resolve the conflict semantically.
3. It uses consistent-hashing-based partitioning of data by key to distribute the database across the nodes.
4. Depending on the service-level-agreement requirements, Dynamo allows applications to tune the number of replicas (N) and the number of replicas required to complete write (W) and read (R) operations, delivering the desired levels of performance, durability and consistency.

Short summary:
This paper describes how different methodologies for building fault-tolerant distributed systems can be applied together to build a practical, large, highly available key-value store.

Relevance:
This paper has great relevance in delineating how a practical, highly available key-value store can be implemented and which design considerations need to be decided upon. It could also act as a basis upon which future large fault-tolerant systems are built, and it shows how SLAs and particular application requirements can influence the various design choices of the system.

Problem :-
To create a highly available, reliable and scalable key-value store to support Amazon's online services and fulfill the desired performance, durability and consistency SLAs (Service Level Agreements) in its fully decentralised, service-oriented infrastructure.

Summary :-
The paper describes Dynamo, Amazon's distributed key-value store whose aim is to provide a highly available (and scalable) service while providing eventual data consistency in the presence of temporary failures, arrival/departure of nodes and network partitions. Amazon's requirements have influenced Dynamo's designers to design a system that doesn't need to follow strict ACID requirements while creating a conflict resolution mechanism that ensures that writes are never rejected. To achieve these goals, they use a combination of techniques such as Consistent Hashing along with preference lists, Vector Clocks, "Sloppy Quorum", anti-entropy for replica synchronization and gossip based membership protocols.

Contribution :-
The main contribution of the paper is describing a truly distributed key-value store that has very stringent availability requirements and trades off consistency for availability. The system provides flexible replication options (in terms of the number of replicas) and uses tunable read, write and replication parameters for quorum purposes to maintain consistency. Another contribution is the hinted handoff mechanism for handling failures, together with the "sloppy quorum" mechanism. It uses Merkle trees for efficient replica synchronization, reducing the number of disk reads during the anti-entropy process. Dynamo's use of an explicit mechanism for adding/removing nodes helps distinguish temporary failures from these scenarios. It improves the original consistent hashing idea by introducing the concept of virtual nodes on the ring to account for the heterogeneity of the physical nodes.

Applicability to Real Systems :-
Dynamo has been successfully deployed and supports Amazon's core online services. It is a very good example of an incrementally scalable distributed system that provides high availability and durability. The paper discusses a number of interesting techniques used in its design that can be used by system designers while building similar systems.

Summary:
This paper presents the design and implementation of Dynamo, a highly reliable key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.

Problem Description:
Amazon provides e-commerce services to millions of users all over the world. The demands on Amazon's platform are high availability and high scalability.
Dynamo stores simple key-value pairs and supports simple reads and writes. Partitioning and replication of data use consistent hashing. To achieve high availability in a failure-prone system, it uses an "always-writable" optimistic replication scheme and therefore sacrifices some degree of consistency. The paper shows how various techniques solve the problems that arise in such a system, e.g., conflict resolution, failure detection, and efficiency.

Contributions:
The paper's main contribution is the evaluation of how different techniques can be combined to provide a highly available system. It shows how to use and how to tune these techniques to meet the requirements of a production system. Some of the interesting techniques include:
(1) SLAs, service level agreements. At Amazon, SLAs are expressed and measured at the 99.9th percentile of the latency distribution, to reflect and guarantee an excellent user experience (a measurement sketch follows this list).
(2) Virtual nodes to improve load balancing.
(3) Data versioning with vector clocks to resolve read conflicts, and a quorum-like protocol with R, W, and N parameters to achieve eventual consistency.
(4) A gossip-based protocol for failure detection.
And so on.
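
As a rough sketch of the 99.9th-percentile measurement in point (1): the threshold and latency samples below are made up, not Amazon's numbers.

import math

def percentile(samples, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ranked)))
    return ranked[rank - 1]

latencies_ms = [12, 15, 9, 300, 14, 11, 13, 10, 16, 12]   # per-request latencies
sla_target_ms = 300
print("SLA met" if percentile(latencies_ms, 99.9) <= sla_target_ms else "SLA violated")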

Applications:
This paper discusses the practical issues of building a highly available and scalable key-value system that meets the strict demands of its services. The decisions and techniques discussed are all very useful hints for people who aim to build similar types of systems.

Summary:
This paper describes Dynamo, a distributed key-value store built by Amazon to be highly available and scalable. The key points of how Dynamo handles consistency, replication, membership and other distributed system issues are discussed.

Problem:
How do you build a distributed key-value store that is scalable and conforms to strict performance requirements? How do you provide availability even in the face of network partitions? How do you design the system to present different levels of availability and consistency to different applications?

Contributions:
Dynamo presents a way to design a single system that uses the N, W, and R parameters from the classical quorum literature as "knobs" with which different applications can obtain different levels of consistency and availability from the system.

Dynamo uses consistent hashing to provide incremental scalability. The real contributions of this paper come from the many optimizations they have done and the strategies they have evaluated for issues such as replica propagation, plus the use of Merkle trees to optimize anti-entropy.
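
A hypothetical illustration of those N/R/W "knobs"; the application names and configurations below are invented, and only the (3, 2, 2) setting is the common choice the paper mentions.

def overlapping(n, r, w):
    """R + W > N means every read quorum intersects the latest write quorum."""
    return r + w > n

configs = {
    "shopping-cart":   dict(n=3, r=2, w=2),   # balanced; the common setting
    "read-heavy-view": dict(n=3, r=1, w=3),   # fast reads, writes wait for all
    "write-heavy-log": dict(n=3, r=3, w=1),   # fast writes, reads do the work
}
for name, c in configs.items():
    print(name, "overlapping quorums" if overlapping(**c) else "no overlap guarantee")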

Comments:
I thought the business-logic reconciliation was a bit shady: they conveniently push tough-to-merge replicas to the business logic to decide. Many distributed systems do not have this luxury and are stuck with timestamp-based merging, i.e., something like last-write-wins. When I initially read that they tolerate network partitions, I was very eager to see how they would handle concurrent updates (not least because I wanted to use the same idea in the project), but the business-logic merging left me a little disappointed.

Application to real systems:
Dynamo serves much of Amazon's Web Services, so it is a very practical implementation of distributed system concepts. Many novel concepts have been merged together to build the system.

Questions:
1. The blog post mentions that very few production systems actually make it to SOSP. Why is this the case? Given the inherent proof of correctness in these systems, I thought they would be welcomed even more at conferences.

2. How long would/did building a system like Dynamo take?

3. How does this compare with memcached and Chubby?

Summary:

This paper describes how Amazon implements its highly available key-value store service. Amazon's solution enables control over the trade-offs between availability, consistency, and performance. Dynamo uses several well-known techniques, including ones we covered in class such as consistent hashing and vector clocks, to achieve its goals. A large proportion of the paper therefore covers the task of synthesizing pre-existing solutions to the various sub-problems of a key-value store.

Problem:
The end goal for the Amazon developers was not only to provide a highly available key-value store service, but also to allow configuration of the trade-offs between consistency, availability, and performance. Amazon engineers observed that different applications have different requirements, so they should be able to configure the system to best suit their needs. By taking a divide-and-conquer approach to the problem, the engineers were able to exploit pre-existing solutions to particular sub-problems to simplify their task. As the Paxos paper already points out, the main problem with using solutions proposed in academia is that they are highly idealized and need significant modifications to be employed in a real-world setting.

Contribution:
The main contribution of this particular paper is an example of how well-known techniques can be synthesized to solve a bigger problem. In the same vein, the decisions described in this paper make it clear that some demands on the system have to be relaxed; for example, the Dynamo designers chose to relax the consistency requirements, and only then could they obtain the goal of high availability. Among the more novel contributions are the notion of the sloppy quorum and the associated concept of hinted handoff to achieve even higher availability.
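
A small sketch of the return path of hinted handoff: a node that accepted a hinted replica periodically tries to hand it back to its intended owner. The hint_store structure and node methods are assumptions for illustration, not Dynamo's API.

import time

def drain_hints(hint_store, poll_seconds=60):
    while True:
        for key, value, owner in list(hint_store):
            if owner.is_up():
                owner.store(key, value)               # replica goes back home
                hint_store.remove((key, value, owner))
        time.sleep(poll_seconds)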

Comments:
I was not entirely clear on their argument that the stores (writes) are more important than the loads (reads). The example covered the shopping-cart experience. What is confusing is that while putting something in the shopping cart is important, being able to then observe the shopping cart strikes me as even more important.

This paper describes the design of the Dynamo system, an eventually consistent storage system with high availability, good scalability, and relatively low cost. Dynamo avoids the scalability issues of relational databases while guaranteeing weak consistency and very high availability, satisfying applications that do not require strong consistency but must deliver their functionality in a bounded time.

A traditional RDBMS is too expensive. The Dynamo group aims to build a distributed storage system that is scalable, simple, highly available, and guarantees service level agreements. According to the CAP theorem, a shared-data system can have at most two of the three properties: consistency, availability, and tolerance to network partitions. Dynamo chooses to sacrifice strong consistency, but it still needs to maintain weak consistency, handle conflicts, scale incrementally, and consider symmetry, decentralization, and heterogeneity.

In Dynamo, virtual nodes are mapped onto physical nodes organized into a ring, so that hardware can be swapped out and failures tolerated. The partitioning algorithm specifies which nodes store a given object. Dynamo uses consistent hashing to avoid the wide-range rehashing that changing the number of buckets causes in ordinary hash partitioning. Every object is replicated on several nodes. Updates propagate lazily rather than eagerly, which may result in slight differences among the copies of an object. Dynamo uses versioning and read repair to tolerate this inconsistency and resolve it at read time, so objects are always writable. All nodes have the same responsibilities: any node can be issued a put or get request for any key. This decentralized design is very good for scalability.
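
A minimal read-repair sketch of the idea above; here an integer timestamp stands in for the vector-clock comparison Dynamo actually performs, and the node objects are assumed for illustration.

def quorum_read(key, replicas, r=2):
    responses = [(node, node.get(key)) for node in replicas[:r]]  # each get -> (value, ts)
    latest = max((ver for _, ver in responses), key=lambda v: v[1])
    for node, ver in responses:
        if ver != latest:
            node.store(key, latest)     # push the fresher version back in passing
    return latest[0]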

Dynamo is a well-designed distributed storage system. It targets applications that require only weak consistency but high availability. The paper presents the problems and solutions very clearly and gives many good guidelines for key-value storage system designers.

This paper gives a detailed description of Dynamo, Amazon's internal distributed key-value store. The system is used internally by other services. Some of its key properties are high availability, loose consistency, being always available for writes, plain key-based object retrieval, and strict performance requirements. The paper describes several standard distributed algorithms and some optimizations used to realize such a system.

Problem Statement:
Building a distributed system with high availability, strict performance requirements (at the 99.9th percentile), and good consistency is a challenging task. Implementing such a system requires a variety of techniques, as described in this paper.

Contributions:
1. Consistent hashing for data partitioning: using the concept of virtual nodes to distribute the additional load evenly in case of node additions/failures and to handle heterogeneous nodes.
2. Replication: the key-value pairs are replicated across different data centers to provide reliability against many types of failures: node failures, network outages, or natural disasters. At the same time, the policy for selecting the replica nodes is simple: a specified number of successor nodes following the coordinator node.
3. Failure detection: a simple and elegant mechanism, the absence of a response from a node; these decisions can be purely local.
4. Quorum-based consistency model: configurable R, W, and N parameters per application are a distinctive part of the implementation.
5. Object versioning: providing both a simple 'last write wins' conflict resolution scheme and the ability for applications to construct more sophisticated views by combining several versions of the same object, using vector clocks to resolve conflicts where possible (a last-write-wins sketch follows this list).
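
An illustrative 'last write wins' resolver for point 5; the version format and timestamps are invented, and because the timestamps are wall-clock values, concurrent updates are silently dropped by design.

def last_write_wins(versions):
    return max(versions, key=lambda v: v["timestamp"])

last_write_wins([
    {"value": "address-old", "timestamp": 1001},
    {"value": "address-new", "timestamp": 1007},
])   # -> the "address-new" version; the older concurrent write is discarded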

Applicability:
This paper shows how a combination of methods can be used to build a robust distributed system. It also demonstrates a number of optimizations that can be applied to the standard algorithms to obtain better performance in one area by trading it off against another.

DYNAMO
- Summary
Design and implementation of Dynamo, a highly reliable and available key-value store, with a demonstration of how various techniques can be applied to obtain scalability and availability in the face of failures.

- Problem
Building a key-value store on top of multiple servers with strict requirements in terms of performance, reliability, and scalability. Some major challenges for such systems: how to provide availability in the face of failures (e.g., crashes, network failures), how to provide consistency, how to replicate efficiently and propagate updates, how to detect failures, etc.

- Contribution
The main contribution of this paper is showing how various techniques from the research community can be applied to build a real distributed system with high availability. Dynamo's design choices determine the techniques it uses for implementation. It guarantees a model that is eventually consistent and "always writable". Some techniques used in Dynamo:
+ consistent hashing is used for partitioning in order to provide incremental scalability, with the concept of "virtual nodes" to solve the problem of non-uniform data and load distribution.
+ data versioning is used to keep the store "always writable"; vector clocks capture causality between different versions of the same object.
+ consistency among replicas is maintained using quorum-like techniques (R + W > N)
+ replica reconciliation uses Merkle trees, which keep the amount of transferred data small and minimize the number of disk reads during reconciliation (a tree-diff sketch follows this list).
+ Dynamo uses a gossip-based protocol for exchanging membership information, as well as for propagating partitioning and placement information, which allows each node to forward a read/write operation for a key directly to the correct locations.
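
An illustrative Merkle-tree sketch for the reconciliation point above: replicas compare hashes top-down and descend only into subtrees that disagree, so only divergent key ranges need to be transferred. The structure is simplified (both trees are assumed to cover the same key range with the same number of leaves), and it is not Dynamo's implementation.

import hashlib

def build(leaf_hashes):
    """Each tree node is (hash, left_child, right_child); leaves have no children."""
    nodes = [(h, None, None) for h in leaf_hashes]
    while len(nodes) > 1:
        paired = []
        for i in range(0, len(nodes), 2):
            left = nodes[i]
            right = nodes[i + 1] if i + 1 < len(nodes) else left
            combined = hashlib.sha1((left[0] + right[0]).encode()).hexdigest()
            paired.append((combined, left, right))
        nodes = paired
    return nodes[0]

def diff(a, b, mismatches):
    """Collect pairs of leaf hashes that differ, pruning identical subtrees."""
    if a[0] == b[0]:
        return                      # identical subtrees: nothing to transfer
    if a[1] is None:                # reached a leaf that differs
        mismatches.append((a[0], b[0]))
        return
    diff(a[1], b[1], mismatches)
    diff(a[2], b[2], mismatches)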

- Comments/Flaws/Question:
I think there is no real flaw in this paper, since every design decision flows naturally from what the developers expect the system to be and from the workload Dynamo is meant to serve.

This paper discusses Amazon's Dynamo system, a distributed key-value store. The system is used for some pieces of Amazon's services, and thus requires high availability. To achieve this, the system makes use of a number of techniques we have discussed in class: consistent hashing, gossip protocols, vector clocks, anti-entropy, and quorums.

The main problem the system addresses is that, for the service to be useful, customers need both consistency and high availability; it also needs to be large, so scalability is important. Dynamo uses consistent hashing to decide which nodes store each write. Each physical node is treated as a number of virtual nodes, which allows newer machines to mix with older ones and still have the load balanced well. Data is stored at the N nodes past the point the key hashes to (duplicates and down nodes not included), and at least R nodes must respond for a successful read and W for a successful write. The values Dynamo commonly uses are N=3 and R=W=2. Dynamo does versioning with vector clocks and resolves conflicts on read so that writes are fast. Conflicts are expected to be uncommon, and the experimental results confirmed this.

One contribution of the paper is their method for measuring performance: rather than using mean or median performance, they look at the latency at the 99.9th percentile. This is because the longer latencies may be correlated with things that make people important customers (such as long search histories). The paper also contributes some practical results about how these techniques can be applied in a production system. It also discusses when different types of version conflict resolution are appropriate; for shopping carts, conflicting updates should be merged, but in some cases last write wins is okay; also, it is sometimes possible for the key-value store to resolve conflicts itself, but sometimes it needs to ask the application for help.
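
A toy example of application-level reconciliation for a shopping cart: divergent versions are merged by taking the union of items, so added items are never lost (though, as the paper notes, deleted items can occasionally resurface). The data shapes are my own illustration.

def merge_carts(versions):
    merged = {}
    for cart in versions:                      # each cart: {item_id: quantity}
        for item, qty in cart.items():
            merged[item] = max(merged.get(item, 0), qty)
    return merged

merge_carts([{"book": 1, "cable": 2}, {"book": 1, "mug": 1}])
# -> {"book": 1, "cable": 2, "mug": 1}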

Amazon has found ways to use a key-value store for some of the services they need, rather than a more complicated data structure (like a database). The ideas in the paper are pretty clearly applicable to others trying to develop systems of the same type.

The paper describes Amazon's Dynamo, a distributed storage system with scalability and a simplified key-value interface at its core. The main motivations behind it are guaranteed SLAs and high availability.
Dynamo was developed to address the requirements of some of Amazon's core services, which need an "always on" experience. It is an internal service that powers other Amazon web services.
The system starts from some basic assumptions. Consistency is compromised in favor of availability, but in exchange the store is "always writable", since conflicts are resolved during reads instead of writes. The designers focused on incremental scalability, symmetry, decentralization, and heterogeneity. Partitioning is addressed by consistent hashing. Vector clocks with reconciliation during reads keep write operations highly available. Dynamo implements a sloppy quorum and hinted handoff, which, along with providing durability, also guarantee availability when some replicas are unavailable and hence handle temporary failures. For recovery from permanent failures, a Merkle-tree-based anti-entropy mechanism synchronizes the replicas. In addition, Dynamo implements a gossip-based membership protocol to preserve symmetry, storing membership and node-liveness information in a decentralized registry that also helps with failure detection.
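
An illustrative gossip-style membership exchange for the last point; the message format and field names are assumptions, not Dynamo's actual protocol. Each node's view maps a member to (generation, status), and the higher generation wins.

import random

def merge_views(a, b):
    merged = dict(a)
    for member, entry in b.items():
        if member not in merged or entry[0] > merged[member][0]:
            merged[member] = entry
    return merged

def gossip_round(node, peers):
    peer = random.choice(peers)                 # pick one random peer per round
    reconciled = merge_views(node.view, peer.view)
    node.view = reconciled
    peer.view = dict(reconciled)                # both sides end up with the merged view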
The eventual consistency provided by Dynamo is partly for implementation simplicity. It is impressive that, for high availability, Dynamo delegates the responsibility for resolving version conflicts to the application itself. Dynamo excludes corner cases on the philosophy that worst-case complexity usually comes from a few limited cases, and instead guarantees its service levels for 99.9% of requests.
Dynamo's design is quite complicated, and complexity is increased because service reliability and high availability are resolved in a single layer. Over time, the size of the vector clocks can grow considerably. Like other industry publications, there is not much novelty. Further clarification and elaboration on the zero-hop DHT by DeCandia et al. would be welcome.
According to its developers, Dynamo is not a web service exposed directly to the outside world, but it underpins a large number of Amazon's other web services. Dynamo is a model for combining existing techniques, and the team has made a sound combination of recent as well as well-established techniques from operating systems and distributed systems research, such as consistent hashing, versioning, quorums, and vector clocks.


Summary:
This paper provides a detailed description of Amazon's internal key-value store system, Dynamo. Dynamo is built to allow flexible trade-offs between performance, availability, and durability. The paper gives an overview of Amazon's service expectations and the various design considerations. It then provides details of each aspect of the system architecture, pulling in mechanisms and ideas from various previous papers. Finally, the paper briefly describes the implementation and experiences, and gives conclusions.

Problem description:
Building a key-value store that is performant, highly available, and reasonably consistent is an absolutely non-trivial problem. Many other works analyze specific aspects of the problem and provide individual solutions. In this paper, the authors describe a complete solution that tackles all aspects effectively and has been proven in the real world. The authors report that the system has been in use for two years with great success, and they provide very impressive numbers on data consistency and 99.9th-percentile latencies.

Contributions:
The paper has several contributions. First, it pools together a variety of previously proposed techniques into a cohesive whole that performs impressively in many categories. Second, it describes a set of knobs (N, R, and W) for configuring performance, availability, and durability to suit the needs of the user application; many datacenter-level services require different levels of each of these, and Dynamo can serve all of them equally well. Finally, the paper proposes a distinctive method for conflict resolution, where conflicts are resolved on reads rather than writes. This method works very well for Amazon's services.

Applicability:
The applicability of this paper is obvious. The paper itself reports on a system that is in active use and serves millions of users daily. The impression the authors give is that Dynamo is highly successful and meets essentially all the requirements and expectations it was intended to meet. Any other company or research group hoping to develop a similar piece of infrastructure would do well to read this paper thoroughly, apply its insights, and learn from its successes.

Summary:
This paper describes the design of Amazon's highly available, distributed key-value store, Dynamo. Dynamo presents an extremely simple interface to users, providing only get and put commands. Despite this simple interface, Dynamo allows a high degree of customization: application-level conflict handling and per-application tuning of the R, W, and N parameters.
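
A skeleton of that narrow get/put interface; the class name, context object, and signatures here are illustrative stand-ins rather than Dynamo's published API.

class KeyValueStore:
    def get(self, key):
        """Return (context, [values]); more than one value means the caller
        must reconcile concurrent versions before writing back."""
        ...

    def put(self, key, context, value):
        """Store value under key; the context carries the version information
        (vector clock) the caller previously read, preserving causality."""
        ...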

Problem:
Many of Amazon's applications do not require the full complexity of an ACID database; they can do without transactions, structured data, and perfect consistency. They do require extremely high scalability and reliability, and durability is also desired. Amazon developed Dynamo to meet these design goals.

Contributions:
In many respects, this paper resembles Google's Chubby and the Paxos Made Live paper. Both implement distributed systems with no single point of failure to meet specific business needs within the company. Dynamo is less formal about algorithmic design and correctness, but the authors explicitly describe the SLA (Service Level Agreement) they designed Dynamo to meet; Chubby's main priority, by contrast, is consistency at the cost of performance.

Dynamo's SLA specifies very aggressive latency, throughput, and durability requirements. These requirements reflect the user-facing nature of Dynamo: it is not used for a small amount of slowly changing administrative data (Chubby's use case), but rather for the personal data behind a user's web experience. In Amazon's case, slow performance directly influences sales, so Dynamo's design is highly practical.

To meet these goals, Dynamo prioritizes writes at the cost of more complex read logic (to handle conflicts). This makes Dynamo a system implementing Optimistic Replication in the form of Eventual Consistency.

The actual algorithms used in Dynamo are collected from many areas of distributed systems and databases. I believe the most novel aspects include business-logic-specific reconciliation, 99.9th-percentile measurements (as opposed to median or mean) for SLAs, and the use of virtual nodes to balance load among heterogeneous hardware.

Practical Applications:
Dynamo is a widely deployed, practical system; most of us have used applications built on the Dynamo key-value store. Beyond this, the paper shares many engineering decisions that may be useful to others implementing similar systems.
