Dynamo: Amazon's Highly Available Key-Value Store
Dynamo: Amazon's Highly Available Key-Value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
Reviews due Thursday, 3/3.
Comments
Summary
In Dynamo, Amazon implements a highly available key-value store using distributed mechanisms. The interface they target is simple read/write operations on objects that can be uniquely identified by a key. The implementation has to satisfy Amazon's specific SLAs, production goals, and application requirements.
Specifically, they make use of consistent hashing for partitioning the key space across servers. They use vector clocks with reconciliation during reads in order to have highly available writes. They also use sloppy quorum and hinted handoff to handle temporary failures. Anti-entropy with Merkle hash trees is used to handle permanent failures. They use a gossip-based membership protocol and failure detection for managing their DHT.
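As an illustration of the partitioning scheme, here is a minimal consistent-hashing sketch in Python. The use of MD5 and of virtual nodes follows the paper's description, but the class name, method names, and virtual-node count are invented for this example; it is a sketch of the idea, not Dynamo's code.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key onto the 128-bit MD5 ring, as Dynamo does.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes = vnodes_per_node
        self.ring = []          # sorted list of (position, node) pairs
        self.positions = []     # parallel list of positions for bisect

    def add_node(self, node: str):
        # Each physical node owns several virtual positions on the ring,
        # which smooths load when nodes join or leave.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#vnode{i}")
            idx = bisect.bisect(self.positions, pos)
            self.positions.insert(idx, pos)
            self.ring.insert(idx, (pos, node))

    def preference_list(self, key: str, n: int = 3):
        # Walk clockwise from the key's position and collect the first
        # n distinct physical nodes: the replicas for this key.
        start = bisect.bisect(self.positions, _hash(key)) % len(self.ring)
        nodes, i = [], start
        while len(nodes) < n and i - start < len(self.ring):
            node = self.ring[i % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes

ring = ConsistentHashRing()
for host in ["A", "B", "C", "D"]:
    ring.add_node(host)
print(ring.preference_list("shopping-cart:alice"))  # e.g. ['C', 'A', 'B']
```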
Problem Statement
The problem that this paper tries to solve is to create a highly-available, scalable key-value store that meets Service Level Agreements and gives flexibility to the application developers to tune this system for their application's needs.
Contributions
Critiques
As they mention in the paper, although the DHT gives them scalability in terms of easily adding or removing nodes, it also limits their scalability, because they have chosen a design in which each node knows about all other nodes in the DHT. Maintaining full membership at every node is very expensive and cannot scale to many thousands of nodes.
Giving application developers the flexibility to decide on the right values for tunable parameters or the right strategy for reconciliation is an advantage. But it can also be challenging: one must understand the existing applications and the parameters they use in order to make reasonable comparisons and choose the right strategy for a new application.
They relax consistency guarantees in order to achieve high availability. Dynamo is therefore not suitable for systems with strong consistency requirements, such as banking systems.
Also, I believe they could have done a better job on the evaluation section. They do not fully evaluate all of the target goals they try to meet. For example, the evaluation uses a homogeneous set of servers, even though they set heterogeneity as a target in their system design.
Applications in the real world
The fact that Dynamo is implemented and in use by Amazon shows a real world example of using this highly-available key-value store. Any other application which can relax consistency guarantees in order to achieve high availability and makes simple key-value queries can be a candidate for using Dynamo.
Posted by: Fatemah | February 28, 2011 01:56 PM
Summary:
This paper presents the design and implementation of Dynamo, which is a highly available key-value storage system.
Problem:
The problem the authors solve is how to design and implement a key-value store that fulfills requirements for availability, consistency, scalability, flexibility, and performance.
a. Availability: Dynamo accepts as many updates as possible ("always writable").
b. Consistency: all updates reach all replicas eventually, and some mechanism must be provided to resolve update conflicts.
c. Scalability: adding nodes to Dynamo should have minimal impact on both the operators of the system and the system itself.
d. Performance: a response within 300ms for 99.9% of requests at a peak client load of 500 requests per second.
e. Flexibility: each service can tightly control the trade-offs among different features in Dynamo.
Solution:
Dynamo is designed based on the following assumptions:
a. simple read and write operations to a data item that is uniquely identified by a key;
b. all data must be highly available; data conflicts can exist, but some mechanism must be provided to resolve them;
c. Dynamo is used only by Amazon’s internal services, so its operating environment is assumed to be non-hostile and there are no security-related requirements;
d. Dynamo runs on commodity hardware, where hardware failures are treated as the common case.
Dynamo uses a synthesis of well-known techniques to achieve its design requirements:
a. Dynamo relies on consistent hashing to distribute load across multiple storage hosts; it places virtual nodes, rather than physical nodes, on the hash ring.
b. each data item is replicated at N hosts;
c. consistency is realized by object versioning (see the vector-clock sketch after this list);
d. R, W, and N can be configured by each service;
e. hinted handoff ensures that read and write operations do not fail due to temporary node or network failures;
f. Merkle trees are used to detect inconsistencies between replicas;
g. failure detection and membership updates are realized by a gossip-based protocol.
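To make item c concrete, here is a hedged sketch of vector-clock versioning: a version is a map from node to update counter, and versions where neither dominates the other are surfaced as conflicts for reconciliation. The function names are invented for this example.

```python
def descends(a: dict, b: dict) -> bool:
    # Version a descends from version b if a's counter for every node
    # in b is at least b's counter for that node.
    return all(a.get(node, 0) >= count for node, count in b.items())

def strictly_descends(a: dict, b: dict) -> bool:
    return descends(a, b) and not descends(b, a)

def reconcile(versions):
    # Drop any version strictly dominated by a newer one; the survivors
    # are causally concurrent leaves the application must merge.
    return [v for v in versions
            if not any(strictly_descends(w, v) for w in versions)]

v1 = {"sx": 1}              # written at node sx
v2 = {"sx": 2}              # overwritten at sx: v2 descends from v1
v3 = {"sx": 1, "sy": 1}     # concurrent update at sy, based on v1
print(reconcile([v1, v2, v3]))  # [{'sx': 2}, {'sx': 1, 'sy': 1}]
```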
Flaw:
a. Dynamo assumes there is no malicious code, but an unskilled programmer may still produce buggy code that crashes the system.
b. Dynamo sacrifices some consistency to provide high availability, so it cannot be used in situations where strong consistency is required, such as stock-trading or banking systems.
Relevance:
a. The storage requirements of different services are abstracted as key-value storage, which reduces the difficulty of designing the storage mechanism.
b. From the perspective of the user experience, Dynamo's designers felt that the availability of update operations is more important than consistency, so they sacrificed some consistency to improve the user experience.
c. The designers admit that some conflicts cannot be solved by the system, so they export those conflicts to clients and let clients participate in the process of conflict reconciliation.
Posted by: Linhai Song | February 28, 2011 08:05 PM
Summary:
This paper presents Dynamo, a data storage system that provides high availability and eventual consistency using consistent hashing for load balancing and fault tolerance, a relaxed form of quorum and anti-entropy for replication, and vector clocks to implement object versioning. Dynamo also pushes some of the system functionality to the application such as resolving the differences between two inconsistent object versions.
Problem Description:
This paper proposes a solution to the problem of designing a highly available data storage system where providing good performance for 99.9% of responses is more important than overall consistency. This problem is important because it represents a problem space that Amazon and other e-commerce sites need to solve in order to avoid large financial losses due to unavailability. As the paper describes, a solution to this problem involves a significant amount of fine tuning for each specific application, and there are many trade-offs depending on the application requirements.
Contributions Summary:
This paper explains a large number of applications of existing ideas to solve the problem; I describe a few contributions I found interesting.
Dynamo uses Merkle trees to do anti-entropy replication. A Merkle tree allows the efficient comparison of slightly dissimilar collections. That is, the time required to compare two collections is proportional to the number of differences between the collections. Dynamo utilizes this anti-entropy technique for more coarse-grained replication required to tolerate long-lasting failures.
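A minimal sketch of the Merkle-tree comparison idea, assuming a fixed-depth binary tree over key ranges; the hash choice and layout are illustrative, not Dynamo's actual structure. Descending only where hashes differ is what makes the comparison cost proportional to the number of differences.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    # Build a binary Merkle tree bottom-up; each level halves the node
    # count by hashing adjacent pairs. Assumes len(leaves) is a power of 2.
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is the root hash

def diff_ranges(a, b, level=None, idx=0):
    # Walk both trees top-down, descending only into subtrees whose
    # hashes differ; equal subtrees are skipped entirely.
    if level is None:
        level = len(a) - 1
    if a[level][idx] == b[level][idx]:
        return []
    if level == 0:
        return [idx]  # leaf (key range) idx differs
    return diff_ranges(a, b, level - 1, 2 * idx) + \
           diff_ranges(a, b, level - 1, 2 * idx + 1)

replica1 = build_tree([b"k0:v0", b"k1:v1", b"k2:v2", b"k3:v3"])
replica2 = build_tree([b"k0:v0", b"k1:v1", b"k2:CHANGED", b"k3:v3"])
print(diff_ranges(replica1, replica2))  # [2]: only that range needs syncing
```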
As the authors describe, a decentralized failure detector is not necessary for applications that fit Dynamo’s architecture. They explain that an initial version of Dynamo used a decentralized failure detector that was overkill, and that failure detection can instead be obtained by combining local failure detection (such as communication timeouts) with the replicated membership information already used in consistent hashing.
I found the notion of an application and the storage backend having an SLA particularly interesting. The SLA allows an application to place specific requirements on the backend. The backend is then bound to provide this level of service to ensure that the application behaves as originally designed. I think this approach cleanly separates the duties of the backend and application, allowing each component to optimize its own operations instead of optimizing around bottlenecks in the other.
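Since these SLAs are expressed at the 99.9th percentile rather than as averages, here is a small sketch of how such a target could be checked offline. The latency samples are fabricated, and the 300ms threshold simply echoes the example SLA quoted in these reviews.

```python
import math

def percentile(samples, p):
    # Nearest-rank method: the smallest value such that at least p% of
    # the samples are less than or equal to it.
    ordered = sorted(samples)
    rank = math.ceil(p / 100.0 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [12, 15, 9, 300, 22, 18, 14, 11, 16, 450]  # fabricated samples
p999 = percentile(latencies_ms, 99.9)
# With this few samples the 99.9th percentile is just the maximum; at
# production volumes it isolates the worst 0.1% of requests instead.
print("99.9th percentile:", p999, "ms; under 300ms SLA:", p999 <= 300)
```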
Shortcomings:
I think requiring an administrator to add or remove a node from a Dynamo ring could pose problems for rapidly scaling to support unexpected spikes in the popularity of content. Amazon has a pretty good idea of its peak demand and likely does not have to deal with sudden, unexpected spikes in popularity. However, if Dynamo is applied in other contexts, this may no longer be the case.
Application to real systems:
I think this paper (and the design of Dynamo) provides excellent background for designing a highly available web service. However, applying a storage system design similar to Dynamo will be constrained by the demands of the application; in certain domains, such as banking, eventual consistency cannot be tolerated.
Posted by: Dan McNulty | February 28, 2011 11:31 PM
Summary
The authors present Dynamo, Amazon’s highly-available key-value store, used for services with high reliability requirements. Using a variety of DS techniques, like consistent hashing, vector clocks, and configurable read- and write-set sizes, Dynamo is able to provide high availability and scalability, while allowing users to configure replication levels for application-specific needs.
Problem
Distributed systems face important trade-offs between consistency, availability, and partition tolerance (CAP). Given that partitions are inevitable in this environment, the tradeoff is really between consistency and availability. Systems that guarantee consistency may not always be available, since partitions may prevent writes to protect consistency after a merge. Systems that guarantee availability risk exposing stale or conflicting data to the user.
Contributions
The authors present a system that is highly available and uses a model of eventual consistency to balance the CAP tradeoffs. Consistent hashing is used to pick nodes on which data will be replicated, and vector clocks are used to version control data instances. Data is replicated in a user-configurable manner (the number of R and W replicas is customized, not their locations), and they use virtual nodes to help with load and data balancing.
Two important design considerations are described: when to resolve conflicts (they choose to do this on read, so that writes always succeed) and who must resolve conflicts (either the data store or the application; the data store is limited to simple policies like "last write wins," while the application can perform semantic merging). These two choices illustrate important options in a distributed system and demonstrate ways by which Dynamo is able to be highly available.
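As an illustration of application-side resolution on read, below is a hedged sketch of the "merge by union" behavior the paper describes for the shopping cart; the data shapes and names are simplified inventions, not Dynamo's API.

```python
def merge_carts(versions):
    # Application-level reconciliation: union the items from all
    # divergent cart versions, so an "add to cart" is never lost
    # (deleted items, as the paper notes, can resurface this way).
    merged = set()
    for cart in versions:
        merged |= cart
    return merged

# Two causally concurrent versions returned by a single read:
v_a = {"book", "toaster"}        # written on one side of a partition
v_b = {"book", "headphones"}     # written on the other side
print(merge_carts([v_a, v_b]))   # {'book', 'toaster', 'headphones'}
```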
Flaws
A limitation rather than a flaw: Dynamo is designed to exist in a non-hostile environment. Although the authors acknowledge this as a central design parameter, it suggests that such a system might require extensive changes to support necessary features, like authorization and authentication, in a more diverse environment.
Relevance
Dynamo provides a demonstration of the trade-offs between consistency and availability. Because it operates in an environment in which strict consistency is not necessary, it is able to leverage an eventual-consistency model to provide availability almost all the time. Not only does Dynamo provide a nice example of an environment that can tolerate eventual consistency, but it also shows how configuration options, like read- and write-set sizes, can help control the level of consistency provided. Further, this system nicely synthesizes a variety of DS concepts, including consistent hashing and vector clocks, showing how these models can be used together in a system.
Posted by: Emily Jacobson | March 1, 2011 09:45 AM
Summary:
Dynamo is a simple key-value storage system that aims to provide reliability and an “always-on” experience to users.
Problem:
Amazon runs an online application where reliability is a major concern: it cannot go down even for seconds, as that can lead to loss of customers. In this environment there is a particular need for storage technologies that are always available and scalable. Being a vast application, it also requires an infrastructure of tens of thousands of servers and network components located in many data centers around the world. At this scale, a small but significant number of server and network components are failing at any given time, so the software must maintain reliability by working around hardware failures (servers going down temporarily or permanently, tornadoes, power outages, failures of commodity machines) seamlessly. The main goals of the storage system, Dynamo, are: 1) always available and writable (even under network partitions and concurrent writes) and 2) minimal service latency. It achieves these goals as follows: 1) availability and "always writable" semantics: partitioning the data using consistent hashing to uniformly distribute it across the data centers, with object versioning and vector clocks for reconciliation of conflicting data; 2) consistency: a quorum-like technique and a decentralized replica-synchronization protocol; 3) minimal latency: a distributed hash table that locates data in a constant number of hops; 4) temporary failures: sloppy quorum and the hinted-handoff protocol; 5) permanent failures: anti-entropy and Merkle trees to synchronize the data; 6) failure and membership changes: a gossip-based protocol to propagate system state.
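As a rough sketch of how a sloppy-quorum write with hinted handoff (points 1, 2, and 4 above) might proceed, the Python below walks the preference list, skips unreachable nodes, and counts the write successful after W acknowledgments. All function and variable names are invented, and the node/network plumbing is stubbed with lambdas; this illustrates the idea, not Dynamo's actual code.

```python
def sloppy_quorum_write(key, value, preference_list, is_alive, send_write,
                        n=3, w=3):
    """Write to the first healthy nodes walked off the preference list.

    If a primary replica is down, the write 'slops over' to the next
    healthy node, which stores the object with a hint naming the intended
    owner (hinted handoff). Returns True once w nodes acknowledge.
    """
    intended = preference_list[:n]       # the n primary replicas
    acks = 0
    for node in preference_list:         # extended list: primaries, then fallbacks
        if not is_alive(node):
            continue
        hint = None
        if node not in intended:
            # This fallback node covers for some unreachable primary.
            hint = next((p for p in intended if not is_alive(p)), None)
        if send_write(node, key, value, hint):
            acks += 1
        if acks >= w:
            return True
    return False

# Tiny in-memory stand-ins for the stubbed node/network plumbing:
down = {"B"}
stores = {node: {} for node in "ABCDE"}
ok = sloppy_quorum_write(
    "cart:alice", {"book"},
    preference_list=list("ABCDE"),
    is_alive=lambda node: node not in down,
    send_write=lambda node, k, v, hint: stores[node].update({k: (v, hint)}) or True,
)
print(ok, stores["D"])  # True; D holds a hinted replica intended for B
```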
Firstly, I don’t see this as a research paper, so I don’t see any new contribution to the research world of distributed systems from this paper. It is an experience report by people who intelligently combined different ideas to make an online storage system work according to their needs.
Experiences from the implementation of a distributed storage system:
1) Leave the decision of how to deal with conflicts to the application. Since consistency and availability cannot be achieved simultaneously in the face of partitions, it is good to let applications decide what their proper working requires. Although this loses some transparency, I feel it is a good decision.
2) To provide “always writable” semantics, conflict resolution has to be done at read time by either the coordinator or the client-side library, with a reconciliation technique to resolve conflicts among divergent copies of the data.
3) Any distributed system design should embrace: a) incremental scalability - the need to scale with minimal impact on the existing system; b) symmetry - every node in the system should have the same responsibilities; c) decentralization - no centralized control that can become a single point of failure; d) heterogeneity - the system should exploit heterogeneity to take advantage of higher-capacity nodes.
4) It is a good optimization to deal with temporary and permanent failures differently. Also, use an external mechanism (e.g., seeds) to initiate the addition and removal of nodes from the system.
5) Read repair: since the client or coordinator receives data during a read, it takes that as an opportunity to repair stale replicas rather than depending on some external synchronization protocol (see the sketch after this list).
6) Availability and durability do not go hand in hand: durability can be increased by increasing W, which can limit availability.
7) Average performance measures are sometimes not good predictors. The paper makes the good decision of providing absolute performance data (99.9% of all reads and writes are served within 300ms). It is always good to base metrics on the way a user experiences the system.
8) Decoupling partitioning and placement is the right design choice for consistent hashing.
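A hedged sketch of the read-repair idea from item 5: after assembling a read result, the coordinator pushes the freshest version to any replica that returned a stale one. For brevity the version comparison here is a plain integer counter; Dynamo itself uses vector clocks, and all names below are invented.

```python
def read_with_repair(key, replicas, read_from, write_to, r=2):
    """Read from r replicas, return the newest value, and opportunistically
    repair any replica that responded with a stale version."""
    responses = []
    for node in replicas[:r]:
        version, value = read_from(node, key)   # (int version, value)
        responses.append((node, version, value))
    newest_version, newest_value = max((v, val) for _, v, val in responses)
    for node, version, _ in responses:
        if version < newest_version:
            # Read repair: push the fresh version to the stale replica.
            write_to(node, key, newest_version, newest_value)
    return newest_value

# In-memory stand-ins for the stubbed storage calls:
data = {"A": (2, "fresh"), "B": (1, "stale")}
value = read_with_repair(
    "k", ["A", "B"],
    read_from=lambda node, k: data[node],
    write_to=lambda node, k, ver, val: data.update({node: (ver, val)}),
)
print(value, data["B"])  # fresh (2, 'fresh') - B was repaired by the read
```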
Dynamo achieves good performance despite the fact that Amazon's system is constructed from standard commodity hardware components that have far less I/O throughput than high-end enterprise servers and is characterized by huge swings in demand.
Flaws
Minor concerns: a lot of storage and communication is required to maintain membership information. Also, it may be difficult to determine the values of W and R that an application requires.
Applications:
Since the paper uses caching, which always improves system performance, some caching could also be done at the client or coordinator side. As with ‘Read Repair,’ it is always good to use opportunistic behaviour to make the system consistent rather than depending on some other external protocol. Also, the concept of a “durable write,” which is necessary and sufficient for good system performance, could be used in consistency-critical applications.
Posted by: Pratima Kolan | March 2, 2011 08:22 AM
Summary:
The paper presents the design of Dynamo, Amazon's highly available distributed key-value storage system, which achieves a high degree of availability by sacrificing consistency. The paper discusses the multitude of techniques employed by Dynamo that enable it to achieve the desired levels of availability, reliability, performance, and scalability.
Problem:
Traditionally, systems have used relational databases for storing huge amounts of data that are inherently simple key-value pairs. Since RDBMSs aim to achieve a high degree of consistency through synchronized updates, such systems cannot provide a very high degree of availability, which in turn makes them a non-scalable solution. Dynamo trades consistency to achieve availability at the 99.9th percentile of the distribution.
Contributions:
Dynamo shows that by relaxing strong consistency requirements, it is possible to achieve high availability without losing data at any point in time. Through extensive use of object versioning for updates at any replica, assisted by client-controlled semantic reconciliation, Dynamo virtually eliminates data-loss events in the system.
Rather than creating a single platform that universally provides strict performance guarantees, Dynamo allows tuning of system parameters (N, R, and W) by client services to achieve desired system characteristics.
Dynamo combines numerous techniques to achieve high availability with high performance and reliability. It employs: the virtual-node abstraction for load balancing in consistent hashing; gossip-based distributed failure detection and membership protocols; Merkle trees to minimize the overhead of replica synchronization; partitioning of the key space for quick bootstrapping and simplified key-space transitioning; a global view of partition location information to improve service latency; replication across data centers for high reliability; and admission-control techniques to keep background activities from interfering with more important foreground request processing.
Flaws:
Even when updates are not happening (or not frequently), no replica can independently service a read request without consulting a set of other replicas. Reads are thus always costlier because they involve network transactions. So despite having multiple consistent replicas, they cannot be used independently to provide the kind of read performance and load balancing that would be expected with ideal replication.
Applicability:
The multitude of techniques used in Dynamo makes it highly available at 99.9th percentile of distribution. Popular web services, where usually the overwhelming amount of data processed is in the form of key-value pairs, have to cater to millions of customers during peak hours which requires them to make their platform highly available to provide satisfactory performance to almost every user. Incremental scalability feature of Dynamo also allows such systems to be cost effective by using the available resources judiciously.
Posted by: Sandeep Dhoot | March 2, 2011 10:55 AM
Summary: Amazon's Dynamo system manages the storage and retrieval of values associated with keys that are distributed across a number of replicated servers. This paper describes the philosophy and implementation of Dynamo and how the system evolved from earlier versions in order to realize optimizations based on use cases and workload trends that were observed.
Problem: The primary focus of Amazon in the implementation of Dynamo was availability, such that crucial business applications are able to remain operational regardless of failures in individual nodes. That said, they also made a point to allow tuning of the service to facilitate different use cases that may be more dependent on high performance or strict(er) consistency. The fact that Dynamo is only directly used by trusted, in-house clients gives the developers the freedom to ignore the possibility of malicious or malfunctioning clients.
Contributions: The ability of Dynamo to be adjusted for a wide variety of use cases while maintaining a consistent and uncomplicated backing store is its largest triumph. In addition to the N, R and W parameters which allow the performance, availability and durability of the information to be appropriately balanced, Dynamo also provides the opportunity for clients to implement their own business logic for reconciling inconsistent data. They also describe a slew of optimizations, some of which could only arise from an observation of an implemented system in active use. For example, the authors mention how the coordinator for a write request "is chosen to be the node that replied fastest to the previous read operation." This opportunity, which aids both performance and consistency, would probably only be recognized in the context of an in-use system.
Flaws: Dynamo does have some limitations, primarily regarding the nature of the data that can be managed with the system. Admittedly, the database has no relational capabilities and is limited (in practice if not in theory) to storing small values. It also appears to be a completely flat database without a concept of namespaces. This requires a high level of coordination between client development teams to avoid naming conflicts, which is reasonable to achieve within a single company like Amazon but may limit the applicability of the system to a broader audience.
Applicability: Dynamo is of supreme relevance in the arena of distributed computing because it represents the current state of the art. It synthesizes a variety of techniques developed in academia and shows that they can be put to practical use in a functioning system of the largest size. Its simple yet multipurpose and tunable nature assures that it, or systems modeled after it, will be utilized far into the future.
Posted by: Rich Joiner | March 2, 2011 01:04 PM
Summary:
This paper describes Dynamo, Amazon’s distributed key-value store that emphasizes availability and partition tolerance over consistency. The system employs a variety of techniques, including consistent hashing, vector clocks, and update quorums, to achieve its goals.
Problem:
Amazon’s business model requires its services to effectively be “always-on”, even though they scale to tens of thousands of servers, exposing their services to constant failure of varying degrees. In order to ensure high quality of service, the paper seeks to design a key-value store that has low latency in the 99.9th percentile and can tolerate significant failures while still retaining reasonable degrees of consistency.
Contributions:
Rather than anything particularly novel, this paper presents a study on how to combine a variety of techniques into a successful, coherent system. The basic organization centers around a consistent hashing ring, with read and write duties divided into loose quorums based on the first N nodes after the hash of the key value being accessed. There are a number of ways for dealing with failure for this, including temporary hand-off to healthy nodes and an anti-entropy mechanism using Merkle trees. Dynamo uses vector clocks to track differences in versions (punting to the client if unresolvable conflicts emerge), and both external and internal failure detectors to keep track of nodes in the ring. The paper does a good job of describing how all of these actually fit together and their interlocking implications. It also shows that, with all of these components, Amazon is able to achieve the performance goals it sets.
In section 6, the paper does offer some novel techniques as optimizations, such as write buffering, which sacrifices durability for performance, and several different means of distributing node responsibility in consistent hashing. Most of these changes are necessary to achieve Amazon’s performance goals within their framework, though they may be applicable to other designs of distributed systems.
Flaws:
On the whole, Dynamo works quite well for its targeted workload and environment. However, it is very much optimized for that, and may not be very portable. For instance, its relaxed consistency would make it terrible for, say, a financial application. Furthermore, it is targeted towards systems with large numbers of nodes that seldom permanently leave the network. Given this, the consistent hashing and quorum scheme may be overkill in situations with noticeably fewer nodes, and it may not work well in peer-to-peer contexts, given that completely new nodes often enter and nodes will permanently leave with relative frequency.
Applicability:
Despite the above reservations, the Dynamo model seems relevant to a broad variety of web services today, from online stores to services like Facebook. Even if they don’t directly use Dynamo’s framework, the general idea of optimizing for availability over consistency seems to be the dominant paradigm in the Internet today, and one that is unlikely to go away any time soon.
Posted by: Chris Dragga | March 2, 2011 02:16 PM
Summary
The paper describes Dynamo, a scalable, distributed key-value store that is designed to have very high availability, but has a relaxed eventual consistency model for stored data. Dynamo combines many ideas from distributed systems research into a highly available and useful production system and provides a good example of the implementation decisions and tradeoffs associated with building a distributed system.
Problem
Many services at Amazon require data storage which is reliable, always available (with low latency to respond to interactive user requests), and scalable to vast workloads. To achieve the reliability and availability requirements, the system must tolerate transient or permanent failures of nodes, or even entire datacenters. The system must also facilitate scaling by allowing nodes of heterogeneous capacity to be manually added or removed and automatically rebalancing data across the system.
Mindful that the CAP theorem and stringent latency requirements prevent a strong consistency model from being achieved with Dynamo, the authors adopt an "eventually consistent" model. Under such a model, conflicting updates are possible, and it is left to applications to resolve them on read.
Contributions
Dynamo combines several ideas from previous distributed systems to achieve its design goals.
Data is partitioned and replicated using consistent hashing (replication is done to an object's N successor nodes, with N as a configurable parameter). To account for nodes with differing capacity, each physical node can appear as a variable number of "virtual" nodes in the hashing ring (replication avoids placing multiple copies on the same physical node, and ensures that some copies are placed in different data centers).
Object versioning is done with vector clocks, which can be used to detect when divergent versions of an object inhabit the system (multiple versions are returned to the client for resolution in this case).
Consistency among replicas is maintained via a quorum-like protocol with tunable parameters R (the number of nodes which must participate in a successful read request) and W (the same for write requests). With R + W > N, Dynamo acts like a quorum system, although in practice lower numbers are often used to achieve better latency.
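A tiny illustration of why R + W > N gives quorum-like behavior: any read set must intersect any write set over the same N replicas, so a read touches at least one node holding the latest write. The exhaustive check below is purely didactic, with invented names.

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    # Check every possible read set against every possible write set.
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

print(quorums_overlap(n=3, r=2, w=2))  # True:  R + W > N, sets must overlap
print(quorums_overlap(n=3, r=1, w=2))  # False: R + W = N, a read can miss a write
```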
Transient failures are handled via hinted handoff, where nodes "filling in" for failed nodes keep hinted replicas that were originally meant for the failed node in a separate local database. Upon detecting that the failed node has recovered, hinted replicas are delivered to the now-recovered node. To handle permanent failures (wherein hinted replicas become unavailable before they can be delivered to the originally intended node) anti-entropy is used to synchronize replicas. Merkle trees are used to efficiently compare data between nodes.
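To illustrate the return path of hinted handoff, here is a minimal sketch in which a node scans its separate local database of hinted replicas and hands each object back to its intended owner once that node is reachable again. Storage and transport are stubbed, and all names are invented.

```python
def deliver_hints(hinted_store, is_alive, send_replica):
    """Try to return each hinted replica to its intended owner.

    hinted_store maps (intended_owner, key) -> value. On successful
    delivery the local copy is deleted, mirroring the paper's description.
    """
    delivered = []
    for (owner, key), value in list(hinted_store.items()):
        if is_alive(owner) and send_replica(owner, key, value):
            del hinted_store[(owner, key)]
            delivered.append((owner, key))
    return delivered

# Node D held a write intended for B while B was down; B is back up now.
hints = {("B", "cart:alice"): {"book"}}
b_store = {}
print(deliver_hints(
    hints,
    is_alive=lambda node: node == "B",
    send_replica=lambda node, k, v: b_store.update({k: v}) or True,
))
print(hints, b_store)  # hint cleared; B now holds the replica again
```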
Membership and failure detection are handled via a gossip protocol. Every second a node randomly contacts another and they reconcile their membership change histories.
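A minimal sketch of this gossip exchange, assuming each node keeps a version-stamped membership view and two peers merge views by keeping the newer entry per member; the data representation and names are invented for illustration.

```python
import random

def merge_views(view_a, view_b):
    # Each view maps node -> (version, status); per entry the higher
    # version wins, so updates spread monotonically through the cluster.
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or merged[node][0] < version:
            merged[node] = (version, status)
    return merged

def gossip_round(views):
    # Once per period, a node contacts a random peer and the pair
    # reconcile their membership-change histories.
    a, b = random.sample(sorted(views), 2)
    merged = merge_views(views[a], views[b])
    views[a], views[b] = merged, dict(merged)

views = {
    "A": {"A": (3, "up"), "B": (1, "up")},
    "B": {"A": (1, "up"), "B": (2, "up"), "C": (1, "joined")},
}
gossip_round(views)
print(views["A"])  # {'A': (3, 'up'), 'B': (2, 'up'), 'C': (1, 'joined')}
```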
Flaws
The weak eventual consistency of Dynamo is not appropriate for all applications. Amazon chooses to resolve the occasional real-world "glitches" that arise from this model via customer service (e.g. giving free coupons to those affected), but such strategies aren't acceptable for government agencies or banks, for example.
The KV-store model is also not well suited to applications, such as data analytics, which require access to many data objects in patterns for which primary-key-only access is not efficient. Despite scalability issues, an RDBMS which allows more flexible indexing and aggregate operations is a more appropriate choice for such applications.
Finally, Dynamo is designed for a well-controlled environment with no hostile nodes, and as such is not built to handle Byzantine failures. One presumes that Amazon has tight procedures for network administration and security, but it is worth remembering that such necessary measures are not addressed by Dynamo itself.
Conclusion
Dynamo has clearly achieved its design goals very effectively for Amazon, and is widely used within the company. The tunable parameters N, R, and W give it some flexibility for handling different application requirements.
Posted by: Craig Chasseur | March 2, 2011 04:41 PM
Summary:
This paper presents the design and implementation of Dynamo, a distributed key-value storage system that achieves high availability while sacrificing some consistency. The Dynamo system makes use of a combination of well-known techniques: consistent hashing to partition and replicate data, vector clocks to detect conflicts, a quorum-like protocol to achieve consistency, and a gossip protocol to maintain membership. The new contribution of this paper is that the authors found a particular point in the design space (consistency, availability, and partition tolerance) which is useful in a real enterprise environment.
Problem:
How can we build a distributed key-value storage which meets the need of real world applications? What tradeoff should be made in terms of consistency, availability and partition tolerance?
Contribution:
1. The main contribution of this paper is finding the sweet spot in the design space of a distributed key-value storage system that best accommodates the needs of commercial applications. The authors recognize the extreme importance of availability for business, and sacrifice consistency guarantees for it.
2. The data versioning Dynamo uses enables the system to detect conflicts, reconcile them if possible, and hand them to the business logic otherwise. The decision to partially push the responsibility for reconciling conflicts to clients avoids assuming any particular reconciliation logic on their part, and thus ensures flexibility.
3. The combination of hinted handoff and anti-entropy for replica synchronization is something I had not seen in other literature.
4. For ring membership management, it is interesting to notice that an explicit addition and removal protocol suits their needs better than a dynamic, failure-detection-based approach, given that outages are often transient.
5. The idea of configurable R, W, and N, enabling applications to make their own trade-off between availability and consistency, is also interesting.
6. The separation of data partitioning and data placement when doing consistent hashing is a new observation.
Flaws:
As with many industry-flavored papers, the evaluation section is sloppy. It basically says that Dynamo works great in their environment, and nothing else. It would be desirable if they characterized their environment and applications more, so that we had a better understanding of what Dynamo is good at. Also, a comparison against similar systems is lacking.
Relevance:
The fact that this paper is written by Amazon almost guarantees its applicability to the real world. However, since we don’t have a detailed description of the applications Dynamo is serving, one can only guess that Dynamo is more suitable for a business model where inconsistency can be solved out of band (say email a client later saying that the item she ordered is out of stock).
Posted by: Suli Yang | March 2, 2011 04:49 PM
Summary
The paper describes Dynamo, a key-value store used by Amazon.
Problem
The authors need a key-value store which is very highly available, even at the very large scale at which Amazon.com operates.
Contributions
The authors show how they can trade consistency for availability. They briefly discuss service level agreements, and how to make availability guarantees in a complex system. They argue that it is more important to look at the performance at the 99.9th percentile, rather than the average. They show how their system performs in practice, and that inconsistencies are rare. They also give a strategy for an efficient way to distribute tokens in consistent hashing.
Flaws
One of the problems with using a completely symmetric system with consistent hashing is how to deal with popular keys. The authors do not say much about this other than that they can increase the replication amount for the entire system. Perhaps in Amazon’s case they can set up their databases so that the keys are fairly uniformly accessed, but this is not clear.
Discussion
Dynamo is used by several of Amazon’s core services. The main design consideration in Dynamo is trading consistency for availability. For non-essential services, like product recommendations, it is fine to give stale or inconsistent data. Dynamo is also used for the shopping cart, and the authors said that inconsistencies are rare for that service. This makes sense because each user's cart is independent and a single shopping cart is unlikely to have many concurrent updates. So I think a system like Dynamo would work well in cases similar to Amazon’s, where each user of the service has a largely independent experience.
One of the advantages of Dynamo is that it is very flexible. Each of the replication settings can be tuned for the specific workload. Additionally, the pool of servers can be grown one node at a time.
Posted by: Aaron Brown | March 2, 2011 05:16 PM
Summary: Amazon's web platform does not require the complex query functionality found in relational databases, but must scale to levels beyond what such databases can easily provide. The distributed key-value store Dynamo fills this gap and can scale to hundreds of nodes with tunable parameters to balance consistency and availability.
Problem: Major web sites such as Amazon operate at such a large scale that storing all data in a single monolithic database is impractical. Instead, functionality is divided into distinct services that store data, and presentation logic combines data from services. Consequently, many individual services are simplified such that they only require key-value semantics from their data persistence layer. However, services also have extremely high availability requirements in order to provide high availability for the system as a whole. Not only must reads of the data always be possible, but writes to the store must always succeed - even when the network is partitioned and inconsistencies in the data may occur. Typical relational databases place a high value on consistency, in direct opposition to this goal. Instead of creating a new system to address these problems, other web sites have used relational databases as key-value stores, such as Facebook with MySQL.
Contributions: Dynamo provides scalability with a variation on consistent hashing such that each node can directly determine where to forward a read or write request, which minimizes latency and allows node membership changes with minimal disturbances on the overall hashing system. A crucial difference between Dynamo and similar systems is that Dynamo always allows writes. To handle concurrent writes on separate nodes, Dynamo stores multiple versions of a value. Using vector clocks, Dynamo may be able to determine which version is the most recent when nodes reconcile their differences. Otherwise, multiple versions are presented to the user on a read operation. Applications can tune the relative performance and consistency of the data by adjusting the numbers of nodes that must respond to read and write operations, in a variant of quorums. When the write parameter is low, data is more likely to become inconsistent. Using an anti-entropy protocol, inconsistencies are eventually reconciled.
Flaws: Dynamo places significant responsibility for ensuring the consistency of data on the application. Given that the data store really is not always consistent, it is natural that applications have varying expectations to resolve errors. But perhaps there could be a series of common failure models for applications to choose from, rather than requiring each new application to start from scratch. For example, the version control system Git has several modes for automatically merging conflicts, even though some must still be manually resolved.
Large numbers of conflicts can be caused by automated clients making simultaneous requests, though the paper is not clear whether these robots are inside or outside the trust boundary of the system. If outside users can indirectly cause this behavior, this could be a security vulnerability leading to either poor performance, or worse, application-specific inconsistent behavior (such as large numbers of spurious shopping cart entries).
Applicability: Dynamo places availability above consistency in general, so it is applicable to environments where occasional inconsistencies are not a problem. If too many items are placed in a shopping cart or order, those items can be returned by the customer. Other applications have a much lower tolerance to inconsistencies and should never be hosted on this type of database system. For example, the core databases behind Epic Systems' medical records products are run on large mainframes.
Posted by: Todd Frederick | March 2, 2011 08:21 PM
Summary: Amazon's distributed key-value store, called Dynamo, is designed for high availability and low latency. Keys are assigned to nodes using consistent hashing and replicated at the next N nodes in the ring.
Problem: The problem Dynamo seeks to address is Amazon's need for a highly available key-value store. In an infrastructure with millions of components, dealing with failures is the norm. Applications should be able to read and write values to the key-value store regardless of partitions or node failures. Furthermore, Amazon has strict latency requirements. Performance is measured at the 99.9th percentile to ensure almost every single read or write executes in less than a few hundred milliseconds. Achieving high availability and low latency requires efficient replication and load balancing, both of which are challenging to achieve at Amazon scale.
Contributions: To address the need for both high availability and low latency, Amazon employs a technique known as consistent hashing. Consistent hashing treats the hash space as a ring, "mapping" keys to the ring based on their hash value. Nodes are also "mapped" to locations on the ring. A key/value pair is stored at the first node encountered moving clockwise around the ring from the key's location. Key/value pairs are replicated on the next N-1 distinct nodes in the ring. New nodes can be added to the system by "mapping" them onto the ring. Likewise, when nodes fail, they are simply removed from the ring.
To make Dynamo highly available, its designers want to always make writes possible. This allows a user to still add items to their shopping cart even while a data center is partitioned from the network. When nodes are restored to connected functioning order, Dynamo uses vector clocks to attempt to merge writes that took place during partitions. In the event there are two different version histories for an object, all versions are passed to the application for resolution. In this way, a user may have multiple "different" carts at any given time, which will eventually be consistent.
Flaws: One challenge with consistent hashing is knowing which node to contact to read/write a particular key. One option is to rely on a frontend to accept requests from applications and forward the request to the appropriate node. A second option is to accept requests at any node for any other node, and then forward the request to the appropriate node. The first method requires a central load balancer which can become a bottleneck; the second method requires nodes to do extra work other than handle their own requests. Neither is a great option.
Applicability: The ideas in Dynamo definitely have applicability outside of Amazon's enterprise services. Consistent hashing is used by many web services to distribute requests among multiple content distribution locations. At the same time, the ideas in Dynamo can easily be applied to a much smaller-scale system to reap the same benefits without significant overhead.
Posted by: Aaron Gember | March 2, 2011 08:30 PM
Summary:
This paper is about Amazon's highly available key-value store distributed system called Dynamo. The paper discusses the design, implementation, and lessons learned from Dynamo.
Description of Problem:
A large, world-wide e-commerce business needs to serve millions of customers, and it needs to be highly available, reliable, and fast, especially during peak times. Availability, reliability, and speed are very important to a customer's online shopping experience. Traditional relational databases do not scale easily, and some services do not need the complex querying and management of these relational databases. The problem is creating a scalable, highly available, reliable, and fast key-value store distributed system that will meet the needs of a large e-commerce business, such as the SLA requirements.
Summary of contributions:
There are many issues involved in creating a highly available key-value store that is scalable, reliable, fast, and efficient. The authors use many different techniques to try to tackle each problem:
1. For partitioning, consistent hashing is used, which also provides some load balancing.
2. For write availability, vector clocks are used, and reconciliation of inconsistent data is done during reads.
3. For temporary failures, a relaxed version of quorum called sloppy quorum is used together with hinted handoff, which recovers the system once the failed nodes come back up.
4. For recovering from permanent failures, anti-entropy is used with Merkle trees, which speed up the detection of inconsistencies and minimize the amount of data transferred, since each branch of the tree can be checked independently without requiring nodes to send the entire data set.
5. For membership and failure detection, gossip-based techniques are used
The authors also explore different configurations of the system by changing the reconciliation logic or the read/write quorum rule or by having the system be client- or server-driven.
Flaws:
I did not really notice any major flaws in the system. The goal was to create a highly available key-value store that can meet their e-commerce service needs, and the system works. The authors mention that Dynamo does not focus on data integrity and security since it is built for a trusted environment. Security and data integrity may be issues if it were not designed for a trusted environment, but it seems like other techniques could be implemented to solve them.
Application to real systems:
I found it very interesting that Dynamo combines many different techniques that we have read about to solve the many different issues involved in creating a highly available key-value store. The system is configurable to meet the different needs of a service, so it is flexible and seems very useful in real systems that don't require complex queries. Also, Amazon is already using this system with great success, so it is obviously applicable to real systems.
Posted by: Kong Yang | March 2, 2011 10:37 PM
Summary
Amazon’s Dynamo is a highly available distributed key-value store. Dynamo is built on the basic principle of consistent hashing and is an example of a real system deployed in a production environment under strict SLA (Service Level Agreement) requirements. The paper describes the key features of Dynamo. Dynamo provides a simple key-value interface and sacrifices consistency to provide high availability to the client. Apart from this, the system uses various techniques such as vector clocks for versioning of data, quorums for reads and writes, and hinted handoff of data for nodes which go down and come back up.
Problem:
The primary aim of the Dynamo system was to provide a highly available key-value store which can tolerate partitions. To achieve this goal, Dynamo follows relaxed consistency semantics and provides eventual consistency. Dynamo provides a simple key-value store and does not intend to provide database-like ACID semantics.
Contributions of the Paper:
1. Most of the design decisions were made based on the application for which it was developed, primarily the shopping cart. Such a system did not require database ACID properties and hence can live with inconsistent data (it's just a shopping cart and not the final order). However, availability was a must, as orders are placed via this service.
2. The primary contribution of such a system is that it runs in a production environment on commodity hardware under strict service latency agreements.
3. Dynamo uses a lot of distributed systems techniques, like consistent hashing, vector clocks, quorum properties, anti-entropy, etc.
4. Configurable read and write quorum parameters let each application tune its own trade-off between consistency and availability (partition tolerance being a given).
Flaws:
Although it is hard to see flaws in the system, I wonder if this is the best mechanism to achieve high availability with good performance. The system involves huge complexity, so it might be worth seeing whether simpler mechanisms can provide such high availability and performance.
Applications to Real Systems:
Dynamo validates many distributed systems concepts, showing that such concepts and ideas can work at the scale of hundreds of nodes serving high traffic under demanding performance requirements. Such concepts include quorums, vector clocks, and consistent hashing. Dynamo is also an example of the lessons learned in a production-deployed system, and of what it takes for such concepts to work in production.
Posted by: Ishani Ahuja | March 3, 2011 12:09 AM
Summary
This paper describes Dynamo, a configurable distributed key-value store used by Amazon. The paper doesn’t claim to offer many new ideas. Rather, it shows that a number of ideas generated by the research community can be combined in a real, practical system.
Problem
Most applications demand some degree of consistency, but the full ACID properties provided by modern databases often mean that databases cannot be available unless everything is working properly (the network is operational, and no nodes have crashed). Availability is often very important, but since applications still require consistency at least most of the time, there is a need for systems that sacrifice some consistency for the sake of availability and reliable performance.
Contributions
The paper makes a number of contributions:
1. An example of how many distributed systems techniques (consistent hashing, vector clocks, sloppy quorums, anti-entropy, etc) can be combined in a real system.
2. An argument for why the typically used metrics are not good. Basically, the users at the extremes who place the heaviest demands on the system are often the ones buying the most from the company, so metrics are taken at the 99.9th percentile rather than for the average case.
3. The paper describes a nice way that Dynamo can be configured for different applications (N, W, R settings). Most distributed systems are for specific workloads, so it’s cool that Dynamo is both general purpose and simple.
4. I think they do a nice job of noticing that the strong consistency of traditional systems is not always necessary, so systems like Dynamo can be used; at the same time, it is possible to build services that use a mix of both Dynamo and other storage technologies.
Flaws
It seems like they should have security/authentication baked into every level of their storage hierarchy. If someone were able to get past the higher layers, it sounds like they could start mass deletes without inhibition. The lowest levels of storage should be more componentized, so that if someone were to get past the higher layers, they could only delete the stored data for a certain service or be limited in some other way.
Application
Amazon is actually using Dynamo, so it is definitely applicable to real systems!
Posted by: Tyler Harter | March 3, 2011 12:24 AM
Summary
This paper describes Dynamo, a highly available key-value storage system used by Amazon. The authors discuss the design decisions that needed to be made in order to achieve the high level of availability that a service like Amazon requires.
Problem
For a system like Amazon, it is very important that the services it provides always be available – any downtime has significant consequences. Previous RDBMSs have aimed to achieve very high levels of consistency, at the cost of performance and availability. For Amazon, availability is the most important concern and should be emphasized, even, if necessary, at the cost of other concerns such as consistency.
Contributions
Amazon combines a large number of previously known techniques in order to provide high availability of its services. It uses service level agreements (SLA) to create contracts between servers and clients which provide guarantees about performance – Amazon aims for the 99.9th percentile. Data is spread out using consistent hashing, and is replicated on some number of logically and geographically diverse nodes. Different versions of data are kept track of using vector clocks so that when partial failures occur and nodes miss updates, these differences can be reconciled. The system uses gossip-based techniques to keep track of a global view of the membership of the group. The system can also (usually) automatically detect inconsistencies and failures and react appropriately.
One of the biggest strengths of their approach is that different services using Dynamo can tailor the system to their needs by choosing parameters appropriately; for example, a service that wants writes to be easy can set a low value for W.
Flaws
I think the main weakness of this paper is that most of its techniques are very specific to Amazon's infrastructure and needs (for example, things like ignoring security or making assumptions about how the service will be used). These are perfectly rational assumptions to make in a paper like this - however, it makes me wonder how well these techniques could be used in other systems.
Application
Given Amazon's success and size, the techniques described in this paper are clearly applicable to the real world. I think the idea of combining various distributed system techniques, such as consistent hashing, rumor-based protocols, and quorums, is an important one – these techniques address different concerns, and by using multiple of them a system can become very robust. It is also important to think about these problems on the scale of a network like Amazon's, as our networks are only getting bigger.
Posted by: Ben Farley | March 3, 2011 12:41 AM
Summary: This paper describes Amazon Dynamo - a highly available, fault-tolerant, high-performance key-value store that is used internally by Amazon for various services and applications to provide highly reliable operation.
Problem statement/Motivation: Amazon wanted to build a key-value store that would be highly reliable (a reliability constraint driven by their stringent and highly competitive SLAs). The system had to be highly available, incrementally scalable, decentralized, always writable, and require minimal human intervention. They were ready to trade off durability, consistency, and cost for the other requirements.
It was sufficient that the system provide a simple primary-key-only interface using GET and PUT for reads and writes. Also, since the system was only being used internally within a trusted network, there are no security-related design constraints.
Summary of contributions: There are lots of interesting takeaways from the paper in terms of both techniques and ideas. Dynamo is 'always writable' by design - this is achieved by leaving conflict resolution to read operations and to services/clients. Dynamo uses consistent hashing to partition data and resources and object versioning using vector clocks for conflict resolution. The load distribution strategy of splitting the consistent-hash space/range into partitions is an interesting idea.
Although data is replicated among N nodes and there's a coordinator for a given data partition, the other replicas (belonging to the 'preference list') can also handle reads/writes of the data belonging to the partition. This allows for lightweight hinted handoff to handle temporary node failures. Recovery from permanent failures is done using an anti-entropy technique. Dynamo employs a loose quorum system for consistency. Different applications make different choices for quorum – e.g., a high-write-throughput app/service can choose a quorum with fewer write nodes (a smaller W, keeping R+W>N). Membership maintenance is handled using a gossip-based protocol. From the CAP theorem's perspective, Dynamo sacrifices consistency for availability and fault tolerance.
Dynamo gives the application the freedom to pick the reconciliation logic, although a default based on Thomas's write rule (last write wins) is built in. Similarly, the application can also choose client-driven coordination rather than server-side coordination. I think these decisions are really neat and make the system very flexible.
Perhaps the biggest contributions of the paper are the notions it puts forth: an eventually consistent system can indeed perform well in a production setting, and enterprise-grade hardware is not required to build a highly responsive system – commodity hardware works too.
Flaws: For me the biggest drawback of the paper is the lack of data needed to understand the system more deeply and see how/whether it can be a good fit for other scenarios. It's hard to estimate the performance (say, throughput or node-to-node communication bandwidth) as a function of the number of nodes (the achievable performance per node). It is mentioned that the design trades off cost for other requirements, so I guess it might be a lavish number of nodes.
When talking about divergent versions, the paper reports very high numbers, but this is only over an interval of 24 hours. It's hard to appreciate the performance numbers, as there is no mention of how many operations take place in 24 hours and whether the count is significant. Maybe choosing a bigger interval (say 30 days) would have made it better.
Posted by: Srinivasan T | March 3, 2011 12:50 AM
AMAZON’S DYNAMO - HIGHLY AVAILABLE KEY-VALUE STORE
SUMMARY
Amazon’s Dynamo is a synthesis of many techniques, like consistent hashing, object versioning, quorum-based consistency, and a gossip-based distributed failure detection and membership protocol, to build a highly scalable and available distributed data store.
PROBLEM BEING ADDRESSED
Amazon.com is implemented on top of an infrastructure of several thousand servers located in data centers around the world, where at any given time some component is failing (failure is the normal case). Also, customers may get angry if the service they ask for is not readily available. This motivated Amazon to design a highly available, distributed, fault-tolerant key-value store that ensures a high level of availability while sacrificing as little consistency as possible.
CONTRIBUTIONS OF THE PAPER
The single biggest contribution of the paper is showing how different techniques from distributed computing can be combined to provide a highly available, eventually consistent storage system with good performance guarantees.
Some salient features of Dynamo:
a) To achieve scalability in a dynamic, incrementally growing environment, Dynamo uses consistent hashing, where nodes joining and leaving the cluster have only minimal impact on the location of replicas.
b) To provide eventual consistency, Dynamo uses data versioning, with vector clocks capturing causality between different versions of the same object.
c) To increase availability and durability, each data item is replicated on N nodes.
d) The paper presents a hinted handoff technique to counter temporary node or network failures.
e) For synchronization between replicas, Dynamo uses Merkle trees (trees of hashes used to determine which data needs to be sent); a sketch appears after this list.
f) Quorum-based reads (get) and writes (put) are implemented to ensure consistency.
g) A ring structure for the dynamically changing network of member nodes is presented.
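To illustrate point (e), here is a toy sketch (Python; all names are illustrative, not Dynamo's code) of how two replicas can use Merkle trees to locate divergent key ranges: they compare hashes from the root down and only recurse into subtrees that differ, so synchronization traffic is proportional to the divergence rather than to the data size.

# Toy Merkle-tree comparison (illustrative only): two replicas compare
# root hashes; on mismatch they recurse into children, so only the
# divergent key ranges ever need to be exchanged.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(leaves):
    """Build a Merkle tree bottom-up from a list of leaf values."""
    level = [h(leaf) for leaf in leaves]
    tree = [level]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last hash if odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                       # tree[-1][0] is the root hash

def diff_ranges(tree_a, tree_b, level=None, idx=0):
    """Yield leaf indices (key ranges) where the two trees disagree."""
    if level is None:
        level = len(tree_a) - 1
    if tree_a[level][idx] == tree_b[level][idx]:
        return                        # identical subtree: nothing to sync
    if level == 0:
        yield idx                     # divergent leaf
        return
    for child in (2 * idx, 2 * idx + 1):
        if child < len(tree_a[level - 1]):
            yield from diff_ranges(tree_a, tree_b, level - 1, child)

replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3", b"k4=v4"]
print(list(diff_ranges(build(replica_a), build(replica_b))))  # -> [1]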
CONCERNS
There is no provision for handling malicious nodes or Byzantine failures; this assumption of a fully trusted environment is a bit too strong. Also, as consistency is relaxed, not all applications can make use of Dynamo. Furthermore, a thorough evaluation of the system's performance is not presented. However, it is indeed a great effort to combine so many concepts into a highly reliable system.
RELEVANCE TO CURRENT SYSTEMS
Amazon’s Dynamo is not a mere idea; it is a fully implemented distributed key-value store serving millions of customers across multiple data centers. Dynamo has also inspired projects like Project Voldemort, which is used by LinkedIn.
Posted by: Karthik Narayan | March 3, 2011 01:22 AM
Summary: Amazon's Dynamo system is a simple key-value store built for reliability, availability, and low worst-case latency. The system was built to power a wide range of applications, so it was designed to be easily configurable to meet widely varying service level agreement (SLA) requirements.
Problem: Traditionally, web applications have used RDBMSs to support persistent storage on the backend. However, there are many cases where a database is poorly suited to the task. Complex querying and management are often not needed. Despite much work, scaling, partitioning, and load balancing remain difficult with databases. And most of all, most databases rigidly favor consistency over availability. For Amazon's applications, they neither needed the complex functionality and consistency guarantees of an RDBMS, nor could they meet their SLAs with one. Instead they sought a system with simpler, more scalable requirements: simple primary-key access, tunable eventual consistency with high availability, and efficient 99.9th percentile performance.
Contributions: The contribution here is mainly in assembling an assortment of existing techniques into a single deployable real-world system, and showing how they can be tuned to perform well under specific usage requirements. Techniques included: 1. Partitioning of data using consistent hashing over a ring of servers. 2. Replication for reliability. 3. Asynchronous replication facilitated by vector clocks and read-time reconciliation. 4. Temporary failures tolerated using hinted handoff and sloppy quorums. 5. Permanent failures healed with anti-entropy syncing. 6. Membership and failure detection using a gossip-like protocol.
Some evaluation of their implementation is presented, which leads to some performance optimizations. For example, buffering writes can significantly reduce 99.9th percentile latency. Also, using client-driven coordination, by linking the client against a smart Dynamo client library, can significantly improve performance. Amazon's focus on near-worst-case performance is also a departure from previous work that mainly focused on average-case performance. The 99.9th percentile focus is much more appropriate when user experience is paramount.
Limitations: Clearly Dynamo is targeted toward a particular subset of applications. Anytime the querying logic for an application becomes even somewhat complex, the utility of Dynamo will be greatly diminished. But despite this limitation, its usefulness appears to be great. It would be easier to feel convinced of the success of Dynamo with a larger set of examples and data showing its performance/reliability/cost improvements. However, as was stated in the text, many specifics could not be divulged due to business considerations. Another limitation is lack of security and authentication/authorization support. These features were not needed in their closed deployment, but in a more open or multiuser deployment these would be obvious requirements.
Applications: These kinds of massively scalable, always available, eventually consistent systems seem to be the prevailing design recently. The idea of tunable consistency/availability parameters is very slick and should be adopted more widely. It doesn't get much more applicable than being deployed already and powering one of the very largest services in the world.
Posted by: Kris Kosmatka | March 3, 2011 01:30 AM
Summary
Dynamo is Amazon’s replicated key-value store, used for supporting some components of its expansive e-commerce operation. The paper first describes the motivation for the system, including the required service level agreements. Then the paper describes the techniques used to create the system architecture, and shows relative performance metrics for an actual implementation.
Problem
Because of the nature of the internet, many service applications require performance and reliability at the scale of millions of clients. Therefore, replication at some level is essential in solving these types of problems. Furthermore, traditional distributed databases provide facilities and guarantees that are above and beyond what is required for many web-oriented services. Amazon targets this class of applications, which only require primary key access to data, and can handle some amounts of inconsistency. This sacrifice of consistency buys Dynamo short latencies, particularly at the 99.9th percentile of slowest operations. This is important for guaranteeing the service level agreements necessary for a high quality user experience for all customers.
Contributions
As the authors mention, the most important contribution is the implementation of a production system architecture which combines known approaches to achieve its desired goals. The paper shows that eventual consistency can be achieved, and is useful to a large class of applications.
Aside from performance measures and evaluations of techniques, the paper also provides a number of subtle optimizations that may only become important on real systems. One such optimization is the way that missed updates are transmitted to nodes which experience temporary failures. Nodes which are covering for failed nodes are given a hint about who they are covering for, and can give the data directly back to that node when it comes online again. This increases the consistency in the database and reduces traffic during anti-entropy. Another similar optimization is that during a read, if obviously less recent versions of the data arrive after an acknowledgment is sent to the client, those nodes are given the most recent value (read repair).
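A hedged sketch of that second optimization, read repair (Python; simplified to scalar versions instead of vector clocks, and all names are mine, not the paper's): after the coordinator has the responses it needs, any replica that returned a stale version is pushed the newest value.

# Hedged read-repair sketch (illustrative; the paper describes the idea
# but not this code). After the coordinator collects responses, replicas
# that returned older versions are sent the latest value.

def read_repair(responses, send):
    """responses: {node: (version, value)}; send(node, version, value)
    pushes the freshest value to nodes holding stale versions."""
    latest_version, latest_value = max(responses.values(), key=lambda vv: vv[0])
    for node, (version, _) in responses.items():
        if version < latest_version:
            send(node, latest_version, latest_value)
    return latest_value

# Example: node B missed the last update and gets repaired in passing.
responses = {"A": (7, "cart-v7"), "B": (5, "cart-v5"), "C": (7, "cart-v7")}
print(read_repair(responses, lambda n, v, val: print("repairing", n, "to", v)))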
Limitations
The authors are fully aware that their system is only valid in the context of relaxed consistency. Specifically, it is most useful when the application has knowledge of how to merge conflicting versions of data items.
I would also like to hear more about the “bad robots,” and how they created high degrees of inconsistency.
Applicability
One important feature behind the success of the project is the powerful but simple control over the consistency versus availability tradeoff. Dynamo provides the parameters of degree of replication (N), read quorum (R), and write quorum (W). N controls the degree of fault tolerance, and R and W control the tradeoff between consistency and read/write latency. This simple feature makes Dynamo useful for a wide range of applications.
Posted by: Tony Nowatzki | March 3, 2011 01:55 AM
Summary
This paper presents Dynamo -- Amazon's home grown distributed key-value store.
The design of Dynamo is driven by observations of Amazon's application workloads (e.g., accessing data objects only by primary key, and high availability requirements), and by technology trends (e.g., consistent hashing and vector clocks for data versioning).
Problem
How should one design a storage system that meets strict reliability, scalability, and flexibility requirements, in order to support many of Amazon's lucrative businesses? Note that the flexibility requirement means the storage system should be highly configurable, to accommodate user-level services with different characteristics, for example shopping carts and sales rank.
Contribution
In my view, there are two major contributions in this paper.
1. As pointed out in the paper, a major contribution is the evaluation of how to incorporate different existing techniques to provide a highly available storage system, for example, consistent hashing for partitioning, vector clocks for data versioning, a quorum-like protocol for making replicas consistent, and a gossip-based protocol for failure detection and membership management.
2. A Dynamo instance is highly configurable to support the specific service running on it, for example, customized application-level semantic version reconciliation, tunable parameters (N, R, and W) for sloppy quorum membership, and pluggable storage engines.
Flaw
I think Berkeley DB (BDB) or MySQL may not be the best choice for the persistent backing store in Dynamo, because their transactional support seems to be pure overhead for Dynamo, offsetting some of the performance benefit of relaxed consistency. A lightweight, non-transactional, single-machine key-value store with simple but effective indexing might be a better choice.
Applicability
1. Dynamo for http://www.amazon.com
2. Dynamo-clone Project Voldemort for http://www.linkedin.com
Posted by: Wenbin Fang | March 3, 2011 02:28 AM
Summary:
This paper shares the experience of building a platform to support a huge volume of worldwide e-commerce operations at Amazon. Unlike a master/slave model, Dynamo is a decentralized key-value store that is highly available ("always writable") and fault tolerant within a single administrative domain. Through a quorum protocol, hinted handoff, Merkle trees, and other techniques, Dynamo guarantees eventual consistency of data.
Problems:
1. To support the growth of applications and e-commerce requests, the platform should be highly scalable. Any synchronization technique spanning the whole system is impractical. Requests should be spread uniformly across the nodes so that no node becomes a bottleneck, and nodes should be easy to add without producing too many messages.
2. Because the platform is built upon thousands of nodes, failures happen frequently. A good way is needed to keep the system highly available to clients even when it experiences failures all the time.
3. According to the CAP theorem, strict consistency is hard to achieve in a large-scale distributed system. How should the system be designed to be as consistent as possible?
Contributions:
1. Dynamo uses consistent hashing to achieve decentralization, load balance, and tolerance of node failures. Each node is assigned a position in the hash ring, and an object is stored at the first node encountered clockwise from the object's hash. Whenever a node is added or fails, only the objects belonging to the affected ranges need to move. To get good load balance, Dynamo adopts strategy 3, in which the hash space is divided into equal-sized partitions and each node is assigned an equal number of partitions.
2. Dynamo replicates asynchronously. It uses vector clocks to capture causality between different versions of the same object; with vector clocks, we can discover divergent versions of the same object and reconcile them in the right way. What's more, a quorum protocol is used to maintain consistency among replicas. Configurable R and W settings allow users to tune the platform according to their needs and workloads.
3. The quorum protocol may not work when temporary node or network failures occur. Dynamo uses a hinted handoff method in which another node from the preference list serves requests on behalf of the failed node and delivers the replicas back to the failed node once it recovers (see the sketch below). For permanent failures, Dynamo uses Merkle trees to quickly detect inconsistencies and an anti-entropy protocol to keep the replicas synchronized.
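A minimal sketch of hinted handoff (Python; the node model and names are illustrative assumptions, not Dynamo's code): a write destined for a failed replica is accepted by a healthy stand-in, stored along with a hint naming the intended owner, and replayed when the owner recovers.

# Minimal hinted-handoff sketch (illustrative, not Dynamo's code).

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}        # key -> value
        self.hints = []        # (intended_owner, key, value)

def put(key, value, preference_list):
    """Write to each healthy node; attach hints for failed ones."""
    for target in preference_list:
        if target.up:
            target.store[key] = value
        else:
            # hand the write to the first healthy node with a hint
            standin = next(n for n in preference_list if n.up and n is not target)
            standin.hints.append((target, key, value))

def replay_hints(node):
    """On detecting recovery, deliver hinted writes back to their owners."""
    for owner, key, value in node.hints:
        if owner.up:
            owner.store[key] = value
    node.hints = [(o, k, v) for o, k, v in node.hints if not o.up]

a, b, c = Node("A"), Node("B"), Node("C")
b.up = False
put("cart:42", ["book"], [a, b, c])     # B is down; A keeps a hint for B
b.up = True
for n in (a, c):
    replay_hints(n)
print(b.store)                          # {'cart:42': ['book']}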
Weakness:
1. Load-balancing strategy 3 requires estimating the eventual number of nodes before partitioning the ring. If the cluster grows far beyond the initial estimate, many objects have to be moved to make sure each node holds the right objects.
2. It may not scale to thousands of nodes, because every node needs to maintain information about all other nodes in the DHT as well as Merkle trees of its replicas.
Applications:
Although Dynamo is not public, there are many open source projects that implement the ideas presented in this paper.
Cassandra, used at Facebook, Digg, and Twitter a few years ago, combines ideas from Dynamo and Bigtable to build a highly scalable, eventually consistent structured key-value store.
Voldemort is another project, used by LinkedIn, that makes use of consistent hashing, vector clocks, and a quorum protocol.
Posted by: Weiyan Wang | March 3, 2011 02:35 AM
Summary:
Amazon Dynamo is a data store service used at Amazon to build highly scalable and highly available applications. Dynamo allows applications to choose availability over consistency, or vice versa, handling the associated failure scenarios with several proven ideas like consistent hashing, data reconciliation using version vectors, quorum-based writes, and so on.
Problem Statement:
Applications use replicated distribution services to scale and become more available. These services have to be consistent, available, and able to cope with the independent failures that occur frequently due to the sheer scale of the software/hardware components in the system. There are many ways to approach this, but no one technique can guarantee everything.
Contributions:
Dynamo brings together various techniques that make a key-value data store consistent, available, and able to scale as demand increases. One of the important contributions is that its design is highly configurable, allowing the developer to decide what needs to be favored, consistency or availability, depending on the needs of the application. This decoupling of behavior from implementation makes it a general solution applicable to many distributed systems problems. To achieve these goals, Dynamo leverages popular distributed-systems techniques like consistent hashing, version vectors, gossip-based group membership, anti-entropy using Merkle trees, and quorum-based reads and writes.
For most purposes it seems that Amazon favors availability over consistency, and Dynamo readily makes this possible by guaranteeing eventual consistency rather than total consistency. This also allows Amazon to achieve its performance targets for the 99.9th percentile of requests, making the service more responsive in general. This is an important contribution, since such guarantees are not always possible with strongly consistent database systems that sacrifice availability.
Another important contribution is pushing version-vector reconciliation up to the application level, which makes the process of providing eventual consistency more flexible across different applications.
Bonus contribution: the Dynamo design is very helpful for P1 :)
Limitations:
Like any honest claim about a cool distributed system, not everything can be achieved with Dynamo. It is bound by the CAP theorem and can only provide more of some properties at the cost of others.
Relevance:
Dynamo seems to be part of the critical backbone at Amazon, whose services are pretty neat. That makes this paper more relevant than any of the other papers we have read so far. I also read somewhere that SimpleDB is built directly on top of Dynamo, unlike other Amazon AWS applications that use Dynamo internally.
Posted by: Paras Doshi | March 3, 2011 03:38 AM
Summary:
This paper is a description of Dynamo, a highly available key-value storage system from Amazon.
Problem:
This paper aims to solve the problem of providing a highly available and scalable storage system. Traditional databases do not solve this problem, as they do not scale out or provide effective load balancing. Keeping this in mind, the paper explores techniques that follow an eventual consistency model with high availability and good performance.
Contribution:
One of the contributions I found interesting was the way consistent hashing is implemented in this system: each node is mapped to multiple positions on a ring, which provides better load distribution. The other important aspect of this implementation is the use of virtual nodes: a single physical host can host multiple virtual nodes on the ring, making better use of each machine. Along with this, Dynamo also replicates data, which provides greater availability and durability.
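Here is a small sketch of this idea (Python; the hash function and token counts are illustrative assumptions, not Dynamo's actual scheme): each physical host owns several ring positions, and a key is served by the first position clockwise from the key's hash.

# Illustrative consistent-hashing ring with virtual nodes (a sketch,
# not Dynamo's implementation). Each physical host owns several ring
# positions, which smooths load across hosts.
import bisect, hashlib

def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest(), "big")

class Ring:
    def __init__(self, hosts, vnodes_per_host=8):
        self.tokens = sorted(
            (ring_hash(f"{host}#{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )
        self.positions = [t for t, _ in self.tokens]

    def owner(self, key: str) -> str:
        """First virtual node clockwise from the key's hash."""
        i = bisect.bisect(self.positions, ring_hash(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = Ring(["hostA", "hostB", "hostC"])
print(ring.owner("cart:12345"))   # one of the three hosts

A nice side effect, which the paper exploits, is that a more powerful host can simply be given more virtual nodes to take a proportionally larger share of the load.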
The other significant contribution is that the system uses vector clocks for data ordering. The vector clocks indicate causality between different versions of the same data item. Likewise, the use of timestamps to truncate old vector clock entries is useful, as it curbs the size of the clocks. Further, the use of anti-entropy in the background is very useful in making the data consistent over time.
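A sketch of the vector-clock causality test (Python; illustrative, and omitting the timestamp-based truncation mentioned above): a version that dominates another entry-wise descends from it and lets the older version be discarded; if neither dominates, the versions are concurrent and must be reconciled.

# Sketch of vector-clock causality (illustrative). A clock is a dict of
# {node: counter}. Version v2 descends from v1 if it dominates it
# entry-wise; otherwise, if neither dominates, the versions conflict.

def descends(v2: dict, v1: dict) -> bool:
    """True if v2 is causally after (or equal to) v1."""
    return all(v2.get(node, 0) >= count for node, count in v1.items())

def concurrent(a: dict, b: dict) -> bool:
    """Neither version dominates: the app must reconcile them."""
    return not descends(a, b) and not descends(b, a)

v1 = {"Sx": 1}                 # written through node Sx
v2 = {"Sx": 2}                 # updated again through Sx: descends v1
v3 = {"Sx": 1, "Sy": 1}        # diverged through node Sy
print(descends(v2, v1))        # True  -> v1 can be discarded
print(concurrent(v2, v3))      # True  -> both returned on a read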
The other interesting contribution is that this system is focused on being an always-writable data store. This falls in line with the company requirement of never losing updates to a customer's shopping cart. This is a new way of thinking, as most other systems would deny partial writes to the system. This assumption pushes conflict resolution to reads, which is another nice idea. Likewise, conflict resolution is pushed to the application, because the application can best decide the experience it needs to provide the client and apply custom conflict-resolution policies.
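As a toy example of such application-level reconciliation, in the spirit of the paper's shopping-cart case (Python; illustrative only), the application can merge concurrent cart versions by unioning their items, so an added item is never lost, though the paper notes a deleted item can occasionally resurface:

# Toy application-level reconciliation (illustrative): when a read
# returns concurrent versions, the app merges them by unioning items.

def merge_carts(versions):
    merged = set()
    for cart in versions:
        merged |= set(cart)
    return sorted(merged)

# Two concurrent versions produced during a network partition:
print(merge_carts([["book", "lamp"], ["book", "pen"]]))
# ['book', 'lamp', 'pen']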
The use of consistent hashing, replication, and vector clocks provides scalability, availability, and eventual consistency. Combined with performance optimizations like write buffering, this achieves the goals described in the problem statement.
Applications:
1. The key-value store idea is useful in places that can tolerate inconsistency for a short while and where the data objects are small.
2. It can be used where one cannot afford a traditional relational database due to performance reasons.
Flaws:
1. One important flaw is that asking the application to resolve conflicts means developers need substantial experience with the system in order to choose the correct conflict-resolution mechanism.
Posted by: Vinod Ramachandran | March 3, 2011 04:10 AM
Summary
This paper describes Amazon's Dynamo system, a highly available and scalable key-value storage system. It is extensively used in Amazon's core e-commerce services. Dynamo sacrifices consistency under certain failure scenarios to favor high availability. Over several years of production use, Dynamo has demonstrated the success of providing an eventually consistent storage system.
Problem
The problem Dynamo targets is how to build a system that satisfies the strict operational requirements of a worldwide e-commerce platform. Reliability and scalability are the two most important goals, but they are hard to achieve because Amazon uses a highly decentralized architecture across data centers, and the commodity hardware used causes frequent failures in every possible component. Building a highly scalable and available system in this environment is very challenging.
Contributions
1. This paper gives a very good education about what a large-scale e-commerce service architecture looks like and what real service level agreements between service providers and their customers entail.
2. Several important design considerations and the intuitions behind them are well explained.
3. Dynamo uses a synthesis of well-known techniques to achieve its goals: consistent hashing with a virtual-node abstraction to distribute load; vector clocks with reconciliation to capture causality between data versions; a quorum protocol and hinted handoff to maintain consistency among replicas during failures; Merkle trees for anti-entropy to keep replicas synchronized after permanent failures; and a gossip-style protocol for membership and failure detection (a sketch of one gossip exchange follows this list).
4. Dynamo is a real system running successfully under huge traffic on busy days. The evaluations demonstrate the efficiency, scalability, and availability of the system.
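Here is the gossip sketch promised above (Python; the view structure and versioning are illustrative assumptions, not Dynamo's wire format): in each round, two random nodes reconcile their membership views, keeping the higher version for each member, so updates spread epidemically without any central coordinator.

# Hedged sketch of a gossip-style membership exchange (illustrative):
# each round, a random pair of nodes merges their membership views.
import random

def gossip_round(views):
    """views: {node: {member: version}}; one random pairwise exchange."""
    a, b = random.sample(list(views), 2)
    merged = dict(views[a])
    for member, version in views[b].items():
        merged[member] = max(merged.get(member, 0), version)
    views[a] = dict(merged)
    views[b] = dict(merged)

views = {
    "A": {"A": 3, "B": 1},
    "B": {"B": 2, "C": 1},
    "C": {"C": 4},
}
for _ in range(10):          # a few rounds spread the views epidemically
    gossip_round(views)
print(views["A"])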
Flaws
1. The biggest flaw, I think, is that the paper does not discuss how the SLA is actually achieved, for example 99.9% of requests within 300 ms. There is no cost model, workload model, or scheduling model behind this goal. It also misses a discussion of which 0.1% of requests will fail; resource usage can differ hugely depending on which 0.1% is chosen, because the workload is bursty and the latency distribution usually has a long tail.
2. It also misses a discussion of how to handle requests with different priorities. For example, should a request from a shopping cart worth $500 be given higher priority than one worth $5? How does Amazon handle such variation to maximize its profits, and how would these profit-driven requirements affect the overall design? These are really interesting questions the paper leaves unanswered.
3. Dynamo's design seems very complicated and highly specialized; it would be hard to use in a different environment. Last, but not least, the authors do not talk about how buggy their system is. It is hard to get such a monstrous system right, and runtime bugs and faulty or malicious behaviors are not discussed.
Applicability
Dynamo runs Amazon's core e-commerce businesses; it is a real, working, highly reliable and scalable system, as the paper claims. Its experience and design considerations remain useful in similar environments. Sacrificing consistency for availability seems to be becoming more popular in modern data centers, especially for applications that do not require strong consistency, such as Google's search results.
Posted by: Lanyue Lu | March 3, 2011 06:04 AM
Summary
The paper summarizes Amazon’s Dynamo platform, a highly available key-value storage system utilized by several key Amazon services. It explores Dynamo’s design decisions in several key areas, including partitioning, replication, failure handling, and node membership.
Problem
For many Amazon services (which demand high availability), object retrieval from a data store is limited to primary-key access. Traditional RDBMSs provide an undesirable solution to this problem due to unnecessary complexity and overhead (to handle complex queries) as well as a need for specialized, high-end hardware for satisfactory performance. Furthermore, the strong consistency guarantee of an RDBMS limits the availability and scalability of the system. Consequently, Amazon’s Dynamo offers an eventually consistent platform, which is highly available and scalable. (Additionally, the only interface provided to an application is via put and get operations, which act on the primary key of an object.)
Contribution
The major contribution of the paper is the insight it provides into the development and implementation of a highly available, eventually consistent distributed system for a production environment. The paper explores several key design mechanisms, including those used for partitioning, versioning, and failure handling, and the modifications needed to improve the performance and availability of the system. One fundamental performance modification concerns writing an object to disk: in Dynamo, some durability is sacrificed for performance by buffering writes in memory on all but one of the replica nodes. Since a write operation waits for only W responses to indicate success, it is no longer bounded by a write to disk.
Additionally, the paper explores tuning the system's parameters to vary its performance and availability, so as to meet the requirements of a diverse set of applications.
Flaws
One of the major limitations of the Dynamo platform is its restriction to select applications and a restricted production environment. In addition to being limited to applications that perform primary-key-only access, Dynamo is limited to applications that either do not require strong consistency or in which conflicts can be resolved without detrimental impact to the client, and to a trusted, ‘internal-only’ environment.
Applicability to Real Systems
Dynamo is successfully deployed in many of Amazon’s services and successfully used during peak business times without a detrimental impact to the Amazon platform. The paper offers several key insights into building a highly available system in the face of continual failures, which limit the availability or consistency of the system.
Posted by: Greig Hazell | March 3, 2011 06:59 AM