
Epidemic algorithms for replicated database maintenance

Demers et al., Epidemic algorithms for replicated database maintenance, PODC 1987.

Please review for Tuesday, 2/7.

Comments

Summary: This paper compares three epidemic algorithms for replicating a database across multiple sites: direct mail, anti-entropy, and rumor mongering. It uses network traffic and convergence time to evaluate these methods, and then proposes mechanisms to improve them, mainly rumor mongering and anti-entropy.
Background: Many online applications, such as content distribution, have multiple sites and replicated data sets. These data sets do not need strong consistency; they just need to provide read access and eventual consistency among all sites. To provide eventual consistency, sites need to exchange messages, and different approaches to exchanging them have different benefits and overheads. The metrics used to evaluate them are network traffic and the convergence time to reach eventual consistency. There are three approaches.
Direct mail: Each site sends a new update message to all other sites immediately after it applies an update to its local replica, and other sites accept any update with a newer timestamp than their own copy. For a small-scale system this works well, but as the workload grows the number of exchanged messages increases rapidly, and the resulting network overload causes message loss, which degrades convergence time.
Anti-entropy: Each site periodically communicates with some randomly chosen other site and exchanges its entire data set with it. There are three ways to exchange data: pull (get the other's data), push (give data to the other site), and push-pull. Eventual consistency is reached with probability 1. Pull and push-pull are preferred over push because they converge faster. There are two improvements to this approach. First, to avoid unnecessarily exchanging the entire data set, the two sites can exchange checksums first; if the checksums match, they do not exchange data. Second, each site can maintain a recent-update list, so the sites exchange that list and compare the older data separately. Both improvements aim to reduce network traffic.
Rumor mongering: Each site maintains a list of new updates ("hot rumors") and repeatedly sends each one to randomly chosen other sites, getting feedback after each send. If the other site already has the update, the sender becomes less likely to keep sending; once a site has received enough such feedback (say k times), it stops spreading that update. This approach does not guarantee that the update reaches all sites, but the residue (the fraction of sites missed) decreases rapidly with k. There are several options: (1) blind vs. feedback: with feedback, a sender loses interest only when the recipient already knows the update; blind loses interest regardless of the recipient's state. (2) counter vs. coin: with a coin, the sender loses interest with probability 1/k on each such contact; with a counter, it loses interest only after k of them. (3) push vs. pull: if there are numerous updates, pull is better at triggering useful exchanges; otherwise push is better. (4) Connection limits: if each site has a connection limit, push is better, because it spreads traffic across all sites, while pull concentrates requests at one site and congests it more easily. (5) Backup: since rumor mongering cannot guarantee that all sites get updated, the system periodically runs anti-entropy over the entire data set as a backup.
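To make the rumor-mongering loop concrete, here is a minimal sketch of my own (not code from the paper) of push rumor mongering with the counter-plus-feedback variant; the Site class, the threshold K, and the timestamp scheme are all assumptions for illustration:

    import random

    K = 4  # hypothetical loss-of-interest threshold

    class Site:
        def __init__(self, name):
            self.name = name
            self.peers = []     # other Site objects, filled in elsewhere
            self.db = {}        # key -> (value, timestamp)
            self.hot = {}       # key -> count of "peer already knew it" contacts

        def local_update(self, key, value, ts):
            self.db[key] = (value, ts)
            self.hot[key] = 0                   # start treating it as a hot rumor

        def gossip_round(self):
            """Push every hot rumor to one random peer and count the feedback."""
            for key in list(self.hot):
                peer = random.choice(self.peers)
                already_knew = peer.receive(key, *self.db[key])
                if already_knew:
                    self.hot[key] += 1
                    if self.hot[key] >= K:      # counter + feedback: give up after K
                        del self.hot[key]       # rumor becomes "removed" here

        def receive(self, key, value, ts):
            """Return True if this site already had the update (negative feedback)."""
            if key in self.db and self.db[key][1] >= ts:
                return True
            self.db[key] = (value, ts)
            self.hot[key] = 0                   # newly infected, so start spreading too
            return False

A periodic anti-entropy pass over the whole data set would then serve as the backup described in (5).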
There are also other improvements: (1) Deletion and death certificates: To delete data on all sites, we cannot simply delete the item site by site, because the update process would replicate the data back to sites that no longer have it. The paper proposes death certificates, and in particular dormant death certificates: assign a death certificate to each deleted item and propagate it, delete very old death certificates at most sites, and retain dormant copies at only a few sites. (2) Spatial distribution: the network is not uniform; communication with farther sites is more expensive and more easily lost. So when choosing sites to exchange with, performance is better if the distance distribution is taken into account. With anti-entropy plus a spatial distribution, traffic on critical links is significantly reduced, and rumor mongering on the CIN topology also benefits from this.
In summary, this paper discusses three approaches to replicating data among multiple sites, providing both theoretical analysis and simulation, and it proposes improvements such as checksums and spatial distributions to make them more efficient.
The paper bases its discussion on the assumption that the network is the bottleneck and cannot carry the traffic as the system scales up. When actually designing a system, though, we need detailed measurements and analysis to see whether, and at what scale, this assumption holds.

This paper is about the application of epidemiological theory to distributed systems. The authors base their work on the observation that the spread of an update throughout a distributed database can be thought of as the spread of a disease and, therefore, one can use certain principles governing the spread of disease in simple, theoretical scenarios.

The paper examines three ways to effect the distribution of updates in a distributed system: direct mail, anti-entropy, and rumor mongering. Direct mail is simply directly connecting to the other systems and sending the updates; it suffers from the problem that some hosts may be temporarily down at the time of the mailing and so will be unable to receive the message. Anti-entropy is when each host occasionally selects a random other host and connects with it to resolve the differences between them using either push, pull, or push-pull. It is relatively easy to see that, with anti-entropy, every update will eventually be distributed to every site. Rumor mongering, a type of complex epidemic, is similar to anti-entropy, except that it introduces three classes of node: susceptible, infective, and removed. Sites only spread the update when they are infective; when they haven't seen the update yet, they are considered susceptible. Removed means that they have seen the update but are no longer spreading it. An infective site generally becomes removed when it has attempted to spread the rumor to enough non-susceptible sites (the exact criterion here is variable).
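As a rough illustration of the push-pull variant of anti-entropy (a sketch of my own, assuming each site keeps a dict mapping keys to (value, timestamp) pairs and that newer timestamps win, as in the paper's update rule):

    import random

    class Site:
        def __init__(self):
            self.db = {}  # key -> (value, timestamp)

    def anti_entropy_round(site, all_sites):
        """Resolve differences with one randomly chosen peer (push-pull)."""
        peer = random.choice([s for s in all_sites if s is not site])
        for key in set(site.db) | set(peer.db):
            mine, theirs = site.db.get(key), peer.db.get(key)
            if theirs is None or (mine is not None and mine[1] > theirs[1]):
                peer.db[key] = mine        # push: our copy is newer (or missing there)
            elif mine is None or theirs[1] > mine[1]:
                site.db[key] = theirs      # pull: their copy is newer (or missing here)

A pull-only or push-only round would simply keep one of the two branches.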

The paper also gives a method for deleting from the database. This is more complicated because the nodes need a way to tell the difference between never having seen an update and having seen the update but then later having seen a delete for the update. This is done by using death certificates which are kept for each update that has been deleted. The paper details how the method prevents storage from getting filled with death certificates and still prevents the update from "coming back to life" in a future rumor mongering run.

One potential flaw in this paper is that it is not entirely clear to me how exactly they are running their simulations. Are they running them in discrete time steps or as a continuous simulation? In my view, it ought to be the latter, since the real-world systems they are trying to model are continuous.

1. one or two sentence summary of the paper
The paper introduces several epidemic algorithms for distributing updates and achieving consistency in a database with many replicas.
With the direct mail algorithm, each new update is immediately mailed from its entry site to all other sites. Although it is timely, it suffers from unsatisfactory reliability due to the possibility of message loss and the originator's incomplete information about the other database sites. Anti-entropy is extremely reliable, but it propagates updates much more slowly than direct mail. Because the algorithm requires examining the contents of the database, anti-entropy can be efficient only if the exchange process is not used too frequently. The rumor mongering algorithm is timelier than anti-entropy, but there is some chance that an update will not reach all sites.
2. A description of the problem they were trying to solve
The problem the paper is trying to solve is maintaining mutual consistency among replicated databases in the face of updates. The challenge is achieving good performance (short propagation time, high reliability in reaching consistency). In the paper, three algorithms (direct mail, anti-entropy, rumor mongering) are described and analyzed.
3. A summary of the contributions of the paper
The main contribution of this paper is that it demonstrates the effectiveness and efficiency of epidemic algorithms in maintaining replicated databases. The epidemic algorithms and their combinations are analyzed both theoretically and practically. Another important contribution is the idea of backing up the randomized algorithm with anti-entropy, so that we eventually reach a consistent state with extremely high probability. In addition, the ideas on dealing with deletion and resurrection, and on considering the spatial distribution of sites when spreading updates, are impressive.
4. The one or two largest flaws in the paper
An assumption of this paper is that the update frequency is not high, but in a real-world implementation the assumption may not always hold. In addition, globally unique timestamps cannot be guaranteed.
5. A discussion of how the ideas in the paper are applicable to real systems.
Epidemic algorithms are widely used in the real world; for example, rumor mongering is used in wireless mesh network routing.

This paper investigates a number of variations on randomized "epidemic"
algorithms for achieving eventual consistency in a distributed, replicated
database. It performs general mathematical analyses on a number of parameters
involved in implementing schemes such as anti-entropy (in which nodes
periodically synchronize with a randomly-selected peer), rumor-mongering (in
which a node that receives an update then eagerly tries to propagate it until
it appears to be "old news"), and combinations thereof, with the goal of
propagating updates as widely and quickly as possible while minimizing network
utilization, particularly on a few critical links.

Most of the paper focuses on an analysis of the behavior of these algorithms
in the context of an abstract network with uniform communication cost between
all pairs of nodes, exploring the effects of changing update mechanisms (push,
pull, and push-pull), open-loop vs. closed-loop rumor propagation, and
numerous other variables, with results that were in some cases somewhat
counter-intuitive to me -- for example, that pull-based propagation tends to
converge toward consistency more quickly than the push-based method.
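As I recall the analysis, the intuition is that if p_i is the probability a site is still susceptible after round i, then with pull p_{i+1} = p_i^2 (a susceptible site stays susceptible only if it happens to pull from another susceptible site), while with push p_{i+1} ≈ p_i * e^{-1} once nearly every site is infective. So pull squares the residual error each round, while push only shrinks it by a constant factor in the endgame.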

One interesting problem addressed in the paper is that of reliable deletion,
particularly with regard to the problem of storing the explicit "death
certificates" required to keep a deletion from being reverted by a mistaken
update from an out-of-date peer. The authors effectively aggregate the
storage available on all the nodes by designating a small randomly-selected
subset of nodes to retain the death certificates for a longer (but still
limited) period of time beyond the period for which all nodes will store it.
This subset will then re-propagate the deletion if an incorrect undeletion
resurfaces, using a secondary "activation timestamp" to ensure correctness in
the case of a legitimate undeletion.
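A rough sketch of how I picture that retention logic (my own code; the thresholds T1 and T2, and the helpers is_retention_site, rumor, and death_certs, are hypothetical names, not the paper's):

    import time

    T1 = 30 * 24 * 3600    # hypothetical: how long every site keeps a death certificate
    T2 = 180 * 24 * 3600   # hypothetical: extra dormant retention at the designated subset

    def prune_certificates(site, now=None):
        now = now if now is not None else time.time()
        for key, cert in list(site.death_certs.items()):
            age = now - cert["activation_ts"]
            if age > T1 + T2:
                del site.death_certs[key]          # even retention sites give up eventually
            elif age > T1 and not site.is_retention_site(key):
                del site.death_certs[key]          # ordinary sites forget after T1

    def on_incoming_update(site, key, value, ts):
        cert = site.death_certs.get(key)
        if cert is not None and ts <= cert["delete_ts"]:
            # An obsolete copy is resurfacing: wake the dormant certificate and
            # re-propagate it so the stale data is suppressed everywhere.
            cert["activation_ts"] = time.time()
            site.rumor(("delete", key, cert))
        else:
            site.db[key] = (value, ts)             # a genuinely newer write wins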

A potentially problematic aspect of this that did not seem to be addressed in
the paper, however, is the potential for catastrophic failure in the event
that the initial deletion does not reach any of the designated long-retention
set before the expiration of the initial timespan (T1) for which the other
nodes retain it. If this were to occur it would (a) imply that the update had
been propagated very poorly, and (b) then fail to suppress the incorrect
re-propagation of the value that should have been deleted. Increasing the
number of designated long-term retention nodes (or the value of T1) would
decrease the risk of this occurring, but at the cost of trading off on the
benefit of using this technique to begin with, since it would increase the
system-wide storage requirements of death-certificate retention. (Though it
seems likely that the authors simply assumed that the chances of this
occurring were sufficiently low that it was an acceptable risk, like a number
of other improbable-but-possible failure conditions discussed in the paper.)

As it relates to real-world situations though, the biggest open question left
by the paper is a general and reliable way to deal with irregular network
topologies. The non-uniform spatial distribution approach discussed in
Section 3 has valuable benefits in real-world networks where a few sizable,
highly-connected subgraphs of the network may be linked by a small number of
critical links (i.e. the trans-Atlantic lines mentioned in the paper).
However, as described in the paper, when applied to some more odd (but not
terribly unrealistic) network topologies, these algorithms significantly raise
the probability of critical replication failures. This sort of approach would
thus seem to require a great deal of careful analysis and configuration before
employing it in the context of an actual network with an irregular topology.
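Implementing such a non-uniform choice is simple in principle. Here is a sketch of my own (the distance function and the exponent a are assumptions, though the paper studies distributions of roughly this shape, where the probability of picking a partner falls off with its distance):

    import random

    def choose_partner(site, peers, distance, a=2.0):
        """Pick a peer with probability proportional to distance**(-a)."""
        weights = [distance(site, p) ** (-a) for p in peers]
        return random.choices(peers, weights=weights, k=1)[0]

Increasing a biases exchanges toward nearby sites, which is what cuts traffic on the critical links at the cost of slower global convergence.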

This paper presents some methods that can be employed when implementing consistency algorithms for distributed databases. Maintaining consistency across the various replicas of a distributed database is an important problem. The degree of consistency is determined by the needs of the application, and a higher degree of consistency means sacrificing availability or performance.

The paper introduces various methods by which consistency can be induced into a distributed database system: an anti-entropy-based method and rumor mongering. Anti-entropy is a robust method in which the whole database content of two replicas is compared and the corresponding diff is applied to make the two copies consistent. The obvious disadvantage of such a method is the overhead incurred to make the databases consistent; a large amount of time is spent comparing the contents. Rumor mongering, on the other hand, takes an update, sends it to susceptible nodes that are waiting for updates, and continues from there to other nodes. Rumor mongering takes more time to reach full consistency than anti-entropy but performs better with respect to the number of requests handled per unit time and the amount of network traffic it generates. Also, there is a clear stress on running the anti-entropy method infrequently (on the order of hours or days) to ensure better consistency of the system, provided anti-entropy can do its job in the specified time.

The paper introduces a very efficient and reasonably fault-tolerant way of replicating deletes. Death certificates are used, which mark an object as deleted and notify other copies of the object that they need to be deleted. This is similar to marking a record's field in a database as deleted and having a separate job delete it at a later stage. There are also dormant death certificates lying on a small set of servers; these expire beyond a certain point in time but are refreshed whenever a stale copy is encountered. This is a very sensible way to implement deletion, assuming an upper threshold on the time for updates to be replicated.

The paper also covers spatial distributions of networks, where network performance depends on the distance between nodes, and their integration with anti-entropy and rumor mongering.
Also, I think there are much more efficient ways to do data diffs that would significantly reduce the time of the operation, like the Merkle trees we saw in Dynamo. I am not sure if that is what the authors meant or if it was a row-by-row comparison. I guess even with hash trees it will still take more time.
Also, direct mail can be combined with rumor mongering by grouping nodes into many local subnets and selecting a leader in each subnet for delivery or conflict resolution. This approach is used in some distributed databases/data warehouses for replica management.

One thing that is implicitly assumed in the paper is the presence of a time synchronization service in the system. Without this, timestamp comparison might produce inaccurate results. On the whole, the paper provides a decent overview of some methods that can be tuned to the needs of the application developer. It describes the various ways in which consistency can be achieved quickly, or in a less traffic-generating way, etc., by employing the right kind of approach for replicating the updates.

Discussed in this paper are ways of propagating database updates through a distributed database environment. Specifically, the paper looks at epidemic-style updates (anti-entropy and rumor mongering) and compares them to a direct-mail-style update system (similar to the update system seen in the original implementation of Grapevine). The authors compare the pros and cons of the various systems and give a detailed breakdown of performance for the epidemic systems they test.

One of the key points I took away from this paper is that the authors suggest a hybrid approach to ensuring that every database in the system is up to date. The update structure they suggest uses a randomized complex epidemic algorithm (rumor mongering) in combination with an anti-entropy fallback (which ensures that any database that missed an update will eventually receive it). This is a very interesting approach to performing database updates. Using a very lightweight but possibly lossy method for sending small, frequent updates, combined with a heavyweight method for large but infrequent synchronizations, ensures that there are multiple copies of an update in various databases on the network while also guaranteeing that every database will eventually have that update when a resync occurs.
There are some use cases where this may not be usable (for instance, cases where you need to ensure that every server gets an update immediately), and the space required for death certificates could become problematic and allow old updates to reappear. So in any case where 100% consistency of the database is needed, this type of method would not be appropriate. However, if the user is willing to trade immediate consistency for performance, there could be a nice gain from using this system.

The major issue I have with this update scheme is that anti-entropy methods might not scale well to very large databases. Especially if the method discussed in this paper (transmitting the entire database across the network) is used. With the enormous size of some databases today this operation could be so costly that it couldn't be performed on a regular basis. There are likely methods that could reduce the complexity of this operation (such as generating hashes for sets of updates and comparing just the hashes). Without anti-entropy, rumor mongering would get progressively more out-of-sync over time (to the point where it could be unusable). The other issue that I did not see addressed in this paper (for any of the methods discussed) was how to deal with conflicting updates.

This paper presents several methods for keeping data in sync across multiple locations. Epidemic theory is used, along with direct data transfers, to spread data updates to other locations and additional methods are used to make sure the update contagion reaches every location.

A problem with typical data updates is the complexity involved as additional data storage locations are added. Each additional location adds to the risk that one of them will miss an update, leaving an inconsistency in the system. Another problem is the possibility that an update will be lost in transit. A goal is to keep network traffic proportional to the number of data locations; if a design requires more than this, it becomes impractical after a certain number of locations are added. This paper presents a design that can make sure that updates reach all locations while still remaining efficient in checking and achieving eventual consistency.

This paper shows how direct update transfers can be combined with epidemic techniques to provide confidence that updates reach all locations without overburdening the design as additional data locations are added. Two major epidemic techniques are used in the paper. Anti-entropy is where two locations are randomly paired and compare their data for any inconsistencies or updates that one of them is missing. This can require large amounts of data transfer to complete a comparison. The authors discuss how checksums could be used to reduce the amount of data that must be compared during the anti-entropy phase, although checksums can quickly become outdated. Rumor mongering involves certain updates becoming something that a location will actively try to spread to other locations. A location with this type of update will randomly pick other locations and ask whether they already have the update, and it will stop trying to spread the update after failing to find any new locations needing it for a certain duration. Since it is random, it is possible that an update will completely miss certain locations when distributed this way. One factor in these epidemic algorithms is the push vs. pull design. In a push design, a location contacts others and checks whether the contacted location needs any updates; in a pull design, the initiating location asks whether any updates are available; a push-pull design does both in the same exchange. A pull design will generally make updates more likely to reach all locations but may cause more traffic. Depending on epidemic algorithms also means that reaching consistency may take a long time, and certain methods to make sure consistency has been achieved, such as multiple rounds of the same epidemic, may be required.

The paper assumes that direct mail is the main path of transfer and that transfers cannot be expected to always complete. It mixes this together with the possibility that a location might not know of the existence of one or more other locations. It seems as if separating these two concerns might allow different solutions: if it were possible to be certain that a location had received an update, some of the uncertainty could be removed, leaving only the issue of becoming aware of new locations that store copies of the data.

Any sort of data update that either travels along a path where it could be lost or must be sent to an unknown number of destinations could use these epidemic techniques. This paper shows how the techniques could be used in the Clearinghouse service, for example. In addition to providing checks and repairs for missing changes, the paper mentions how rumor mongering could be used to transfer the updates themselves, without the need for direct mailing of the data at all.

Summary:
This paper presents some randomized algorithms (mainly rumor mongering) for replicated database consistency (eventual consistency) that are more robust (requiring few guarantees from the underlying network communication) and more efficient (e.g., reduced network traffic).

Problem:
The previous direct mail method tries to send updates to all nodes in a timely fashion, but there are several problems with it:

- Update messages may be discarded due to queue overflow.

- Destination nodes may be inaccessible for a long time.

- The source node may not know all other nodes at the time of sending updates.

- Failure detection and recovery are all handled by humans, which only makes sense in small-scale networks.


Contributions:
+ Developed the rumor mongering algorithm as an efficient replacement for the original direct mail method for update distribution, with anti-entropy used as a backup to guarantee complete spread.

+ Identified well-chosen spatial distributions for partner selection in the randomized algorithms. These proved effective at reducing network traffic overhead on a per-link basis, especially on certain critical (bottleneck) links.

+ Came up with the idea of death certificates and dormant death certificates to improve the immunity to obsolete data at a reasonable storage cost.


Flaws:
- The performance of the epidemic algorithms depends heavily on parameters that are hard to tune and need to adapt to the network scale and update frequency, e.g., the time interval between anti-entropy rounds, or the probability 1/k (or the counter k) with which an infective node loses interest in spreading the rumor and becomes inactive.

- The analysis and simulations mostly target their CIN topology specifically. It would be nice to provide more empirical validation of the mathematical model, and to present the results in plots rather than only tables.

- The schemes presented in this paper are mostly probabilistic and cannot provide absolute guarantees. I wonder if they could really be used in mission-critical applications (e.g., replicated databases)?

- Algorithms may be sensitive to the nonuniform spatial distribution and pathological network topology. For instance, push and pull variants of rumor mongering may have high convergence time in some cases.

- In the case of deleting an item, if a site is disconnected for longer than the time for which the dormant death certificate is held, it is still possible for the obsolete copy of the item to resurface.


Applicability:
- The epidemic algorithms inspire a large number of distributed algorithms today, especially for P2P systems. While this paper discusses them in the context of replicated databases, they could also be applied to P2P data dissemination. For example, in an unstructured P2P system for live streaming, a similar gossip protocol could be used: each node randomly selects some peers to push data to or pull data from. A push+pull method could also be applied, since push disseminates data in a timely manner while pull can be used as a backup to fetch missing pieces of data.

- It would be interesting to revisit the paper using a current database system, in order to see how assumptions and constraints have changed since 1980’s because of the growth of network capacity, the evolution of network topology and database technology.

“Epidemic Algorithms for Replicated Database Maintenance” explores the application of the field of epidemiology to maintaining a consistent view of data within a distributed, replicated database. It does this by two methods: simple and complex epidemics. Simple epidemics, like anti-entropy, will always infect the entire population, but are not necessarily the most efficient forms, especially in terms of network traffic. Complex epidemics, like a rumor system, can be significantly more efficient, but do not guarantee convergence. However, parameters can be adjusted such that the probability of failure is arbitrarily small. In practice, a combination of the two approaches works very well as a quiet, reliable system. However, deletions are a non-trivial problem in such a system. They can be solved by using death certificates, which are similar to updates that mark deletions, which propagate like normal rumors. To solve the problem of reanimation of deleted data, dormant death certificates can stick around like antibodies to destroy false data.
This paper aims to solve the problem of maintaining a consistent view in a replicated database in the face of normal distributed systems problems like failure and network loss, specifically when updating and deleting entries. Because any particular message may get lost, a simple broadcast of an update is not efficient or even sufficient, and a method of robustly spreading the update must exist. The original system used in the corporate Internet (CIN) was a direct mailing system, which was liable to fail and was rather inefficient.
The application of epidemiology to the field of reliably spreading knowledge is an interesting contribution, and makes fundamental sense: biological systems exist solely to spread their “information” as widely and as reliably as possible, and because of resource constraints, as efficiently as possible. Additionally interesting is their parametrization of the effectiveness of rumor spreading based on push/pull, coin/count, and blind/feedback, such that reliability can be arbitrarily high and reliability vs performance can be tuned. Furthermore, the concept of dormant death certificates, similar to antibodies, applies another biological solution to an informatics problem.
A noticeable flaw is that they do not dwell on what happens in the case of multiple conflicting updates. Even with non-synchronized clocks, one update will eventually win, but many inconsistencies could result in the meantime. For example, machine A could update U while C updates U at the same time. If one propagates to B faster, B could then act on that information and create a new update that destroys A's update entirely or causes other problems. The system could even briefly have two competing views of history that diverge.
This paper seems to have wide-ranging applicability to distributed systems. Modelling biological phenomena is likely appropriate in a variety of contexts, specifically any system that has a goal or reward structure similar to a real-life system. Epidemiology itself could also be useful when you cannot guarantee the cooperation of any subset of your system; the plague would route around the misbehaving hosts and eventually reach every reachable machine. The gossip protocol could also be useful when gathering data from a variety of potentially unreliable sources with overlapping information without explicitly polling the entire system.

The paper proposes a whole new set of randomized algorithms for achieving eventual consistency in a replicated distributed database system.

There is a high overhead associated with techniques that try to achieve higher levels of consistency in replicated database systems. The authors had hit a hard limit on scalability when the system grew too large (update requests increased) and was overwhelmed with the network traffic needed to maintain data consistency. They required a new, scalable technique to achieve consistency cheaply.

The most significant contribution of the paper is the idea of using very simple, randomized epidemic algorithms by relaxing the strict requirement of fast convergence to a consistent system state. They introduce two types of epidemic algorithms: (1) anti-entropy and (2) rumor mongering. The theoretically sound design decisions discussed in the paper are very useful and concretely reason about many of the problems they encountered. The deep analysis of different variations of the two types of algorithms, and the discussion of the trade-offs between them, act as a helpful guide for future replicated-system designers. As a result of these analyses, they propose a hybrid approach for propagating updates in which they switch between anti-entropy and rumor mongering as the view of the system changes. Another major contribution of this paper is how deletions are handled in such epidemic update protocols using death certificates; they also suggest a solution for "safely" removing death certificates once they become dormant in the system. Apart from these challenges, they also suggest techniques to gracefully spread the updates across the network and adapt to the non-uniform distribution of network capacity.

The major flaw in the paper is the assumption about the existence of global time. Even in those early years, system time could perhaps be synchronized across all the servers (using NTP-like protocols) to within a second, but it is hard for the system to function properly with such coarsely synchronized clocks when there can be 10-100 updates per second (which is realistic). Also, the authors do not seem to acknowledge the possibility of conflicting updates and the need for reconciliation of such conflicts. Further, it would have been more convincing to see actual results from a real system rather than simulations. There were also no comparisons with other state-of-the-art deterministic update propagation algorithms, which could have clearly shown the trade-off between simplicity and achieving strict consistency.

In summary, the ideas in the paper remain relevant to modern distributed systems, all the more so because even a small-scale distributed system today is orders of magnitude larger than the systems of that era, which increases the significance of scalable techniques. Further, users are more willing to relax strict consistency requirements for higher performance.

Summary:

This paper introduces algorithms to synchronize a distributed database. It starts with the naive direct mail algorithm where one server potentially communicates with every other server ( O(N^2) ), can highly overload links depending on the network geometry ( a transatlantic link is given as the example ), and is susceptible to failure. Next, anti-entropy is introduced. It is more reliable, but still extremely expensive as anti-entropy updates can involve a complete comparison of two databases. Finally rumor mongering is introduced such that only updates are propagated. This is not a reliable algorithm as there is some non-zero probability that a host will not be updated by rumor mongering. It is instead backed up by an infrequently run anti-entropy.

The major contribution of this paper is, of course, to describe a system that works! It is not without its flaws, but any system that works has to be given a significant amount of credit.

Flaws:

A prime source of annoyance for this paper is the lack of units. Delays ( as specified in item 3 on page 11 ) are described as 'times', but no units are included. It could be seconds, it could be YEARS ( okay, unlikely but I'm trying to make a point ).

While they do not discuss the exact mechanism for comparing entire databases, the algorithm discussed on page 8 has sites maintain a recent update list, exchange that, and then compare their entire databases only if their database checksums still disagree. The Dynamo paper discusses the use of Merkle trees, which could be used to improve database comparison efficiency. Merkle trees were invented in 1979 and thus had been around for eight years before the publication of "Epidemic algorithms for replicated database maintenance".
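Even without full Merkle trees, a cheap bucket-of-hashes comparison already avoids shipping whole databases. A minimal sketch of my own (not the paper's mechanism) of that idea:

    import hashlib

    def bucket_digests(db, buckets=64):
        """Digest each bucket of keys so two sites can compare digests, not data."""
        digests = [hashlib.sha1() for _ in range(buckets)]
        for key in sorted(db):
            b = int(hashlib.md5(repr(key).encode()).hexdigest(), 16) % buckets
            digests[b].update(repr((key, db[key])).encode())
        return [d.hexdigest() for d in digests]

    def keys_to_compare(db, buckets_that_differ, buckets=64):
        """After exchanging digests, only keys in differing buckets need a full compare."""
        want = set(buckets_that_differ)
        return [k for k in db
                if int(hashlib.md5(repr(k).encode()).hexdigest(), 16) % buckets in want]

A Merkle tree applies the same idea recursively, so two mostly identical databases can be compared with logarithmically few digest exchanges.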

The solution described in the paper to reduce the amount of traffic on the transatlantic link uses the notion of distance. Distance is not defined; it could be network distance or geographic distance. It also seems as if this should not be a static metric but a dynamic one. Say the network has two links across the country, and one goes down, forcing more load onto the remaining link. It seems that dynamically increasing the network distance between the east and west coasts would be advantageous.

As stated in the paper, propagation delay is expected to be O(log n) with similar space requirements. Similarly, critical links like the transatlantic link carry more than the average amount of traffic. The expression given in section 3.1 states that for a uniform distribution, per round of anti-entropy we expect (2*n_1*n_2) / (n_1 + n_2) conversations on the transatlantic link. With a non-uniform distribution that takes 'distance' into account they cut that number significantly, but also more than double the t_last value ( time for convergence ). It strikes the reviewer that the techniques described in this paper could be run on multiple levels. Instead of one heterogeneous network, it could be divided into cells, with the nodes internal to a cell operating as described in the paper. Inter-cell communication could be handled by nodes designated 'border' nodes, and anti-entropy would only be run between them. Subdividing networks running the epidemic algorithms would give faster internal convergence times and smaller space requirements, and trouble links ( as in the transatlantic link ) would see fewer expected conversations ( i.e. n_1 and n_2 would consist of the number of border nodes instead of all internal nodes ). Distributing load across border servers could be done via the use of coherent caching: the cell name in combination with the variable name could be used to pick a border server to contact.

The administrative overhead of the epidemic algorithms could potentially be rather expensive: every server has to know about every other server. If you add or remove a server, how do you get that information to all of the other servers? A potential solution resembles the DNS hack described in the web caching with consistent hashing paper. Pick a random number and query the DNS server for 'server-<number>'. The DNS server could calculate the random integer modulo the real number of servers and return a server from the list. Since the DNS server receives the IP address of the querying server, it may be possible to implement the spatial distribution calculation on the DNS server itself. This would centralize distance calculations and potentially lower administrative overhead.

Use of the DNS hack would push any implementation further toward dividing an epidemic network into cells. DNS lookups across a congested transatlantic link can add latency and further traffic. Even with a server on either side of the link, there would have to be some sort of zone transfer mechanism set up, which would take time. In a cell-based epidemic network, updating a local cell's DNS server would be fast, and many transatlantic DNS messages would be avoided, since only border servers would have to know about other border servers.

Many of the algorithms defined in this paper require good time synchronization. For example, if two separate rumors about the value of a single variable are planted at different servers and the clocks of those two servers are not synchronized to a time source, the value converged upon for that variable may not be what is expected. Obviously this is not a desirable feature, but it should be avoidable. Anti-entropy and rumor mongering involve a pair of systems communicating with each other, and over that connection a time synchronization message could be sent. Given two servers ( A, B ), a message is sent from A to B carrying t0 ( the time of the packet transmission ). On receipt, B records the time of the packet reception ( t1 ) and sends a reply back ( containing t0, t1, and t2 - the time the reply is sent ). Finally the reply is received back at A, which notes the time of its arrival ( t3 ). You can then compute an estimate of the clock offset as ((t1-t0) + (t2-t3)) / 2. This is the algorithm used by the simple network time protocol ( SNTP ). Exchanges could then be time-corrected.
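In code, the offset estimate from that four-timestamp exchange is just (a sketch; real SNTP also computes the round-trip delay):

    def clock_offset(t0, t1, t2, t3):
        """Estimate of B's clock minus A's clock.
        t0: A sends request, t1: B receives it, t2: B sends reply, t3: A receives reply."""
        return ((t1 - t0) + (t2 - t3)) / 2.0

    def round_trip_delay(t0, t1, t2, t3):
        return (t3 - t0) - (t2 - t1)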

A continued annoyance with the algorithms described in this paper is that deletes take time. This could be a potential security issue, although Clearinghouse servers were designed to run on a well-protected corporate network. One of the background papers suggested a mechanism that might be applicable: weighted voting allows reads/writes to be assigned a number of votes, and the votes assigned control performance for a designed workload. A similar thing could be done for death certificates. Take for example table 3: a higher value of k results in more traffic but fewer sites missed, while a lower k results in less traffic but a worse residue and convergence. The specific value of k could be propagated along with the update.

Misc Questions:

Section 6 (performance evaluation) of "Flexible update propagation for weakly consistent replication" states that the code was written in C, runs under Linux 2.0, and requires an environment that supports garbage collection. What POSIX C environment supports garbage collection?

Have these algorithms been supplanted by RMTP ( reliable multicast transport protocol ), or some other reliable multicast protocol? They do talk about broadcast mechanisms, but not multicast.

The paper introduces algorithms to achieve and maintain consistency of a database with many replicas. The main approaches take ideas from the field of epidemiology and they are called epidemic algorithms.

The problem is to keep a database (replicated at many sites) consistent during updates, while preserving a reasonable efficiency and scalability. Three algorithms (Direct mail, Anti-entropy, Rumor mongering) are analyzed towards that end. Direct mail is a naïve algorithm which may cause inconsistency, since individual sites do not have complete information and messages can get lost. Anti-entropy and Rumor mongering are randomized epidemic algorithms which promise to reach consistency eventually (or with a very high probability).

The most important contribution of this paper is that it shows the applicability and effectiveness of epidemic algorithms in maintaining replicated databases. This approach influenced many later studies, including modern approaches such as gossip protocols, and I believe the discussion of what makes a "good" epidemic algorithm plays a very significant role in that later work. The epidemic algorithms and their combinations are analyzed both theoretically and practically. Like many randomized algorithms, rumor mongering has a failure probability which can be made arbitrarily small by tuning the parameter k; tuning this parameter also helps configure the desired level of consistency as better hardware becomes available. Another important contribution is the idea of backing up the randomized algorithm with anti-entropy, so that we eventually (with probability 1) reach a consistent state. In addition to these, the ideas on dealing with deletion and resurrection, and on considering the spatial distribution of sites when spreading updates, are novel contributions and worth noting.

I find the hope of having globally unique timestamps unrealistic, and it can easily break the system: some site may think another is "removed" (because of a wrong timestamp) although it is not. However, the authors point this out and note that the system may not work in practice if we do not have proper clocks.

Concurrent updates are also not discussed well. It is not clear what happens if the same information is updated simultaneously at different sites; I believe in such a scenario we may end up in an inconsistent state.

Although the efforts on integrating spatial distribution information into the epidemic algorithms are discussed extensively, it might be interesting to study the issue further by posing it as a well-defined (stochastic) optimization problem. Then we could apply well-known techniques to find the optimal non-uniform distribution for selecting the sites to which updates are spread.

I think epidemic algorithms are very important since they laid the foundations of gossip-based algorithms, which are very efficient at spreading information through large networks. Therefore, they are very effective and applied in modern distributed systems which require scalability, consistency and fault tolerance. Although it was not a big issue in those days, I believe we should also take security into account more nowadays, since a single malicious site can affect the system terribly in such an epidemic scheme.

The paper talks about how the epidemic patterns can be applied for maintaining
consistency in a replicated database system. The consistency model targeted
here is eventual consistency, where one site handles the update and it
propagates it to the rest of its replicas. The algorithms are modelled after
how epidemic diseases and rumours spread taking into account the spatial
distribution and assuming the least from the underlying communication
framework. The service that is discussed in this paper is a global naming
service used by Xerox, somewhat similar to the DNS that we use today.

Several approaches are presented and theoretically analyzed in this paper,
mixing and matching them. For propagating an update, the paper discusses the
brute force approach of mailing every replica, using an anti-entropy mechanism
where two sites constantly check each other's database for any changes and
update themselves, and finally the rumour mongering where updates are
considered as hot rumours and they are propagated until they eventually lose
importance as negative feedback is gathered. Some amount of randomness is used
in choosing the site to be updated next. Though this is successful in
spreading the updates, they can't guarantee that a deletion will be successful.

To handle deletions, the paper uses a death certificate, which models the
antibodies in the immune system. It is triggered whenever a site comes to
know of a stale copy of data based on its timestamp. Copies of these death
certificates are stored at designated sites. The paper also tries to analyze
mathematically the effect of spatial distribution on the way updates are
propagated, by running some simulations.

All these systems are analyzed based on how robust each mechanism is when used
in combination with each other and also how it reduces the load on the
links. The mathematics behind these are quite difficult to follow but the
algorithms are intuitive and simple. The important contribution of the paper
is analyzing these simple mechanisms from a mathematical perspective and
drawing valuable conclusions.

The paper however is oblivious to any sort of transient failures in the
systems. The system mentioned in the paper is not completely topology aware.
Also, the paper doesn't clearly explain what sort of consistency the system is
supposed to achieve and whether any sort of quorum is required. It is rather
primarily focused on effective strategies for spreading updates. The model
of replication used in the paper is also not very popular today on networks
that are geographically spread out.

This is a good paper. I find it similar to the consistent hashing paper in that both take a theoretical concept and use it to solve a systems problem. I think this theoretical backing gives strength to the paper, since the system maps cleanly onto an abstract concept -- but this is more opinion than anything.

The primary problem they solve is scaling consistency in a heterogeneous data store. Prior solutions involve complex deterministic approaches or use homogeneous machines, like Grapevine. The authors solve this problem using a stochastic approach that yields eventual consistency. In my opinion, the primary contribution of this paper is the formalization of random approaches such that strong statements can be made about how they converge.

The most interesting theme in the paper was the interplay between anti-entropy and rumor mongering. The authors provide very good detail and formalization of both approaches to consistency -- as well as their strengths and weaknesses. While there is very strong analysis of each approach on its own, I feel their treatment of the combination is comparatively weaker. It seems like, after analyzing the performance of both approaches, they end up with the very pragmatic (as opposed to theoretically clean) conclusion of: let rumor mongering handle the bulk of the distribution but throw in a little anti-entropy to clean up the corner cases.

While it is logical, and seems to work very well, to combine the two approaches, I'm curious whether this is a better idea than using one protocol which properly handles all cases. There are design trade-offs here, as well as the question of the marginal gain it would yield. It would have been interesting if they had included more discussion of a single protocol versus two different ones.

An interesting problem they deal with is that of deleting from a store which uses the anti-entropy approach. The discussion here focuses on the trade offs of how long it takes to reach consistency and how to know when consistency is reached. In some sense, this is almost a more interesting problem than the initial distribution problem since you have to keep the death cert until you know (or you assume) all nodes have made the delete. I feel their approach of a ‘dormant death certificate’ is commentary on how difficult it is to verify a global truth in such a system.

I feel the most raw and applicable part of the paper is the analysis of the difference between the push and pull methods of updates. This paper does a very good job of explaining and justifying the trade-offs between a push and pull (and push-pull) approach. The detail they go into about how the current epidemic state affects the efficiency of either operation is a general trade-off which could be applicable in many situations. While the use of randomness is probably a bigger contribution than the push/pull trade-off, I feel that the push/pull discussion operates in a much less theoretical way.

The authors present a new idea for maintaining a distributed database and present measurements to illustrate the trade-offs system administrators will face in implementing various options. This idea is based on the spread of diseases through epidemic processes to determine the distribution of updates. At the time, the competing methods relied on either a direct mailing of updates to all servers (subject to unreliable delivery) or a randomized synchronizing of databases between server pairs (extremely expensive).

The proposed system is similar to the randomized synchronizing of anti-entropy in that it uses randomness in determining the recipients of an update. After a certain marker is reached, an updated node will cease to spread the new information. To save bandwidth, only changes are sent when possible so the whole database doesn't have to be transmitted. The resulting algorithm is extremely simple, with only three variables determining performance on a given file. A number of design choices are also presented: the model can either push updates when received or request updates, and the process can be terminated either by a counter or by the result of a biased coin flip, with trigger options that may require statistics from the server on the other end of the connection. They also examine the influence of network topology, in particular the existence of critical links as choke points, and the changes made when servers are only able to maintain a finite number of connections.

The most interesting element is the mechanism to handle file deletion. As we cannot guarantee all copies will be deleted on a given update process, select servers are chosen to archive the deletion update. If a new update is attempted, these archived copies will return to circulation to shut the update down.

To evaluate their system, the authors test various configuration choices and measure the traffic generated, the number of servers that remain non-updated after the update process terminates, the time it takes the update process to terminate, and the average time it takes for a server to receive an update. All of this is useful information, but I would also like to have seen measurements comparing the performance of this system to the competing systems mentioned in the introduction. I believe that their system is an improvement, but it would be good to see something in the way of proof.

Some issues I see with what they're doing: the authors assume full knowledge of the network topology to make load-balancing decisions, which is not realistic in either the case of a dynamic network or a network split over multiple administrative domains. Neither was a big issue at the time of the paper, but both have become important. Another point that remains unaddressed is the consistency of the system across multiple reads and writes. The paper "Weighted voting for replicated data" goes into great detail about the problems related to ensuring the users of the system see a consistent view of the files. This paper doesn't have any of that, and it's not clear how their system responds to multiple attempts to write while the system is in the update propagation process.

This paper describes and analyzes two epidemic algorithms for propagating updates between servers by analogy with the spread of epidemic disease. Models of epidemic disease are used to quantitatively analyze the influence of parameters on convergence time and traffic volume.

Epidemic algorithms are designed to propagate updates and keep data consistent. Several factors, including coverage of the propagation, convergence time and the amount of traffic, are used to measure the performance of the algorithms. Several design choices, including spatial distribution, feedback or not, and counter vs. coin, influence performance. How to model these choices and find the relationship between the design choices and algorithm performance is the main problem the paper tries to solve.

The main contribution of this paper is that an epidemic model is introduced as an analogy for the propagation of updates, so that mathematical models and terminology from the study of epidemic disease can be used directly. For example, by solving the differential equations, we learn the relationship between the proportion of uncovered servers (the variable s) and the probability that an active individual loses interest in the rumor after an unnecessary phone call (the parameter 1/k). On top of the basic model, more design choices are discussed. Using a counter or a probability to model losing interest is not a rare design choice; we can model disconnection either as a certain number of unsuccessful connection attempts or as declaring failure with some probability after each unsuccessful attempt. Spatial distribution is introduced to decrease the amount of traffic: the more locality the distribution gives, the longer the convergence time and the less traffic needed. All of this analysis shows that the fundamental problem here is to balance coverage, convergence time and the amount of traffic. Another contribution is the detailed discussion of data deletion. Distinguishing the activation timestamp from the ordinary timestamp is very useful for supporting reinstating operations, because expiring the certificate and deleting the data do not necessarily depend on the same timestamp.
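(If I remember the paper's result correctly, for the basic feedback/coin variant the residue s satisfies s = e^{-(k+1)(1-s)}, which works out to roughly s ≈ 0.20 for k = 1 and s ≈ 0.06 for k = 2, so the fraction of uncovered servers does drop off quickly as k grows.)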

One flaw of this paper is that its experiments are all simulations; there is no data from a real workload, even though the simulations are based on a real workload. The assumptions made in producing the simulated workload can affect the results of the algorithms. Another flaw is that the spatial distribution does not seem really practical to use: network topologies in the real world are quite complex, and the model used here is too simplified to help design real systems.

Even though the term 'epidemic algorithm' is seldom mentioned in current systems, its idea is actually the basis of a famous one: the gossip protocol. Therefore, the ideas of this paper continue to influence the design of current systems.

This paper introduces an algorithm for keeping replicated, distributed databases
(eventually) consistent, which is adapted from the field of epidemiology.

The problem faced by the authors was achieving guaranteed (or close to it)
consistency among many unreliable machines without introducing lots of unneeded
network traffic. The algorithms in place on the system were a "direct mailing"
technique, which introduced a large burst of network traffic at once and had
no mechanism for dealing with failure, and an "anti-entropy" algorithm, which
used a uniform distribution to cause machines to exchange updates with random
machines periodically. The latter was very reliable, since it converged to all
machines receiving the update, but it exchanged a machine's entire database to
check for inconsistencies, even though the copies were often identical and no
action was needed.

The main contributions of the paper were the application of the rumor mongering
algorithm used in epidemiology to the spread of updates among many hosts and the
evaluation of using non-uniform distributions to guide the choice of a host to
send an update to. This application of a different field's algorithm is a novel
way to think of this problem, especially being able to cause the system to
eventually stop trying to send updates to other machines. The ability to
quantify the probability of a machine not getting an update after it has
finished its update course through the system is also very useful for tweaking
parameters like how long to keep "death certificates" for updates in existence.

I found it interesting that, although the epidemic algorithm really felt like
the main point of the paper, it was not actually deployed to replace
anti-entropy; it was only simulated, unless I missed something. The paper was
mostly theoretical in nature, and an explanation of why the algorithm was not
implemented would have been nice. I am guessing they could not afford the
possible trouble of integrating a strategy like this into such a large,
widely-used network. I enjoyed the discussion of topology differences and
non-uniform distributions, though much of it went over my head; I believe that
is more my fault than the paper's. Even the authors, however, did not seem to
fully understand the benefits of using these distributions.

Weak consistency is almost a must for most distributed systems that hold
persistent data, as we have been establishing with the previous papers in this
class. Clearly we want our systems to converge to a point where all data is
completely consistent, without overloading the system or causing slowdowns for
its users. Using an epidemic algorithm seems to be a good solution; however, it
needs to be backed up by a mechanism that guarantees the data will be received
by every individual in the system. Its random nature places less strain on the
network at any one time, which keeps a distributed service usable. These are
algorithms that should be considered for similar systems.

In distributed systems, maintaining consistency among all sites without compromising availability is a difficult problem, and systems often choose eventual consistency as a solution. The paper discusses methods for achieving eventual consistency using randomized algorithms, namely anti-entropy and rumor mongering, and the parameters that need to be tuned to decrease network traffic and convergence time. Because of the dynamically changing nature of a distributed system, where sites are constantly added or removed, it is difficult for every site to keep a completely up-to-date list of other sites. In addition, the unreliable nature of the underlying network makes deterministic approaches such as primary-site update unreliable. Randomized algorithms, on the other hand, do not require these guarantees but create more network traffic, so it is important to choose the right set of parameters to reduce traffic and improve convergence.

The important takeaway points from the paper are:
1) The paper discusses three mechanisms for propagating updates, namely direct mail, anti-entropy, and rumor mongering. In direct mail, a site that receives an update sends it to all other sites in its list; though it is efficient and generates the least traffic, it is not reliable. Anti-entropy has two randomly chosen sites resolve the differences between them, the drawback being the amount of traffic generated during conflict resolution. In rumor mongering, the infective sites (sites holding updates) send updates to susceptible sites until they become removed, but the issue lies in deciding when to remove an infective site.
2) The paper points out the important distinction between push and pull by showing that, when only a few sites remain susceptible, the pull mechanism converges faster than push.
3) A nice technique suggested by the paper to reduce traffic in the anti-entropy mechanism is to maintain recent update lists and exchange them before comparing checksums (a rough sketch appears after this list). An important observation here is that the choice of the time window should be large enough that most updates are complete within it.
4) The paper defines residue, traffic, and delay as the metrics for designing a good epidemic algorithm and introduces variations such as blind, feedback, counter, and coin, where blind and coin reduce traffic at the cost of residue and convergence, while feedback and counter improve delay at the cost of higher traffic.
5) The paper suggests using anti-entropy as a backup mechanism that triggers either direct mail or rumor mongering when a mismatch is found, and says it works better with rumor mongering than with direct mail because of the smaller amount of traffic the former generates. However, what is not clear is the need for the remailing/redistribution phase that happens when a mismatch occurs, since anti-entropy itself is guaranteed to eventually reach every site.
6) Since update transaction records cannot be stored forever at each site, there is a problem of keeping track of deleted items, because a site holding a stale copy might reintroduce them. This necessitates death certificates: time-stamped deletion records retained in the system until it can be determined that all sites are aware of them. The paper introduces dormant death certificates as a neat way of preventing obsolete data items from being reinstated.
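
For point 3, a minimal sketch of the recent-update-list-plus-checksum optimization (the names, the window constant, and the structure are all my own guesses at one reasonable shape):

    # Sketch of the anti-entropy optimization from point 3: exchange recent
    # updates first, compare checksums, and only fall back to a full database
    # comparison on a mismatch. Names and structure are my own.
    import hashlib
    import time

    WINDOW_T = 600.0  # seconds; should exceed the typical update distribution time

    def checksum(db):
        body = "".join(f"{k}:{v}:{ts}" for k, (v, ts) in sorted(db.items()))
        return hashlib.sha256(body.encode()).hexdigest()

    def recent_updates(db, now):
        return {k: e for k, e in db.items() if now - e[1] <= WINDOW_T}

    def optimized_exchange(db_a, db_b, full_resolve):
        now = time.time()
        # 1. Exchange and apply each other's recent update lists.
        for src, dst in ((db_a, db_b), (db_b, db_a)):
            for k, (v, ts) in recent_updates(src, now).items():
                if k not in dst or dst[k][1] < ts:
                    dst[k] = (v, ts)
        # 2. Compare checksums; only do the expensive full comparison on mismatch.
        if checksum(db_a) != checksum(db_b):
            full_resolve(db_a, db_b)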

With anti-entropy, when two sites conflict on the same data item, it is not clear which site is considered to have the latest update, since no full update list is maintained (apart from the recent update list). Even if one were maintained, the question is how long it should be retained, especially since anti-entropy is a slow mechanism and is treated as the last resort for reaching absolute convergence. Another question is whether, if the remailing step in anti-entropy is avoided, rumor mongering is still a better technique than direct mail. In general, the paper gives nice insight into randomized algorithms, with an in-depth analysis of all the parameters that can be tuned to improve performance and reduce cost.



Seth Pollen

This paper describes three alternatives for propagating updates through an eventually consistent distributed data store. It concludes that various forms of rumor mongering provide the most efficient propagation scheme but must be coupled with occasional anti-entropy activity to ensure eventual consistency. Several tradeoffs within rumor mongering are discussed, including the push-pull tradeoff (where the correct choice varies with the rate of new data being introduced), the blind vs. feedback tradeoff (which can reduce acknowledgement traffic), and the counter vs. coin tradeoff (which can require additional state to be maintained on each node). The paper touches on the concept of using hashes to reduce the amount of data sent for anti-entropy, but it does not discuss full Merkle trees.

The paper seems to pay little attention to the issue of establishing a universal monotonic clock for all nodes to use in timestamping their updates. It does not invoke Lamport clocks, instead relying on each node’s ability to generate globally unique timestamps that are approximately equal to Greenwich Mean Time. It would be possible, however, to easily replace this simple timestamp with a vector clock.
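
For what it's worth, the comparison step of such a vector clock is easy to sketch (my own illustration, not something from the paper):

    # Sketch of a vector-clock comparison, one possible replacement for the
    # paper's GMT-based timestamps. Purely illustrative.
    def compare(vc_a, vc_b):
        """Return 'before', 'after', 'equal', or 'concurrent' for two vector
        clocks given as dicts mapping node id -> counter."""
        nodes = set(vc_a) | set(vc_b)
        a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
        b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
        if a_le_b and b_le_a:
            return "equal"
        if a_le_b:
            return "before"
        if b_le_a:
            return "after"
        return "concurrent"  # conflicting updates need separate resolution

The catch, of course, is that a "concurrent" result forces the application to resolve the conflict, which the paper's latest-timestamp-wins rule sidesteps.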

The paper gives extensive analysis of the effects of spatial distributions on network usage and performance, but it does not describe techniques for informing this distribution. Specifically, their distribution depends on knowledge of the network distance between each pair of nodes. Are there distributed algorithms useful for discovering this information in a changing system?

The fact that this paper’s ideas were employed in the Amazon Dynamo system affirms their relevance. In particular, the issue of conserving bandwidth on critical network links (leading to the use of spatial distributions) will remain relevant as long as LANs are faster than long-distance connections.

The paper describes several epidemic algorithms for distributing updates and driving the replicas toward consistency, with only a brief discussion of direct mail and detailed discussions of anti-entropy and rumor mongering. Besides comparing anti-entropy with the various rumor-mongering variations, it also discusses key implementation points such as backup, deletion, and the spatial distribution of servers in the context of epidemic algorithms.

The problem this paper tries to solve is designing replication algorithms that are efficient and robust and that scale gracefully as the number of sites increases. Direct mail fails to solve this problem since it is not entirely reliable. The new algorithms trade off cost against reliability.

The contribution of this paper is that it introduces epidemic algorithms for maintaining replicas in place of traditional deterministic algorithms. Rumor mongering with counter and feedback shows the best efficiency for initial distribution, and anti-entropy is chosen as the backup strategy for redistribution. The paper also introduces death certificates so that deletion propagates correctly. The discussion of spatial distributions provides some clues for making the algorithms work well across topologies; the most important idea there so far is building a hierarchical structure.

One flaw I found in this paper is its proposal to combine rumor mongering and anti-entropy. There seems to be no strong reason why we need the combination of the two: as discussed in the paper, rumor mongering works well for redistribution, even in the worst case. Although a peel-back anti-entropy solves the efficiency problem, it seems like more work compared to simply using rumor mongering for redistribution.

The other flaw is the treatment of hunting. The paper mentions that hunting is a good way to improve all connection-limited cases; it sounds like a great technique, but the paper fails to provide more information about it.

The ideas in this paper sound interesting, combining the distribution of information with the theory of epidemics. The algorithms described are simple and easy to implement, and should be practical even in today’s applications.

In this paper, the authors propose several epidemic algorithms to maintain consistency between replicated databases. Maintaining consistency among many replicas in a large-scale system, in a way that is efficient in both time and network traffic, has long been a hot topic in distributed systems.

Three methods are proposed in the paper: Direct Mail, Anti-entropy, and Rumor Mongering. Direct Mail can result in incomplete data due to the lack of a global view and the lack of absolute network reliability. Both Anti-entropy and Rumor Mongering are epidemic processes. Anti-entropy is very reliable but not efficient, since it examines the contents of the databases. Rumor Mongering is more efficient but may result in incomplete updates. If Anti-entropy is used as a backup mechanism for Rumor Mongering, we can guarantee the completeness of an update while still taking advantage of the efficiency of Rumor Mongering.

There are three ways to resolve differences when propagating updates: “push”, “pull”, and “push-pull”. “Pull” and “push-pull” can be much better if there are no connection limits. A connection limit works in favor of “push”, since it limits the network traffic, while it hurts “pull” because of its dependence on the capacity of the infective nodes.
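
A toy way to see the push vs. pull difference near the end of an epidemic, using the approximate per-round recurrences I believe the paper derives for the susceptible fraction p (my own sketch):

    # Toy comparison of push vs. pull convergence for the susceptible fraction p.
    #   pull: p_next = p**2            (a site stays susceptible only if the
    #                                   partner it pulls from is also susceptible)
    #   push: p_next = p * exp(-1)     (approximation when p is already small)
    # These recurrences are my recollection of the paper's analysis; the code is
    # just a sanity check.
    import math

    def rounds_until(step, p0=0.1, eps=1e-9):
        p, rounds = p0, 0
        while p > eps:
            p = step(p)
            rounds += 1
        return rounds

    print("pull:", rounds_until(lambda p: p * p))             # quadratic: very few rounds
    print("push:", rounds_until(lambda p: p * math.exp(-1)))  # only a constant factor per round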

The deletion operation has to be handled differently, to prevent old copies from undoing a deletion. A death certificate, which contains a timestamp and is propagated like a normal update, is introduced for deletion. The problem with death certificates is that they can consume too much storage if they are never deleted. Dormant death certificates are used so that most of the certificates can be deleted after a specified time, while a few sites retain dormant copies.
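
A tiny data-structure sketch of ordinary vs. dormant death certificates (field names and retention values are invented for illustration; only the general scheme comes from the paper):

    # Sketch of death-certificate handling. Field names and thresholds are my
    # own; only the general idea is from the paper.
    import time

    ORDINARY_RETENTION = 30 * 24 * 3600   # e.g. 30 days at every site
    DORMANT_RETENTION = 365 * 24 * 3600   # much longer, but only at a few sites

    class DeathCertificate:
        def __init__(self, key, deleted_at):
            self.key = key
            self.deleted_at = deleted_at   # ordinary timestamp of the delete
            self.activation = deleted_at   # reset if the certificate is reawakened

    def apply_certificate(db, cert):
        """A death certificate wins against any copy older than the delete."""
        entry = db.get(cert.key)
        if entry is not None and entry[1] < cert.deleted_at:
            del db[cert.key]

    def expire_certificates(certs, is_retention_site, now=None):
        """Ordinary sites drop certificates early; retention sites keep dormant copies."""
        now = time.time() if now is None else now
        keep = DORMANT_RETENTION if is_retention_site else ORDINARY_RETENTION
        return [c for c in certs if now - c.activation <= keep]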

How the next node to propagate an update to is selected can affect performance a lot. Real networks are not uniform, so a non-uniform spatial distribution can reduce network traffic without hurting the performance of the epidemic algorithm much. The authors tested this on the actual CIN topology, and the results were consistent with the analysis.
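
A sketch of distance-weighted partner selection; the d^-a family of distributions is from the paper, but the code and the parameter value are my own illustration:

    # Sketch of non-uniform (spatial) partner selection: choose a partner with
    # probability proportional to 1/d**a, where d is the network distance.
    import random

    def choose_partner(me, distances, a=2.0):
        """distances: dict mapping each other site to its network distance from `me`."""
        others = [s for s in distances if s != me]
        weights = [distances[s] ** (-a) for s in others]
        return random.choices(others, weights=weights, k=1)[0]

Larger values of a favor nearby partners more strongly, which is exactly the traffic-versus-convergence-time knob the paper explores.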

The topological structure of the network has a big influence on the performance of the epidemic algorithms, especially when a non-uniform spatial distribution is used. As mentioned in the paper, for some topologies updates may not reach all targets in a reasonable time if a non-uniform spatial distribution is used.

I think the paper reads more like an exploration than a mature solution for propagating updates in distributed systems with many replicas. Propagating updates like a disease is very innovative, but the algorithms are not suitable for all topologies, which means a system might even need to change its topology before it could use them.

This paper discusses various mechanisms for propagating updates to all of the (geographically distributed) replicas of a database to ensure consistency. The mechanisms are examined critically against factors such as convergence time, traffic generated, and storage required.
The primary problem solved by the paper is delivering database updates to the replicas of a massive, geographically distributed system without causing intolerable delays or network traffic. These were the main problems of previous systems, which either relied on underlying mail-delivery protocols (and suffered from buffer overruns and an inability to efficiently handle joining and leaving nodes) or used a "single master update" mechanism (a single point of failure).
- The primary contribution of the paper is introducing randomized algorithms to the design of distributed systems. While randomized algorithms are extremely easy to implement and frequently turn out to be much more robust than deterministic algorithms (e.g., lottery scheduling), the analysis of their space and time complexity is rather unconventional and difficult (it involves determining expected runtime and expected space usage using probabilistic arguments).
- With respect to the anti-entropy mechanism, the paper suggests a nice optimization to reduce the overhead of comparing entire databases during an anti-entropy exchange: comparing checksums before comparing contents. One variation of this optimization uses a window T (chosen to be larger than the average update distribution time) to buffer updates in a list, exchanging the lists first before comparing checksums; this ensures that checksum mismatches, which lead to exchanges of entire databases, are rare. Another mechanism builds an inverted index of the database by timestamp and exchanges updates in reverse timestamp order (a rough sketch follows these bullet points); this avoids the frequent adjustments to T required as the network dynamics change.
- The paper introduces a probabilistic version of rumor mongering for spreading updates. It quantifies the number of susceptible nodes remaining in the system as a function of the algorithm's parameter k, an exponential relation that shows how sensitive convergence is to k.
- The paper proposes variations of the basic algorithm and seems to suggest that a counter-based, more deterministic version of the same algorithm performs better than the purely randomized version.
- Another nice idea introduced by the paper is a deterministic scheme for handling delete updates using death certificates. To prevent old updates from re-creating deleted items, death certificates are propagated. To limit the storage overhead of keeping them, dormant death certificates are retained at only a few sites and are re-propagated once stale copies are discovered in the system.
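
As a side note on the reverse-timestamp variant mentioned above, one plausible shape for it (entirely my own sketch, with the checksum function supplied from outside) is:

    # Sketch of the "inverted index by timestamp" idea: apply updates to both
    # replicas in reverse timestamp order until their checksums agree.
    # The structure is my own guess; the paper describes this only in prose.
    def reverse_timestamp_exchange(db_a, db_b, checksum):
        newest_first = sorted(
            set(db_a.items()) | set(db_b.items()),
            key=lambda kv: kv[1][1],   # sort entries by their timestamp
            reverse=True,
        )
        for key, (value, ts) in newest_first:
            if checksum(db_a) == checksum(db_b):
                break                  # the replicas now agree; stop early
            for db in (db_a, db_b):
                if key not in db or db[key][1] < ts:
                    db[key] = (value, ts)
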
There are no major flaws in the work. The paper has, in fact, tried to provide modest expected runtime, space, and traffic bounds for its randomized algorithms. A few subtle problems:
- There were a few parameters and design choices whose significance was difficult to understand in the expected-complexity analysis (e.g., in spatial distributions, the meaning of the parameter a and the use of rectilinear meshes).
- The paper could have attempted a randomized algorithm for the death certificates (for example, the probability of a site deleting its death certificate decreasing exponentially with the number of old updates it has seen). Just giving a time threshold (30 days) without any justification was rather underwhelming.
- The paper could have incorporated the average size of the database into the expected convergence and traffic analysis. I'm not sure the simplification the paper uses (just one item) is sound enough for the expected-complexity analysis (I can't formally show that having n items instead of 1 would change the analysis, but my gut tells me there might be differences).
Overall, the paper proposes nice, simple randomized algorithms for update propagation in a distributed system. The simplicity of the algorithms makes them very attractive for today's massively distributed systems, where efficiency (in terms of time and network traffic) is very important. However, when I think of today's data centers supporting replication, I tend to think more of the environment in the giant-scale services paper (where replicas are not usually geographically distributed and different-site replicas are used only in case of disasters), so I'm not sure the paper's environment is very representative today.


The paper describes the authors' attempt to optimize replicated database consistency in the Clearinghouse servers on the Xerox Corporate Internet. The approach used before the paper was direct mail: the site that receives an update sends it to all other sites. This caused a bunch of problems, the biggest of which was high network traffic. Also, some sites didn't know the locations of all the database replicas and so didn't send the update to them. One more problem was that direct mail relied on the network infrastructure to reliably deliver the update to the other sites.

The authors applied the theory of epidemics to study alternatives to direct mail. The two competing metrics they tried to optimize were network traffic and the time it takes for all replicas to converge to a consistent state.

The first approach they discussed was the simple anti-entropy algorithm: a site selects another at random and the two synchronize their databases. Their theoretical analysis suggested that convergence takes time logarithmic in the size of the population, and their simulations supported that. They also discuss the differences between the push, pull, and push-pull mechanisms, and show formally that pull or push-pull is preferable to push.

Furthermore, they used the SIR model to design a novel rumor-mongering algorithm. When a site receives an update, it shares it with other sites as long as the update is still "hot"; once a site decides the update is no longer hot (other sites already know it), it stops spreading it. I really liked the theoretical discussion of the model, which gives the relationship between traffic and residue. By varying different mechanisms of the model (the condition for losing hotness, connection limits, push vs. pull) they try to optimize the three criteria: residue, traffic, and delay. Unless the residue is zero, some sites never receive the update before the others declare it not hot and stop sharing it; they use the anti-entropy approach to deal with such cases.

A big problem with epidemic approaches is deletion of data: if one site still has a data item, it may get resurrected by the anti-entropy process. To avoid this, the authors use a nice trick called death certificates, which carry information about deleted data. When old data resurfaces, death certificates are reactivated from the dormant copies stored on a few servers and sent to all other servers with the instruction to delete the data.

One more design parameter they discussed was the algorithm for choosing partners, which applies to both anti-entropy and rumor mongering. They got significant reductions in the amount of traffic when using a topology-aware distribution instead of a uniform one.

The biggest contribution of this paper is its nice theoretical and practical discussion of both the anti-entropy and rumor-mongering algorithms and their variations for replicated database maintenance. The algorithms were tested in the field on what was, for the time, a big corporate internet with hundreds of servers, and they achieved great reductions in traffic.

One flaw is that, as stated in the paper, spatially aware algorithms fail badly on some topologies. Another is the randomized nature of the algorithms, which, combined with the distributed setting, makes it very hard to debug them and to reproduce bugs when they occur.
