
Orleans: Cloud Computing for Everyone

Orleans: Cloud Computing for Everyone, Sergey Bykov, Alan Geller, Gabriel Kliot, James Larus, Ravi Pandya, and Jorgen Thelin. In: ACM Symposium on Cloud Computing (SOCC 2011)

Review due Tuesday, 4/17

Comments

Orleans is another system, like MapReduce, meant to make it easier for a programmer unfamiliar with distributed systems, or unwilling to create their own management software, to write code for that environment. Its chief aim is scalability, both up and down across several orders of magnitude, without the direct intervention of the developer. The model requires code to be broken down into individual objects called grains, which are the atomic units of scaling and replication.

To make reasoning about grains easier, each is required to run only sequential code. I like the idea of grains and of breaking computation into a format that is easy for both developers and Orleans, but I cannot quite picture how this affects the development process. Can grains be used to implement multi-threading in the same manner? From my understanding of the paper, they can be used for all the same purposes, with the addition of system monitoring to manage conflicts.
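The sequential-code requirement can be pictured as an actor with a mailbox: calls queue up and run one at a time against private state, so grain code never needs locks. A minimal sketch in Python, with asyncio standing in for the Orleans runtime (all names here are illustrative, not the real API):

```python
import asyncio

class CounterGrain:
    """Toy sketch of a grain: private state plus a mailbox that is
    drained one message at a time, so grain code is always sequential."""

    def __init__(self):
        self.count = 0
        self._mailbox = asyncio.Queue()
        self._worker = asyncio.ensure_future(self._run())

    async def _run(self):
        # Sequential message loop: at most one call runs at a time.
        while True:
            func, reply = await self._mailbox.get()
            reply.set_result(func(self))

    def send(self, func):
        # Enqueue a call; return a future (a "promise") for its result.
        reply = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((func, reply))
        return reply

def increment(grain):
    grain.count += 1
    return grain.count

async def main():
    grain = CounterGrain()
    # 100 concurrent callers are serialized through the mailbox.
    await asyncio.gather(*(grain.send(increment) for _ in range(100)))
    return grain.count

print(asyncio.run(main()))  # prints 100
```

Because every call flows through the mailbox, even concurrent callers cannot interleave inside grain code, which is what makes grains easy to reason about.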

Grains are kept isolated from other grains while active unless they make specific calls to one another. Such a call ties the two grains together, and in general, grains are tied to only one transaction (one client in a client-server model). This keeps the areas of contact between grains from spreading much beyond specific problem areas, and grains being cheap to create makes it an effective method. Because massive numbers of grains are expected to be created, the designers of Orleans included a garbage collector; this is a necessity, and also a significant point of optimization (as the authors note). Isolation also keeps errors from spreading beyond a single transaction, and data isolation is preserved in the face of errors by the use of a two-phase commit process.

Another major feature of the system is its merge function which reconciles conflicts that arise between different activations of a single grain archetype. The consistency model is loose enough that two grains of the same archetype may present different views of the system depending mainly on their activation time and subsequent actions. Reconciliation work is diminished by having the grains wait to commit their state changes until the point of deactivation. Developers are allowed to create their own protocols to handle merges.

One element that remains unclear is how this model would implement some manner of data store or, in general, how data is treated. The discussion revolves around the grains and the state of the grains. Is state the atomic unit of data here and to work with it, one need only cast data chunks as appropriate grains? If one wanted to run a key-value store over Orleans with appropriate data replication, it is not clear to me how that would be done in the grain model. The Twitter-like example ought to answer these questions but unfortunately, doesn't explicitly speak to the data side.

Seth Pollen, Victor Bittorf, Igor Canadi, David Capel

This paper describes a distributed runtime environment called Orleans. Orleans presents a simple object-oriented programming model and cleanly distributes its computations for scaling and fault tolerance. Orleans is similar to many existing distributed object frameworks such as Enterprise JavaBeans, but it introduces a new transaction model and transparent scaling.

This system addresses the problem of bundling the machinery of scalable, available distributed computation into a library that can be invoked by a variety of programs. In this sense, it is similar to MapReduce, but Orleans differs significantly from MapReduce in its design target and computation model. MapReduce optimizes the throughput of large, predictable, offline computations by using a stringent functional programming model, while Orleans supports a flexible object-oriented style and targets interactive online applications.

The paper emphasizes the clean programming model for Orleans objects (called “grains”). A grain is simply a .NET class with a few restrictions: its method return types must be asynchronous, and its fields and method parameters must consist of serializable data and references to other grains. Grains are automatically persisted using a ready-made distributed store such as Microsoft Azure Store. Grains can call methods on other grains (implicitly invoking RPC) and can even pass references to grains across RPC. Exceptions, too, are transparently propagated through the RPC mechanism, even across chained asynchronous calls. Thus, it appears that a persistent online application can be written in Orleans simply as a collection of objects that store data in fields and call each other’s methods. That being said, it is still clear to programmers (due to the use of asynchronous calls) where distributed communication occurs in the application; this helps programmers to make informed performance tradeoffs.
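As a rough illustration of those restrictions, assuming Python's asyncio in place of .NET promises (the grain and method names here are invented): every grain method is asynchronous, grain fields may hold references to other grains, and an exception thrown inside a chained asynchronous call surfaces at the original caller:

```python
import asyncio

class InventoryGrain:
    # A grain method must be asynchronous; callers get a promise.
    async def reserve(self, qty):
        if qty > 10:
            raise ValueError("not enough stock")
        return qty

class OrderGrain:
    def __init__(self, inventory):
        # Fields may hold references to other grains.
        self.inventory = inventory

    async def place(self, qty):
        # An implicit "RPC": awaiting another grain's async method.
        reserved = await self.inventory.reserve(qty)
        return f"ordered {reserved}"

async def main():
    order = OrderGrain(InventoryGrain())
    ok = await order.place(3)
    try:
        await order.place(99)        # the exception propagates through
    except ValueError as e:          # the chained asynchronous calls
        err = str(e)
    return ok, err

print(asyncio.run(main()))  # ('ordered 3', 'not enough stock')
```

The explicit `await` points are the analogue of the visible asynchronous calls the review mentions: the programmer can see exactly where distributed communication (and thus latency) may occur.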

While the programming abstractions of Orleans do look attractive, we consider it a flaw that the system is implemented on top of the closed-source, non-Unix-supported .NET platform. We have, however, come to expect such from Microsoft work.

Another major contribution is Orleans’s consistency model. Orleans supports atomic transactions with the traditional concept of commit and rollback/abort. It guarantees strong consistency for all reads and writes in the same transaction by ensuring that each transaction uses at most one activation of each grain. Since each activation has only one thread, and communication between activations is restricted to message passing and synchronized reconciliation, this guarantees that reads and writes by a particular transaction to a particular activation’s data are temporally serialized. It appears that providing this guarantee can, in many cases, force a transaction to be aborted and retried on a different set of activations.
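One way to picture the at-most-one-activation guarantee: the runtime could pin each grain to a single activation for the lifetime of a transaction, so repeated calls inside the transaction always hit the same serialized state. A hypothetical sketch (none of these names come from Orleans):

```python
class TransactionContext:
    """Pins each grain touched by a transaction to one activation."""

    def __init__(self):
        self._bound = {}  # grain id -> activation chosen for this txn

    def activation_for(self, grain_id, available):
        # First touch picks an activation; later calls reuse it, so all
        # reads and writes in the transaction see one serialized state.
        if grain_id not in self._bound:
            self._bound[grain_id] = available[0]
        return self._bound[grain_id]

txn = TransactionContext()
a = txn.activation_for("grain-7", ["act-A", "act-B"])
b = txn.activation_for("grain-7", ["act-B", "act-A"])
print(a == b)  # True: the same activation is used throughout the txn
```

This also suggests why aborts can happen: if the pinned activation fails or the pinning of several transactions cannot be satisfied together, the transaction must be retried against a different set of activations.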

Though it supports strongly consistent transactions, Orleans does not provide strong consistency in general. Updates committed by one transaction are eventually propagated to all the activations in the system. There is no guarantee of when other transactions will see these updates, though the multi-master branch-and-merge protocol described in the paper seems to guarantee that updates will propagate promptly as long as no activation is overloaded. The authors note that a mechanism can be introduced to remove busy activations from the work pool to give them time to reconcile their data with the updated backing store.

Unfortunately, this paper lacks a good evaluation section. The MapReduce paper reported on extensive experience from internal Google operations, but this paper only mentions a few demonstration programs designed specifically to test Orleans. Also, the Chirper evaluation was performed with only 1,000 grains. The authors claim that this test was easier because loading the system with millions of grains took too long, but we think a real-world measurement of large-scale performance is important.

The paper also neglects to give details on reconciliation of write conflicts, which is usually a key problem in transparently distributed systems. They claim that most conflicts can be properly resolved either with a last-writer-wins policy or by the use of the automatically resolving record, dictionary, and list types provided by Orleans. The paper would be greatly improved by more details on these mechanisms as well as the API alluded to for implementing custom conflict resolution.
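The reconciliation policies mentioned here can be sketched concretely. Below is a toy last-writer-wins merge over per-field timestamps, plus a dictionary merge built from it; the paper does not specify how the Orleans record/dictionary/list types actually work, so this is only a guess at the idea:

```python
def lww_merge(a, b):
    """Each value is (timestamp, data); the later write wins."""
    return a if a[0] >= b[0] else b

def dict_merge(d1, d2):
    """Merge two activation copies key by key with last-writer-wins."""
    merged = dict(d1)
    for key, val in d2.items():
        merged[key] = lww_merge(merged[key], val) if key in merged else val
    return merged

# Two activations that diverged from the same grain state:
activation_a = {"bio": (5, "hi"), "name": (1, "Ann")}
activation_b = {"bio": (3, "old"), "city": (2, "Oslo")}

print(dict_merge(activation_a, activation_b))
# {'bio': (5, 'hi'), 'name': (1, 'Ann'), 'city': (2, 'Oslo')}
```

Even this toy version shows why more detail would help: last-writer-wins silently discards the losing write, which is exactly the kind of behavior an application developer needs spelled out.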

The paper’s future work section indicates that intelligent, automated resource allocation is an outstanding area of research in Orleans. We agree that proper allocation and load balancing is a difficult problem in systems such as this.

We feel that systems like this may have broad application in the implementation of the online request/response services that dominate the Internet today. The paper is fairly recent, however, and lacks a convincing evaluation, so it is understandable that Orleans has found few real-world applications as yet. However, the technologies which it may claim to replace (such as Enterprise JavaBeans) have achieved wide acceptance and may have paved the way for distributed frameworks such as Orleans.

Orleans is a distributed application framework developed by Microsoft. The framework was designed for cloud computation, with web services as its target application type. It is built around "grains", which are instances of an application that stand alone to process requests. Grains can communicate with each other, but do so over an asynchronous interface (in a manner very similar to other asynchronous network frameworks). Each grain has a set of activations, which can be thought of roughly as concurrently running versions of the grain that share the grain's state but do not share in-memory values with one another. Orleans also keeps track of user requests as transactions, which are then executed on activations of a grain. If the grain or one of its activations fails, the transaction is aborted and re-attempted on a different activation.

This framework has some similarities to MapReduce in how it divides work and in its general computation model. Where Orleans differs from MapReduce is the use case it is designed for: Orleans is designed to support a more real-time request model, where requests are handled by each of the grains via activations of that grain. In contrast, MapReduce launches a master/worker node setup for each new run that has to be performed. Orleans's style of distribution may be more suitable than MapReduce for certain cloud computing tasks.

Overall, I thought this paper was good but not great. It is another entry into the crowded field of distribution frameworks. While the paper does not present very many new ideas, it does put together existing ideas in application distribution in a way that produces an interesting framework. Whether this framework sees use outside of Microsoft remains to be seen.

Orleans is another distributed programming model that aims to abstract the
details of error handling and failure recovery away from the programmer. The
goal of doing this is to make it simple for a "non-expert" in distributed
systems to be able to program software that can scale, with the framework adding
more nodes as it needs to.

The problem that this project tries to solve is decreasing the programmer
time and expertise required for distributed systems. This goal is a worthy
one, as distributed systems and computations are extremely challenging to
program; this can be seen easily in the number of research papers published
on the subject. Also, many tasks are much faster, or only feasible, when
distributed over many systems. Microsoft contributed a distributed
programming model along with an implementation of that model. The model
constrains the programmer to an optimistic consistency model, provides ways
to manage most characteristics considered in a distributed system (e.g.
performance, persistence, etc.), gives good availability considering the
system Orleans runs on top of, and is expressive enough to implement (so
far) a few toy distributed applications.

Orleans definitely succeeds in shrinking the code of certain distributed
applications. The flexibility allowed in the size of the "grain" contributes to
this. One facet of this runtime that I had a problem with is the atomicity
guarantee. As the atomicity example is presented in the paper, it appears
that an unfortunate ordering of events could leave many operations without
committed data and force the same work to be retried; the paper does not
say how to avoid this. The situation exists because changes must be
committed after they complete in order to enforce atomicity (and
consistency). I
question whether this abandoning of eventual consistency is actually massively
scalable, since the semantics sound strikingly similar to Paxos. My suspicion
became greater when the benchmark results were not terribly impressive. The
cluster used was only 50 nodes, the linear algebra library did not show good
scaling properties, and their reimplementation of Twitter showed good speedup,
but only for average and heavy subscriber counts. The implementation
details hand-wave the case of a very popular Tweeter, which, I would argue,
is a common workload for Twitter. Their solution of a hierarchical
dataflow graph made up of grains with partitioned followers did not sound
pretty, but I do not know how it is normally implemented. I just found the fact
that they didn't provide results for this case an oversight.

Choosing a programming model for a distributed application is one of the most
important tasks in the design phase, as many frameworks (such as Orleans or
MapReduce) can really constrain the possibilities in exchange for a simpler
programming task. One must consider not only how their current application can
scale in an elastic manner, but ways in which the application might expand or
possibly outgrow a constrained model such as Orleans.

In this paper, the authors present Orleans, a software framework for building reliable, scalable, and elastic cloud applications in an easy way. It is based on the Actor Model, in which everything is an actor and actors run concurrently. An actor can communicate with others by sending messages, and it can create new actors.

One point where Orleans differs from the traditional Actor Model is that it automatically creates replicated instances, called activations, if a grain is too busy, and messages sent to the grain are then distributed among the activations by the framework. All of this is transparent to the developers and handled by Orleans. The framework provides elastic scalability for some types of system bottleneck; other bottlenecks need the involvement of the developers. For example, if an instance has to send messages to too many grains (beyond the capacity of a single machine), new activations generated by the framework cannot help. In this case, the developers have to partition the receivers of the messages across the activations.
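The manual partitioning described above, splitting a too-large receiver set across activations, might look like the following sketch. Hash-based assignment is an assumption on my part; the paper does not prescribe a scheme:

```python
def partition(followers, n_activations):
    """Assign each follower to one activation's bucket by hashing its
    id; each activation then messages only its own bucket, keeping the
    per-activation fan-out within a single machine's capacity."""
    buckets = [[] for _ in range(n_activations)]
    for f in followers:
        buckets[hash(f) % n_activations].append(f)
    return buckets

followers = [f"user{i}" for i in range(1000)]
buckets = partition(followers, 4)

# Every follower is covered exactly once across the four activations.
print(sum(len(b) for b in buckets))  # 1000
```

The point of the example is that this split is the developer's job: the framework replicates the grain, but deciding which activation talks to which receivers is application logic.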

Another mechanism that makes the system more scalable is that the framework allows concurrent updates to the same data of a grain from different activations. The operations are transactional, and the data is held transiently in memory before the transaction commits. Writing data to persistent storage may cause conflicts, and both default and customized reconciliation are supported by Orleans. In this way, the system batches data updates in transactions and defers conflict reconciliation; as a result, it has less overhead for keeping data consistent. Other advantages of using transactions are that they isolate concurrent operations from each other and provide a natural way to handle errors and recover from them.

Consistency within a transaction differs from consistency across transactions. Each activation runs a single-threaded program, and the system guarantees strong consistency as long as a transaction sees only one activation of a grain. Across transactions, it only ensures that the visible grain state reflects previous operations. The consistency this model provides is stronger than eventual consistency but weaker than strong consistency.

The implementation of the system takes advantage of properties of C# and .NET. Communication is asynchronous: unlike a blocking RPC, a function call does not block, and a “promise” is returned instead. Delegates are then invoked to handle the completion of the communication or any errors.
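That promise-plus-delegates style can be sketched outside of .NET as well. Below, Python futures stand in for promises, and plain callbacks stand in for the completion/error delegates (the function names are invented for illustration):

```python
import asyncio

async def fetch_profile(user):
    # A stand-in for a remote grain call that may fail.
    if user == "missing":
        raise KeyError(user)
    return {"user": user, "bio": "hello"}

def call_with_delegates(coro, on_done, on_error):
    """Start the call and return a promise immediately; the delegates
    fire later, on completion or on error."""
    task = asyncio.ensure_future(coro)
    def finished(t):
        if t.exception() is not None:
            on_error(t.exception())
        else:
            on_done(t.result())
    task.add_done_callback(finished)
    return task

async def main():
    log = []
    t1 = call_with_delegates(fetch_profile("ann"),
                             lambda r: log.append(("ok", r["user"])),
                             lambda e: log.append(("err", type(e).__name__)))
    t2 = call_with_delegates(fetch_profile("missing"),
                             lambda r: log.append(("ok", r["user"])),
                             lambda e: log.append(("err", type(e).__name__)))
    await asyncio.gather(t1, t2, return_exceptions=True)
    return log

print(asyncio.run(main()))
```

Note that the caller regains control as soon as `call_with_delegates` returns; nothing blocks waiting for the result, which is the property the review highlights.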

The authors evaluated the performance of three types of applications on Orleans. The first is a Twitter-like social media application, which is communication-intensive. In the experiment, they intentionally overloaded some grains, and the system maintained throughput by automatically creating multiple activations on a single server. The second application is both data-intensive and computation-intensive; Orleans running on multiple servers achieved sublinear speedup, with the overhead caused by increased data transmission and coordination.
