
Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service

Y. Saito, B. Bershad and H. Levy. Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service. Proc. of the 17th ACM Symp. on Operating Systems Principles, Dec. 1999.

Reviews due Thursday 4/19.

Comments

SUMMARY
In "Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service" Saito et al. describe an approach to clustering where nodes are functionally homogeneous. Porcupine's design goals focus on manageability availability and performance. The system was evaluated in context of a distributed mail server, however the design principles are applicable to many applications.

PROBLEM
What are effective ways of organizing a cluster to deliver manageability, availability, performance and scalability? In particular, how should a mail service cluster be organized?

CONTRIBUTIONS
* defining clusters with functionally homogeneous nodes, providing an alternative to contemporary cluster organization schemes that focus on either replication or dedicating specific machines to the needs of specific users (see the sketch after this list)
* well-justified design goals (e.g. addressing failures "with the same urgency with which one replaces light bulbs" is a very effective analogy)
* use of self-configuration and self-healing where appropriate (i.e. not going overboard with it)
* design of the system architecture of a mail service cluster, with some features applicable for other applications (e.g. membership handling, hard vs soft state, etc)
* system evaluation
* the system seems well-suited for "hierarchical clustering": since all nodes are functionally homogeneous, with functionality very similar to that of the cluster as a whole, it should be relatively easy to construct a cluster of clusters where every node (or some nodes) is itself a cluster
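
To make the "functionally homogeneous" idea concrete, here is a minimal Python sketch (mine, with illustrative names; not Porcupine's actual interfaces): every node runs identical code and either serves a request from its local mailbox fragments or proxies it to a node that holds them.

    # Every node runs the same code: serve locally when this node holds the
    # user's mailbox fragment, otherwise proxy to a node that does.
    class Node:
        def __init__(self, node_id, cluster):
            self.node_id = node_id
            self.cluster = cluster       # node_id -> Node, including self
            self.fragments = {}          # user -> messages (hard state)
            self.user_map = {}           # user -> owning node ids (soft state)

        def handle(self, user):
            owners = self.user_map.get(user, [])
            if not owners or self.node_id in owners:
                return self.fragments.get(user, [])        # serve locally
            return self.cluster[owners[0]].handle(user)    # act as a proxy

Because every node exposes the same entry point, a client (or round-robin DNS) can start anywhere, which is also what makes the cluster-of-clusters construction plausible.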


FLAWS
* porcupine?
* not 100% suited for all applications; e.g. if you are amazon.com, lax consistency guarantees may mean that you advertise a sale price after the sale is over. There could be work-arounds, though.
* not scalable in "software system complexity" and hence impractical. I am no expert, but here's how I imagine a minimal mail system that requires a cluster just to handle the mail load: MTA, MUA, spam filter, antivirus, backup (redundant storage and backup are very different things), quota enforcement, logging service, and some implementation of cron. All of a sudden, having functionally homogeneous nodes doesn't sound nearly as nice, because now every node configuration is complex. Throw in Porcupine limitations like lax consistency guarantees, and things start to look even worse for services like logging. One solution would be to have a separate Porcupine cluster for every service, but once again, given Porcupine's semantics and quirks (e.g. soft state reconstruction), for all those pieces to inter-operate well they would have to be produced by the same vendor, which can be very limiting.


RELIABILITY
Porcupine strives for high availability. That is achieved in part by replication, by low MTTR (through self-configuration and self-healing), and by reducing the potential for operator error by making the system easy to manage.

Summary:
This paper describes a cluster-based mail service called Porcupine. Porcupine provides high scalability and high availability with automated, easy management and configuration.

Problem Addressed:
Existing clustered services, and mail services in particular, had problems with either scalability or manageability, and still did not have enough availability. A mail service that is easy to manage and fully gains the benefits of clustering was therefore desired.

Contributions:
By using a layer of indirection (such as DNS) and automated management functions, the system requires no modifications from users, achieves availability, and provides manageability to system administrators.
Designing a "functionally homogeneous" system that still has high throughput is not easy; the authors enable this with soft state and by exploiting the loose requirements of mail service, such as the lack of in-order delivery guarantees.
Dealing with node failure and addition without bringing the whole system down is another benefit of the homogeneous design, and is also an important contribution.

Possible Improvements:
Because Porcupine uses replication with a homogeneous design, the number of control messages transmitted through the whole network will increase as the scale increases. Replication is important because it provides high availability, but it would be interesting to compare against other system architectures.
A mechanism such as a distributed hash table might help with dynamic load balancing and with adding and removing nodes without a disaster; a rough sketch of the idea follows.
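
A rough sketch of that idea using consistent hashing (purely illustrative; as I understand it, Porcupine instead hashes user names into a replicated user map):

    # Each node is hashed onto a ring; a user belongs to the first node
    # clockwise from the hash of the user name, so adding or removing one
    # node only remaps the users in a single arc of the ring.
    import bisect
    import hashlib

    def _h(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            self.points = sorted((_h(n), n) for n in nodes)

        def owner(self, user):
            hashes = [h for h, _ in self.points]
            i = bisect.bisect(hashes, _h(user)) % len(self.points)
            return self.points[i][1]

    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.owner("alice"))   # stays stable as unrelated nodes join/leave
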
Automating system configuration and management provides manageability, but on the other hand, when some unexpected failure happens, the administrator will be required to understand the whole system's mechanisms, or a well-designed manual should be provided.
Still, I think Porcupine is very well designed and achieves the three requirements proposed at the beginning of the paper.

Reliability:
This paper focuses more on availability, since it targets only mail service. The character of e-mail allows the system to use a relaxed synchronization policy while still requiring consistency for each user's data. This approach might not match the requirements of other types of services, such as a file server, but as a mail service it is clear that they have achieved manageability, availability, and performance at a limited but large enough scale.

Summary
This paper describes Porcupine, a "scalable mail server" which uses a large cluster of commodity PCs in order to achieve scalability and availability.

Problem
The rapid growth of the internet has led to the problem that services can no longer be hosted on one big machine and still scale at the level necessary to keep up with growing demand.

Contributions
* Definition of scalability as consisting of the requirements of manageability, availability, and performance.
* An architecture which meets their definition of scalability.
* Recognition of the necessary over-capacity versus hardware cost trade-off and the advantages of using commodity machines in managing it.
* An architecture that allows any node to perform any function.
* Distinction between hard and soft state.

Flaws
The ability of any machine to perform any function seems like it might result in unnecessary overhead and wasteful network traffic.

Reliability
The main reliability focus of Porcupine is availability rather than the absence of experienced faults.

Summary:
The paper describes the design and performance analysis of a cluster-based, highly scalable mail service named Porcupine.

Problem:
Most of the clustering applications at the time took one of two approaches: replicate data across a large number of nodes (good performance, but limited mostly to read-only access) or split data across a number of nodes so that each node services only some requests (manageability, scalability and availability problems). Porcupine attempted to create a service that automatically adjusts itself to changes in node configuration and load, provides data replication, and is easy to manage.

Contribution:
* Use of proxies to create a level of indirection and allow load-balancing
* Use of techniques from the networking domain (DNS) to provide transparency to users while providing load-balancing (see the sketch after this list)
* Transparent replication of data, without requiring special storage devices
* Lots of experiments to show the performance of the system (or justify the claims)
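
To illustrate the DNS indirection point above, here is a toy sketch; itertools.cycle stands in for a DNS server rotating its record list, and all names are made up:

    # Clients resolve one service name, but successive resolutions return
    # different cluster nodes, so connections spread across the cluster and
    # any node can become the session's proxy.
    import itertools

    cluster_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    rotation = itertools.cycle(cluster_ips)

    def resolve(name):
        # stand-in for round-robin DNS
        return next(rotation)

    for _ in range(4):
        print(resolve("mail.example.com"))   # 10.0.0.1, .2, .3, .1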

Flaws:
- Gathering information about the load on each node seems to have received comparatively little thought, for a mail system designed to handle billions of messages a day.
- Using hierarchical messaging would go a long way toward ensuring scalability during phases where nodes are added/removed; there are many similar protocols in the networking domain. The current design of Porcupine uses N*N messages for N nodes (see the sketch after this list).
- As the average size of messages grows, the effect of replication will become more taxing on the network and probably on the nodes themselves (which must receive and store the data). An internal network for the nodes (instead of sharing with external communication) and/or shared storage devices could improve this.
- For a system designed for millions of users, a user map of 256 entries in the tests was probably not large enough.
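
Back-of-the-envelope numbers for the N*N point: the flat all-pairs exchange is the cost the review attributes to Porcupine, while the tree figure assumes a simple aggregation hierarchy (my assumption, not a protocol from the paper):

    # Flat all-pairs membership exchange costs N*(N-1) messages per round;
    # a hierarchy needs roughly one message per tree edge in each direction.
    def flat_messages(n):
        return n * (n - 1)

    def tree_messages(n):
        return 2 * (n - 1)

    for n in (30, 256, 1024):
        print(n, flat_messages(n), tree_messages(n))
    # at 1024 nodes: 1,047,552 flat messages vs 2,046 hierarchical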

Relevance:
I think Porcupine did a good job of creating an easily manageable, easily scalable, cluster-based mail service, though there were a few things it could improve. The paper was quite well written too. I wonder why it didn't make it into a commercial product.


Summary
This paper describes the architecture and protocols used in Porcupine, a distributed email server that is able to handle half a billion emails per day using 30 nodes, while remaining easy to manage.

Problem
Existing mail solutions required too much management, or management that was too difficult (e.g. moving users around as the load changed), and didn't scale to the point that the authors wanted. They also didn't have the failure characteristics that Porcupine presents; for instance, the failure of the node you were assigned to meant that you couldn't get mail.

Contributions
* A pretty scalable system. The authors talk about some bottlenecks in the protocols whose removal might let it grow even more, but it's already at 500 million emails/day; even assuming that each user gets 500 a day (which I'm sure is too high by a couple of factors), it's capable of serving a million people. That's a pretty sizable base. (Now, there are timing constraints with peak loads and such that may make that number smaller.) The arithmetic is spelled out after this list.

* Even with replication, which cuts their throughput in half, they are still as fast as sendmail for the same number of nodes, and Porcupine does it without statically partitioning the load (static partitioning gives the admin more work, leaves the system open to failure, and makes it more susceptible to skewed loads).

* I get the sense that they are proud of achieving this for a write-intensive workload; a couple of the systems they mention in related work wouldn't scale as well under such loads.
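
The capacity arithmetic from the first bullet, spelled out:

    # 500 million messages/day at an assumed 500 messages per user per day
    messages_per_day = 500_000_000
    per_user_per_day = 500
    print(messages_per_day // per_user_per_day)   # 1,000,000 users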


Flaws
A couple of their configuration choices seemed suspicious, like tuning the hash function exactly to their workload (top right, p. 9); it's not clear that an optimization like this would always be possible. They also ran with user authentication checks off.

It's somewhat application-specific; this probably doesn't give you a lot of groundwork for, say, a distributed database. (Though to be honest I'm not completely sure about this; you might be able to adapt it somehow.) This isn't really a big flaw though, and the design is actually surprisingly flexible, being adaptable even to a web server.

Reliability
They are really more focused on availability, which they get by making nodes unspecialized, allowing any node to service any request, and by their particular layout algorithms and protocols.

Summary
The Porcupine cluster architecture allows for a mail server with load balancing and good failure properties, built from heterogeneous commodity PCs.

Problem
Replication-based clustering cannot handle e-mail's write demands, and traditional static partitioning of e-mail service causes tedious manual balancing and requires more resources than needed.

Contributions
The article was very careful in choosing its consistency goals. By only aiming for a system that converged to a consistent state the resulting cluster was fairly simple with low overheads. Temporary inconsistencies are usually tolerable in an E-mail system and seem well worth the performance boost.

The cluster components kept hard state for all critical information, such as user profiles, update logs, and mailbox fragments, while much of the remaining information was kept as soft state. In the case of failure, soft state is reconstructed from the hard state stored across the cluster; in this way only the hard state is required to be consistent. Soft state on each node eventually becomes consistent as it gains information from the stored hard state. This helps keep the consistency mechanisms simple.
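
A small sketch of that reconstruction (the names disk_fragments and rebuild_user_map are mine, not the paper's):

    # The soft-state user map is discarded and rebuilt by scanning the
    # mailbox fragments (hard state) that survive on each node's disk.
    class Node:
        def __init__(self, node_id, disk_fragments):
            self.node_id = node_id
            self.disk_fragments = disk_fragments   # user -> messages, on disk

    def rebuild_user_map(nodes):
        user_map = {}                      # user -> nodes holding a fragment
        for node in nodes:
            for user in node.disk_fragments:
                user_map.setdefault(user, []).append(node.node_id)
        return user_map

    nodes = [Node(1, {"alice": []}), Node(2, {"alice": [], "bob": []})]
    print(rebuild_user_map(nodes))   # {'alice': [1, 2], 'bob': [2]}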

Mail was fragmented over a small number of computers. By sticking to a small number of machines, disk seek time is kept low and less directory work is needed to locate all the fragments. Each user's mailbox was spread over 1 to 4 machines. An incoming request could contact any cluster node, which looked up the locations of the desired mailbox fragments and chose among them based on the load at each machine in the spread. Redundancy of mailbox fragments gives the cluster even more choices. As failures occur, users can still access their mail, but with poorer performance due to having fewer choices of least-loaded machine. This gives the system graceful degradation as failures occur.
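
A minimal sketch of that selection step, with illustrative names (Porcupine's real load metric and spread handling are richer than this):

    # Among the live nodes holding a fragment of the user's mailbox, pick
    # the one with the fewest pending requests; a failure shrinks the
    # candidate set instead of making the mailbox unavailable.
    def pick_node(spread, alive, pending):
        candidates = [n for n in spread if n in alive]
        if not candidates:
            raise RuntimeError("every node holding this mailbox is down")
        return min(candidates, key=lambda n: pending[n])

    spread = ["A", "B", "C"]              # 1 to 4 nodes per user
    print(pick_node(spread, {"A", "C"}, {"A": 7, "B": 0, "C": 2}))   # -> C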

Possible Improvements
The system did an exceptional job of achieving the goals laid out. 1 Gbps networks are still top of the line today; if the system is to scale to larger numbers, effort will need to be made to use network resources more efficiently. The lenient consistency guarantees limit the cluster to certain applications; databases or financial applications would not be able to tolerate getting the wrong answer every so often.

I also wonder what effect medium and large attachments would have on the system. The system's load metric is based on pending requests; a large attachment might take some time to process but would only count as one request, resulting in a somewhat inaccurate view of load. Transfer of the attachments might also put strain on the network if the system is already under heavy load.
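
A sketch of the fix implied here: weight load by pending bytes instead of counting each pending request as one. This is the review's suggestion, not something the paper does:

    # A node chewing on a 20 MB attachment should look busier than a node
    # with two small messages queued, even though it has fewer requests.
    def load_by_count(pending_sizes):
        return len(pending_sizes)         # the pending-request view

    def load_by_bytes(pending_sizes):
        return sum(pending_sizes)         # the byte-weighted view

    print(load_by_count([500, 800]), load_by_bytes([500, 800]))      # 2 1300
    print(load_by_count([20_000_000]), load_by_bytes([20_000_000]))  # 1 20000000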

Reliability
The failure avoided is complete downtime for a given user's mailbox. By employing replication and dynamic load balancing, all mailboxes remain available even after a failure.

Summary:
This work assesses the requirements of a distributed processing system and develops an architecture to support an email system with millions of users. System performance is analyzed using a cluster of 30 heterogeneous computers.

Problems Addressed:
At the time this paper was written, the Internet was just beginning to take hold and networks were expanding drastically. Systems that supported many users had to be upgraded frequently to keep up with the growing demand. Clustering became popular for dealing with the scaling problem; however, such systems either replicated data to a number of servers, thus only allowing read access, or divided the user base among the individual machines in the cluster, which makes management very hard and can lead to some users having poorer performance than others. Porcupine's goals for the system included easy management, high availability, and high, consistent performance.

Contributions:
To accomplish the management and performance goals of the work, the authors developed an architecture where any node can perform the same operations as any other node. Thus, when users wish to connect to the cluster, they can connect to any computer within it. The computer that a client connects to acts as a proxy if no state for the user is stored on that node, or serves the client directly if state does exist locally. Cluster changes such as node addition or removal can be handled automatically, as can a failed node. High availability is accomplished through the use of replication, where multiple nodes within the cluster contain the same user data. This makes it possible, in the case of a failing node, to still retrieve user data that would have been served by that node.
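
A simplified sketch of that replicated delivery path, with made-up names (the paper's replication manager and eventual-consistency machinery are more involved):

    # Write the message durably on one replica, acknowledge the sender, and
    # copy to the remaining replicas in the background until they all match.
    class Replica:
        def __init__(self, name):
            self.name, self.disk = name, []

        def write_to_disk(self, message):
            self.disk.append(message)      # stands in for stable storage

    def deliver(message, replicas, background_queue):
        primary, *rest = replicas
        primary.write_to_disk(message)     # durable on one node first
        background_queue.extend((r, message) for r in rest)  # async copies
        return "250 OK"                    # sender acknowledged here

    queue = []
    print(deliver("hello", [Replica("A"), Replica("B")], queue))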

Flaw:
The amount of auxiliary traffic associated with maintaining a system where any node can perform the same operations seems like it could become quite large, especially as the system scales to a large number of machines. Also, even though the system strives to be free of all tuning parameters, as the system scales a user spread parameter must be evaluated to provide the highest performance.

Reliability:
Porcupine provides high availability through the use of data replication and the functional uniformity of all machines within the cluster. The ability to scale the system with increasing demand also impacts reliability, since all machines remain more balanced in terms of load and are less likely to fail due to overload.
