Locality-Aware Request Distribution
Locality-Aware Request Distribution, Vivek Pai, Gaurav Banga, ASPLOS-VIII
Review this or consistent hashing for Tuesday, January 31.
Comments
Typical services have a front-end and a back-end: the front-end distributes each request to one or several back-end nodes. There are different ways to store data and distribute requests. Partitioning the data set across nodes pins each request to a particular node, which can unbalance the processing load across the back-ends and especially delay some jobs. If all back-end nodes instead have access to the whole data set, then the distribution policy determines load balance. Round robin is not ideal when the cache on each node is smaller than the whole data set, because under round robin every node may be asked for any piece of data, so the potential working set on each node is the entire data set; with a low cache hit rate, performance degrades.
This paper proposes LARD, which tries to achieve both load balance and a high cache hit rate. The main idea: all back-end nodes have access to the whole data set; the front-end sends the first request for a target to the node with the lowest load and subsequent requests for the same target to that previously assigned node. By choosing the lowest-loaded node, LARD balances load; by directing repeated requests to the same node, LARD keeps the hit rate high. There are further refinements: if the assigned node becomes too busy, a repeated request is reassigned to a less-loaded node, and a target can be served by a set of nodes with each request going to the least loaded member.
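In code, the core dispatch might look roughly like the following minimal Python sketch, assuming load is measured in active connections; the threshold values and all names are illustrative, not the paper's code, though the reassignment rule follows the basic LARD policy as described:

    T_LOW, T_HIGH = 25, 65          # "lightly loaded" / "overloaded" thresholds
    server = {}                     # target -> back-end currently assigned to it

    def dispatch(target, nodes):
        node = server.get(target)
        if node is None:
            # First request for this target: pick the least-loaded node.
            node = server[target] = min(nodes, key=lambda n: n.load)
        elif (node.load > T_HIGH and any(n.load < T_LOW for n in nodes)) \
                or node.load >= 2 * T_HIGH:
            # Reassign if this node is overloaded while another is nearly
            # idle, or if it is severely overloaded outright.
            node = server[target] = min(nodes, key=lambda n: n.load)
        node.load += 1              # one more active connection
        return node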
Another contribution is the TCP handoff protocol: after a request is assigned to a back-end node, the TCP connection is handed off to that node. This reduces the burden on the front-end and so improves scalability. The request reaches the front-end; the front-end notifies the back-end; the back-end sends data directly to the client; the client's ACKs go to the front-end, which forwards them to the back-end.
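As a user-level illustration of that message sequence, consider the hedged sketch below; the real protocol migrates kernel TCP state between machines, so these classes are stand-ins and every name here is invented:

    class Connection:
        def __init__(self, client):
            self.client, self.owner = client, None   # owner holds the TCP state

    class BackEnd:
        def __init__(self, name):
            self.name = name
        def accept_handoff(self, conn):
            conn.owner = self                        # back-end now replies to
            return f"{self.name} -> {conn.client}"   # the client directly

    class FrontEnd:
        def __init__(self, backends):
            self.backends = backends
        def dispatch(self, conn, target):
            node = self.backends[hash(target) % len(self.backends)]  # LARD stand-in
            return node.accept_handoff(conn)         # hand the connection off
        def forward_ack(self, conn, pkt):
            return conn.owner, pkt                   # client ACKs still relayed
                                                     # through the front-end

    front = FrontEnd([BackEnd("be0"), BackEnd("be1")])
    conn = Connection("client")
    front.dispatch(conn, "/index.html")              # data flows back-end -> client
    owner, _ = front.forward_ack(conn, "ACK")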
Discussion: the TCP handoff is not entirely clear to me. It seems every message from the client still passes through the front-end on its way to the back-end; only the reply travels directly from the back-end to the client. So it only helps in the scenario where the outgoing link from front-end to client is the bottleneck.
Posted by: Wenfei Wu | January 31, 2012 07:39 AM
Summary:
Content-based request distribution is promising in cluster-based network servers since it offers advantages such as increased performance due to improved cache hit rates on the back-end nodes. This paper proposes locality-aware request distribution (LARD), a specific content-based request distribution strategy that achieves high cache hit rates and load balancing on the back-ends at the same time.
Problem:
A cluster-based network server serves users by distributing each request to one of the back-end nodes, which processes it. To avoid performance degradation caused by an overloaded back-end, the front-end usually dispatches requests in a round-robin manner (e.g. WRR) to achieve load balancing. In this case, each back-end is likely to receive requests for every possible target in the working set. The problem is that when the size of the working set exceeds the size of the main-memory cache at an individual back-end node, frequent cache misses occur, which severely hurts performance.
Contributions:
The novelty of LARD lies in its combination of cache locality and load balancing. On one hand, it's not hard to achieve load balancing by dispersing users' requests in a round-robin fashion; the problem is that this may cause frequent cache misses when an individual back-end cannot cache the whole working set. On the other hand, cache locality can easily be achieved by partitioning the working set and forwarding requests for targets in a particular partition to a particular back-end, but this is likely to overload certain back-ends if the targets they host account for a large share of requests.
The two features are achieved simultaneously by maintaining a dynamic set of nodes for a target and assigning requests for that target to the least loaded node in the corresponding adaptive set.
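A hedged sketch of that adaptive-set logic (the replicated LARD/R strategy) might look as follows; the thresholds, the shrink interval K, and the load attribute are all illustrative stand-ins for the paper's parameters:

    import time

    T_LOW, T_HIGH, K = 25, 65, 20    # thresholds and shrink interval; illustrative
    sets = {}                        # target -> (server set, time of last growth)

    def dispatch_replicated(target, nodes):
        members, grown = sets.get(target, (set(), time.time()))
        if not members:
            members = {min(nodes, key=lambda n: n.load)}
        node = min(members, key=lambda n: n.load)
        spare = [n for n in nodes if n not in members]
        if spare and ((node.load > T_HIGH and min(n.load for n in nodes) < T_LOW)
                      or node.load >= 2 * T_HIGH):
            members.add(min(spare, key=lambda n: n.load))        # grow under load
            grown = time.time()
        elif len(members) > 1 and time.time() - grown > K:
            members.discard(max(members, key=lambda n: n.load))  # shrink when idle
        sets[target] = (members, grown)
        return min(members, key=lambda n: n.load)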
Another challenge is that the front-end needs to set up a connection with the client, inspect the content of the request, and then assign the connection to the appropriate back-end. To keep the front-end from becoming the bottleneck, a TCP handoff protocol is carefully designed, and an efficient forwarding module is added to quickly relay the ACK packets from clients to the back-ends. The advantage of this method is that the handoff is transparent to both client applications and the server applications on the back-ends.
The authors conducted extensive simulations comparing LARD with other request distribution strategies such as WRR and a purely locality-based scheme, and studied sensitivity to factors such as CPU and disk speed. Prototype experiments are also performed to demonstrate the performance improvement.
Flaws:
- The front-end is a single point of failure. It would be nice to see a replication scheme for the front-end to overcome this drawback.
- The front-end is very likely to be the performance bottleneck. It requires some application-level knowledge and may need to maintain the mapping between each possible kind of request (e.g. HTTP, FTP, SQL) and its corresponding back-end server set. Furthermore, every connection between client and server is managed by the front-end in a handoff module. With a large number of different possible requests, the overhead of maintaining the hash table and performing connection handoffs may be significant.
- Performance is workload-dependent. The approach works best under the assumption that the working set is larger than an individual back-end's cache. It would be interesting to see what happens when this does not hold, and when a mix of different working sets coexists.
- It would also be interesting to see how LARD could be applied to dynamically generated content such as dynamic web pages.
Applicability:
This paper brought up two important issues – cache locality and server load balancing. They are important not only in cluster-based servers but also in CDNs, where servers are geographically distributed across the Internet. I think this paper offers some guidance on questions like where web caches should be placed and how to balance load across servers in that context.
Posted by: Yizheng Chen | January 31, 2012 06:40 AM
This paper presents a scheme for improving the performance of a cluster-based
web server by using a front-end node to distribute incoming requests among the
available back-end nodes based on what specific content is being requested.
This improves performance by increasing the in-memory cache hit-rate on the
backend servers, since it effectively allows the caches of all the servers to
act in concert as a single large cache, reducing content being cached
redundantly on multiple back-end servers.
While the discussion and experiments in the paper are entirely couched in
terms of web traffic, the basic concepts it introduces could be generalized to
improve performance in cluster-based servers for other protocols oriented
toward the delivery of specific content on request (e.g. FTP). This is
because the fundamental underlying improvement is in the utilization of the
collective cache of a cluster of servers, which is obviously not an
HTTP-specific concept.
The authors evaluate the LARD (and replicated LARD/R) systems both via
simulation and on actual hardware, showing impressive performance improvements
over the commonly-used weighted round-robin (WRR) scheme. In addition to pure
performance, the LARD arrangement also offers much-improved scalability with
projected improvements in CPU performance, since back-end nodes are primarily
CPU-bound, as opposed to the disk-bound performance of WRR.
The primary disadvantage of LARD, though it does not seem hugely prohibitive
to me, is significantly increased complexity, primarily in the mechanism used
to hand off active TCP connections from the front-end node to whichever
back-end node is selected to serve a given request. The authors acknowledge
this complexity ("beyond scope") and do not go into great detail on its
operation. Nevertheless, it is clearly feasible to implement, as the authors
did so in the context of a real-world OS kernel in order to run their
experiments, and achieved high performance in doing so. While this is
certainly quite acceptable for a research prototype, the increased risk of
security vulnerabilities might be a non-negligible concern if it were to be
deployed in a real-world, public-facing web server.
Overall however, the paper presents a convincing and appealing case for the
LARD approach -- yet, to the best of my (layman's) knowledge, LARD as
presented has not been put into widespread use, with simple WRR-like schemes
remaining popular. This may be because the quantity of memory available in
web servers has grown to the point where it can contain the entire working set
for most loads, leaving them CPU-bound and with effective cache utilization
even with a naive request-distribution algorithm. Another possibility is
difficulty in adapting LARD to HTTP/1.1 (recently introduced at the time this
paper was written), which allows multiple requests per TCP connection and has
become quite ubiquitous in web traffic. If a single back-end server were
selected to serve all requests for a given connection based on the content of
only the first request, it seems likely that much of the benefit of LARD would
be lost. Or perhaps the connection handoff could be reversed to return
control of the server side of the connection to the front-end after a back-end
node completes service of a single request; the front-end could then
re-handoff the connection to the appropriate node for the next request (though
this would of course further complicate an already-complicated mechanism).
Posted by: Zev Weiss | January 31, 2012 04:45 AM
In cluster-based network server design, the policy for distributing requests to back-end nodes is critical. One previous policy, Weighted Round Robin, gives good load balance on the server side, but it does not take advantage of cache locality. Another policy, content-based distribution, can direct particular requests to particular nodes, which improves the main-memory cache hit rate, but at the cost of poor load balance.
In this paper, the authors propose the LARD policy, which absorbs the advantages of both previous policies. LARD not only uses content-based request distribution to improve the cache hit rate, but also uses two thresholds, T_high and T_low, to ensure there is no overloaded or idle node, achieving load balance.
Another topic the paper covers is the TCP connection handoff, which lets the back-end server reply directly to the client. This is a very nice design that improves the forwarding rate between back-end and client regardless of the possible bottleneck at the front-end. I appreciate the concept of TCP handoff a lot; it provides a good way to speed up data transport in today's data centers.
To demonstrate the efficiency of the LARD policy and the TCP handoff protocol, the authors designed a trace-based simulator and implemented a prototype. The results are convincing and impressive.
This paper is quite understandable and I like it. My question is: the front-end seems to be just ONE node, so we could consider the whole cluster a master-slave-like architecture, and there would be a fatal problem if the front-end node crashed. A backup node or a ring architecture might improve the stability of the cluster.
Posted by: Jie Chen | January 31, 2012 04:31 AM
This paper presents a method for distributing requests using locality information, spreading the work across back-end nodes that are transparent to a user contacting the front-end node. A more even distribution of data can be created in which similar data is more likely to be requested multiple times at the same location, increasing the usefulness of caching.
The goal of this is to provide a system with overall better performance and fewer cache misses compared to existing round robin based caching systems. This requires a front server which can decide on which back-end node should be responsible for a certain task. A certain file could be served either by a single back-end node or possibly more than one depending on the configuration. The important point is to make sure that a back-end node does not become overloaded. The front-end tracks how many active connections each back-end node currently has and tries to distribute new tasks accordingly. This setup allows a client to remain completely unaware of the exact setup of this system and only has to begin by connecting to the front-end server.
One contribution of this paper is Locality-Aware Request Distribution (LARD), which decides where requested data will be cached and served; this provides improvements over round-robin distribution. The front-end node monitors the number of connections at each back-end node and keeps data locality in mind along with the connection counts when deciding where cached copies end up and where a client's connection should be directed. Another advantage is the ability to scale to demands for an increased working-set size, so there is additional flexibility built into the system. The TCP handoff protocol presented in the paper is another area of improvement: it allows a client to contact the front-end node and then receive data from a back-end node without having to be aware of any change in the connection.
One potential problem is how the design is still heavily dependent on the front-end node. It reduces the front-end's workload by sending tasks to the back-end nodes but does not provide any redundancy for problems encountered at the front-end itself. Monitoring the number of active connections also seems as though it could cause problems when trying to scale up the system as a whole: each addition brings more back-end nodes that the front-end has to monitor, adding work for the front-end server node.
The transparent forwarding from the TCP handoff protocol that occurs from the front-end server to the back-end nodes could be applied to various TCP based problems similar to this that require transmitting large amounts of data. Using locality awareness to improve the effectiveness of the cache is similarly useful for many data caching applications where there is a range of choices for where data should be stored.
Posted by: Daniel Crowell | January 31, 2012 01:18 AM
It is common practice to set up a front-end server to face the client side so incoming requests can be routed to the back-end servers in a way that balances the load. The authors of this paper have designed a load-balancing system, LARD, that aims to produce more cache hits and avoid the time sink of disk access by being more content-aware.
Their implementation is very clean. A simple hash map in the front-end server associates requests with a particular server. When the load at a server gets too high, the map chooses a new server to associate with that request. In this way the collective cache space scales with the number of servers instead of remaining effectively constant as with the naive method.
The analysis compares their protocol against two alternatives. The first was the then-current state-of-the-art routing protocol that used round-robin assignment. The second protocol assumes a naive pre-partition of requests that doesn't adjust to load. Simulations were run using trace data from two different systems. When comparing throughput, LARD performs significantly better than round robin. They made sure to also run tests that don't favor their protocol, where they discovered LARD doesn't perform worse. When comparing server utilization, LARD performs much better than the naive partition. The dynamic load balancing results in highly efficient use of the back end. I believe the authors' tests were well designed and explore all the relevant cases. Their protocol shows clear performance improvements, and its simplicity makes it easily implementable. They focus on future development and consider the effects of changing hardware costs.
LARD is a clear precursor to modern concepts of content-centric routing, only instead of trusting the network to provide the service, it is a private system implemented on a set of servers. Thus they avoid the issue of advertising the content held at each server and the security concerns that raises. I find this to be a well-designed system that continues to be interesting as it gives early forms of modern ideas.
One further technical challenge that had to be confronted was a way to avoid routing all communication through the front end. A TCP hand-off protocol lets the back-end server send its content directly to the client without the client having to set up a new connection. It resembles a very early form of the protocols currently used in cellular networks.
Early in the paper, the authors alluded to an issue I wish they had explored more. One of the basic assumptions is that all servers have access to all the data on disk. With huge data sets this is not realistic. What is needed is a model that partitions the data set between the servers with some redundancy and that moves content dynamically to respond to load changes. This massively increases the scope of their project but responds to a valid question. Certainly if we consider the case of dynamic content addition, replication will be a factor. They leave it an open question, perhaps to be solved with the addition of another system.
On an unrelated note, I find the use of transparent to mean invisible irritating.
Posted by: Brian Nixon | January 30, 2012 11:47 PM
The paper gives a good overview of request handling in a cluster-based client-server environment. A request is made to a front-end server, which redirects it to the appropriate back-end server using some distribution mechanism such as round-robin or locality-based assignment. Once the request is redirected, the back-end sends the response to the client directly. One of the main contributions of this paper is the TCP connection handoff, which efficiently sets up the connection from the back-end server to the client instead of routing everything through the front-end server.
The main goal behind locality-aware request distribution (LARD) is to combine the following advantages:
1. Redirecting a given type of request to a specific location. This way, the amount of memory used for caching is minimized due to less duplication, and the occurrence of cache misses is greatly reduced, provided the cache is big enough. Methods like WRR (weighted round robin) suffer on both counts.
2. Locality-based methods, on the other hand, can suffer from load imbalances if one type of request is very popular and hence queried far more than others, putting much more load on some machines than on the rest. LARD tackles these problems efficiently by reassigning the server for a request when it detects a higher load than expected.
Even with LARD, certain problems can still arise if a cache server goes down or if the overall load is much higher than expected. LARD handles abnormal load increases by setting an upper limit on the load that can be handled, which results in waiting requests. Another variant of LARD handles the fault-tolerance cases of peak load or downtime of a cache server through replication. Overall, I feel that LARD is a brilliant way to handle a huge number of requests efficiently.
This paper also talks about the TCP connection handoff, the method used to seamlessly transfer a connection from the front-end to the back-end on behalf of the client. This way, the back-end server can respond to the client directly, which is much more efficient than going through the front-end. They also have an efficient packet forwarding technique that runs in a fraction of a millisecond.
With the TCP connection handoff (in the configuration from the paper), the front-end can handle 10 back-end servers without causing performance degradation, which means it could handle many more with the CPUs and network bandwidth available today. Also, I thought the paper could have integrated DNS-level redirection to scale the front-end servers; this is the environment used by most commercial web companies at present. The paper talks only about upgrading the front-end servers to a better configuration as a way to enhance performance. It would have been interesting to see what improvement front-end scaling could have added when used with each of these request distribution methods.
On the whole, the paper presents some really key ideas for managing server side load.
Posted by: Srinivas Govindan | January 30, 2012 10:41 PM
In this paper, the authors propose a locality-aware request distribution system that provides good load balancing and high locality, according to the simulation results and prototype testing.
The LARD cluster consists of a front-end server that distributes requests from clients to a group of back-end servers. To achieve high locality and good load balance, the front-end chooses either the back-end node that has served the target recently or one with light load. Also, to allow parallel processing, replication across back-ends is supported.
The problem the authors want to solve is to provide a cluster architecture that achieves high locality in the back-ends' main-memory caches as well as load balancing. Most clusters at that time focused on load balancing with weighted round-robin distribution and didn't take advantage of memory cache hits, which can improve performance a lot.
To evaluate the LARD architecture and compare it with other state-of-the-art request distribution systems, the authors first developed a configurable web-server cluster simulator that can compare throughput, delay, CPU utilization, and other metrics across different methods under different system environments. They used log data from Rice University and IBM to drive the simulated requests. Based on the simulation results, the LARD-based cluster is consistently better than other request distribution methods. The authors also implemented a prototype of a LARD-based cluster whose performance is consistent with the simulation.
Overall, I think LARD is a very good design for improving the performance of a server cluster, but there are two places that could be improved.
First, all requests are distributed by a single front-end server, which can become the bottleneck of the system as requests increase, even though the authors mention it can be scaled by using a faster CPU or an SMP machine. Moreover, after the front-end hands off a connection to a back-end server, to keep the handoff transparent to the client, the acknowledgements from the client still go through the front-end, which increases its burden and limits the scalability of the system (one front-end server can only support about 10 back-end servers).
Second, LARD with replication does not seem able to handle concurrent updates. If two requests update the same piece of data at two different back-end servers at the same time, how will the system handle this? And if two read requests arrive at the same time, might they get two different results? As mentioned in the paper, replication only helps when some data is requested very frequently; maybe replication should be used only when necessary.
Posted by: Xiaoming Shi | January 30, 2012 06:01 PM
Seth Pollen
This paper proposes an algorithm to balance the concerns of locality and utilization to maximize throughput for a cluster of web servers. They take the case of a cluster with a single frontend node that distributes work among a set of back-end nodes. The problem with existing round-robin schemes is that each backend node operates on the entire working set, meaning data is often duplicated in the disk caches of several backend nodes. They propose to make the frontend distribution strategy aware of the contents of incoming requests, allowing each backend node to be responsible for a smaller working set of documents and improving cache performance. However, a good distribution strategy must still take load-balancing into account, since some documents will be requested far more frequently than others.
The authors admit that their discussion is limited to serving of static HTTP documents, in which case each URL corresponds one-to-one with a cacheable file on the server. However, the introduction of dynamic content complicates this situation. A dynamic webpage is generated on the server using a set of one or more static code and data files on disk. The necessary sets of files required to serve two different dynamic webpages may overlap entirely or not at all. An intelligent request dispatcher could estimate or observe which groups of webpages share a large set of on-disk files and then attempt to serve all requests for those pages from one backend node. One heuristic that might serve here is to assume that webpages with a common URL path prefix will involve similar content and thus use some of the same files. A more rigorous solution could statically trace dependencies for all webpages. In either case, incorporating this consideration would complicate the given LARD-R algorithm significantly and may make it more reliant on the particular cache policies of the backend nodes.
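As a toy illustration of that prefix heuristic (this is the review's suggestion, not anything from the paper, and the depth parameter is invented):

    from urllib.parse import urlparse

    def dispatch_key(url, depth=1):
        """Use the first `depth` path components as the dispatch key, so
        dynamic pages that likely share code and data files land on the
        same back-end node."""
        parts = urlparse(url).path.strip("/").split("/")
        return "/".join(parts[:depth]) or "/"

    assert dispatch_key("http://host/shop/item?id=3") == "shop"
    assert dispatch_key("http://host/shop/cart") == "shop"   # same key, same node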
One objection I have to the LARD-R algorithm is the policy for removing nodes from the server set for a document. They assume that if the traffic for a given page does not grow for some fixed interval of time, then it must be shrinking, warranting the removal of a node from its server set. While this approach may be workable if most high loads are transient, it will force unnecessary changes to the server set if pages remain at a steady high load. It seems that the same condition which triggers expansion of a page’s server set could be used in the negative to trigger the reduction of a page’s server set (with some kind of hysteresis to prevent too-frequent changes).
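A minimal sketch of that suggested alternative, with invented names and thresholds: grow the set when its least-loaded member is still overloaded, and shrink only when load falls below a low-water mark, the gap between the two thresholds providing the hysteresis:

    GROW_AT, SHRINK_AT = 65, 20      # hysteresis gap prevents set thrashing

    def adjust_set(members, all_nodes, load_of):
        load = min(load_of(n) for n in members)
        spare = [n for n in all_nodes if n not in members]
        if load > GROW_AT and spare:
            members.add(min(spare, key=load_of))        # grow under sustained load
        elif load < SHRINK_AT and len(members) > 1:
            members.discard(max(members, key=load_of))  # shrink only when load is low
        return members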
The issue of maximizing memory cache usage continues to be relevant to modern systems, since processor and disk speeds have continued to diverge since this paper’s publication date. The proliferation of caches between the processor and RAM further increases the importance of cache-content optimization.
Posted by: Seth Pollen | January 30, 2012 04:10 PM
The authors of this paper claim to make four contributions: (1) presenting a new technology called locality-aware request distribution (LARD) that makes improvements in cache hit rates and load balancing, (2) a trace-driven performance study of LARD and competing technologies, (3) an efficient protocol to allow one system to transfer a TCP connection to another, and (4) a performance study of a prototype LARD-based system.
LARD is a method to distribute incoming requests to backend servers that takes into account both the loads of the servers and the locality of the request targets with respect to the servers. The idea is that, if a server has recently served a request target, then it is likely to still have that target cached, and so the dispatcher should tend to give that server future requests for that target. On the other hand, if a server is significantly overloaded compared to other servers, then the dispatcher should tend to transfer a request to a different server even if that other server has not recently served the target, since the delay caused by a loaded server will eventually outweigh the delay caused by a cache miss. The paper presents two LARD strategies, the difference being that one strategy replicates request targets across multiple servers whereas the other does not (the former is called LARD/R by the paper).
In addition to presenting the algorithms for LARD and LARD/R, the paper presents an extensive simulation study of LARD, LARD/R, and other preexisting techniques such as weighted round-robin and locality-based (LB) request distribution. The study is based on a simple model of the backend servers representing both request queues to the CPU and disk I/O queues within each backend used in the event of a cache miss. Both studies examined throughput (requests served per second) for various request distribution strategies and found that LARD and LARD/R seem to have the highest throughput. The studies were driven by trace data from two sources. One was the main IBM website; the other was a collection of traces from several departmental web servers at Rice University.
One limitation of the study is that it seems to have only dealt with websites that primarily serve static content. With such sites, it is quite easy to get target information from the HTTP request alone, since there will generally be a one-to-one correspondence between the full URL and the target. However, since the paper was published, the web has undergone a transformation such that the most widely requested pages are generally dynamically generated, often specifically for each user, such that it is not necessarily the case that the HTTP header will contain all the information necessary to determine the target of a request. I would be interested to learn more about how to adapt this technique to such sites.
One flaw of the paper is that, while it examines the performance of various strategies using different numbers of backend servers, it does not really vary the number or types of requests. In my view, the latter sort of study would have been far more informative than what they actually did. Further, when the paper presents the results of the study, it uses linear scaling in its graphs (Figs. 7-8, 10-14), so one cannot really see much detail as to how the cluster handled the load with very few servers. I think it would have been better to use logarithmic scaling in those graphs, since I am more interested in how the system performed under resource stress than in how its performance changed as it gained resources.
Posted by: James Paton | January 30, 2012 02:07 PM
The paper is about a new policy for request distribution in cluster-based web servers that achieves a higher cache hit rate by improving locality and also does a good job on load balancing. The authors first describe the current policies used in request distribution, then develop their idea of locality-aware policies and explain its benefits. They demonstrate the benefits through both simulation and a real implementation of the policy in a cluster. For their implementation to be efficient, they develop an addition to TCP called the TCP handoff protocol.
The problem they were trying to solve is better scaling of web servers in a cluster environment. The approach most web clusters of the time were using was weighted round robin, which, while producing equal load on the back-end nodes, did a bad job on cache locality. All back-end nodes were serving all requests, so they all cached the same items. This produced a high cache miss rate and made the servers disk-bound, as shown in the paper.
The biggest contribution of this paper is the idea of focusing on cache locality, and not only on load balancing, when doing request distribution. The total cache size in round-robin approaches was the size of one back-end node's cache, because all items were in all caches. By distributing items across the caches and having each item in only one cache (ideally), the total cache size becomes the sum of all the back-end nodes' caches. This approach substantially improves the cache hit rate and makes the web servers CPU-bound instead of disk-bound. Finally, a higher cache hit rate means higher throughput, as demonstrated in the paper. One more contribution was the TCP handoff protocol. The authors designed a protocol that enabled them to inspect the request on the front-end server and forward it to a back-end node in a way transparent to the user. The front-end server's network drivers processed the headers at a low level, changing the fields that describe the source and destination of the packets. No changes were required on the client side or the back-end side.
The paper did a very good job explaining and developing its ideas and testing them both by simulation and by a real-world implementation. One of the flaws in the approach was the way they implemented the TCP handoff protocol. The authors estimated that every back-end node took 10% of the front-end node's CPU time just for packet forwarding, so a front-end server can support only about 10 back-end servers of the same configuration. For today's scale this seems much too low, even if the front-end server is a really high-performance machine. One way to address this flaw would be to implement the handoff protocol in network card hardware.
The idea of locality in request distribution is still in use today. For example, when a user logs in to a site, all his subsequent requests are forwarded to the same small number of servers backed by a single memcache machine caching his user information.
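A toy sketch of that session-affinity idea, with invented names (a stable hash of the user ID maps every request from that user to the same small group of servers, and hence to the same cache):

    from zlib import crc32

    def affinity_servers(user_id, servers, group_size=2):
        start = crc32(user_id.encode()) % len(servers)
        return [servers[(start + i) % len(servers)] for i in range(group_size)]

    servers = ["s0", "s1", "s2", "s3", "s4"]
    print(affinity_servers("alice", servers))   # the same group on every request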
Posted by: Igor Canadi | January 29, 2012 05:58 PM
The LARD (Locality-Aware Request Distribution) strategy is introduced in this
paper. This aims to use content or type of request given to a cluster-based
server structure to decide which machine of the cluster should service a
client's request.
The problem the authors want to solve is the slow response time and low
throughput in the current servers used in WANs that are composed of many
machines. The cluster architecture they focused on was a single front-end
machine connected to many back-end machines. Also, some important goals are to
increase utilization of each machine and to avoid load imbalances or the
front-end overloading the back-ends. A further goal is that their strategy
scale to many back-end machines (with an initial target of 10 back-ends) and to
faster machines with improved CPU speed and memory capacity.
As a way of achieving faster responses and improved throughput, they sought to
improve the frequency of cache hits on each machine that made up the cluster.
This was done by partitioning tasks to certain machines. This allowed a single
machine in a cluster to work on the same types of requests, and thus have a
greater chance of reusing the data loaded into its cache, making the server's
performance more CPU- than disk-bound. The authors contributed an algorithm that
sends each request to the machine assigned to its content, but does this while
balancing the load across the cluster. Along with this advantage, they
implemented the ability for a job to use more than one machine, and to
dynamically change how many machines are needed while the job is running.
One of the last parts of the paper contributed a TCP handoff protocol that
allowed the front-end to hand a TCP connection to a back-end without the need to
use the entire network stack for scalability of the front-end.
The largest flaw that I found in describing the LARD strategy was the small
discussion on hashing functions that achieve good locality and load-balancing.
This perhaps was considered outside the scope of the paper, but I see not
getting hashing right as something that could kill this entire strategy. Also,
there were no strategies to rebalance the load in the presence of very popular
content. The suggestion of an SMP front-end to increase scalability on the
front-end was somewhat weak and unsustainable as well.
These ideas were clearly a little before their time, since the working sets of
data that accompanied a client request were not big enough to fill a normal
machine's cache in 1998. Webpages have become larger and more complex, and
within the next 5 years CPU speeds and memory capacity increased even beyond
what the authors simulated. The "future" they predicted is now or
even a few years ago, and a cluster-based network server is essential in many
cases to handle high-volume web traffic. Current systems always search for high
cache hit rates, a good balance of load, and high utilization and throughput.
Posted by: Evan Samanas | January 29, 2012 04:46 PM
This paper introduces a new, locality-aware request distribution method for cluster-based network servers. To make it work, a new strategy for distributing requests with locality-awareness and load balancing, as well as a TCP connection handoff technique, are introduced. Experiments show the proposed method yields a significant improvement in throughput and utilization, while also clearly reducing the delay of individual requests.
The paper is based on a distributed-system architecture with one front-end server to distribute requests and several back-ends to serve them. There are two major challenges in request distribution: cache efficiency and load balancing. Previous research provided solutions for either cache efficiency or load balancing, but failed to satisfy both requirements efficiently. In this architecture, the other factor that affects performance is the efficiency of handing off TCP connections from the front-end to a back-end in a way that is transparent to the clients.
This paper solves both request-distribution challenges described above by introducing a new strategy called Locality-Aware Request Distribution (LARD). It distinguishes each request by its content and distributes requests to different servers based on that content. At the same time, it implements a dynamic adjustment strategy to balance load among the back-end servers. The strategy also handles the situation where back-end servers hold replicas for serving requests with the same content. In this way, cache efficiency is high, since a server always handles requests for the same content. For TCP connection handoff, the paper modifies the front-end and back-end protocol stacks to implement fast connection handoff. It introduces a forwarding module to relay ACKs from clients to the back-ends, and it allows a back-end to take over a TCP connection with a client without a new handshake. Besides this, the paper also describes a simulation model to measure the performance of LARD, and performs a systematic evaluation of LARD(/R) and the TCP connection handoff against existing solutions.
From my perspective, there is one flaw in this paper: it does not describe the storage architecture of the cluster. The cluster might be shared-disk, shared-memory, or shared-nothing. Shared-nothing clusters have obvious advantages over the other two. However, if the cluster is shared-nothing, then when a server is dynamically assigned to another set to handle a different type of request, there might be significant overhead in migrating the storage needed for the new type of request, since all the content cannot be stored on one machine. For example, this problem might occur among back-end machines that handle videos: dynamically adding a lightly loaded server to handle video requests might have a non-negligible cost. The cost of replicating videos might be too large to bring the strategy into practice, even if the adjustment is infrequent, as described in the paper.
This paper proposes a great solution for request distribution that satisfies both locality and load balancing. It should be quite effective in increasing the throughput and utilization of servers in practice. Some additional techniques might be needed for specific applications.
Posted by: Xiaoyang Gao | January 27, 2012 05:47 PM