
FAWN: A Fast Array of Wimpy Nodes

FAWN: A Fast Array of Wimpy Nodes. David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. In Proc. 22nd ACM Symposium on Operating Systems Principles (SOSP), Oct 2009.

Reviews due Thursday, 4/26.

Comments

Summary:
The paper presents FAWN (a Fast Array of Wimpy Nodes), a novel cluster architecture for low-power, data-intensive computing. FAWN couples low-power embedded nodes with flash storage to provide fast and energy-efficient processing of random, read-intensive workloads (such as key-value storage applications). The paper also describes the design and implementation of FAWN-KV, a consistent, replicated, highly available, and high-performance key-value storage system built on a FAWN prototype.

Description of problem:
Recent years have witnessed significant growth in large-scale data-intensive applications, such as high-performance key-value storage systems. The workloads of these applications are mostly I/O-intensive, require random access, and the objects stored are usually small. A cluster serving such workloads must deliver both high performance and low cost (energy efficiency). However, these requirements cannot be met well by conventional disk-based clusters (poor random-access performance and low energy efficiency) or DRAM-based clusters (high expense and high energy cost). How to build a cost-effective, energy-efficient cluster for data-intensive workloads that still achieves satisfactory capacity, high availability, and low latency is the problem this paper tries to solve.

Summary of contribution:
As the paper itself notes, its key contributions are the principles of the FAWN architecture and the design and implementation of FAWN-KV, a consistent, replicated, highly available, and high-performance key-value storage system built on a FAWN prototype. To me, applying flash storage in a cluster architecture for data-intensive workloads is the most impressive idea in this paper, because flash storage exactly meets the requirements of such a cluster (power efficiency and high performance for small random accesses). However, certain limitations of flash must be handled appropriately to build the cluster well, for example using log-structured datastores to overcome slow random writes. FAWN-KV is quite similar to the Chord system (including replication and consistent hashing), except for the addition of front ends and the use of log-structured datastores.
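
To make the log-structured idea concrete, here is a minimal Python sketch of an append-only store in the spirit of FAWN-DS (all names and the record format are mine, not the paper's; the real system keeps a compact hash index in DRAM and the log on flash):

    import os

    class LogStore:
        """Toy append-only key-value log: puts append, gets do one seek."""

        def __init__(self, path):
            self.index = {}                      # key -> offset of latest record
            self.log = open(path, "ab+")

        def put(self, key: bytes, value: bytes):
            self.log.seek(0, os.SEEK_END)
            offset = self.log.tell()
            # record layout (assumed): key length, value length, key, value
            self.log.write(len(key).to_bytes(4, "big"))
            self.log.write(len(value).to_bytes(4, "big"))
            self.log.write(key + value)
            self.log.flush()
            self.index[key] = offset             # the old record becomes garbage

        def get(self, key: bytes):
            offset = self.index.get(key)
            if offset is None:
                return None
            self.log.seek(offset)                # one random read on flash
            klen = int.from_bytes(self.log.read(4), "big")
            vlen = int.from_bytes(self.log.read(4), "big")
            return self.log.read(klen + vlen)[klen:]

Writes never overwrite in place, so flash sees only sequential appends; reclaiming the superseded records is what the compact operation discussed below is for.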

Flaws:
This paper reminds me of the FTL (flash translation layer) project in CS 537, which I took last semester. The criteria for that project included the number of erases, wear leveling (whether erases are spread evenly), capacity (how much space is available beyond the metadata overhead), and memory usage. Memory usage and capacity are handled well in this paper with a compact in-memory (DRAM) hash index, and wear leveling may also be good thanks to the append-only datastores and the load balance of consistent hashing. However, I am not sure whether flash is suitable for such a cluster, given its limited number of erase cycles and the fact that such a cluster involves a large number of writes. In addition, garbage collection happens only when the log reaches its end, which may incur bursts of latency that periodically degrade the service.

Application to real system:
In fact, FAWN-KV in this paper is a good sample application. If the limited erase cycles of flash devices are not a problem in the face of a large number of random writes, I believe this power-efficient design will be well accepted in industry.

FAWN represents an unconventional strategy for lowering power usage in large datacenters. It is a cluster of extremely low-power embedded computers with small amounts of RAM, a small amount of flash storage, and a log-structured data store designed to use flash efficiently. The problem this group sets out to solve is mainly the extremely high power and cooling consumption currently seen in large datacenters that serve data and compute power for giant Internet entities like Google and Amazon. A larger problem that is somewhat "solved" in this work is the existing performance gap between CPUs and I/O. Using low-power, wimpy nodes with flash storage closes that gap considerably and realizes power savings of orders of magnitude without giving up orders of magnitude of performance (for data-intensive workloads that are heavy on reads), because it avoids the power normally wasted when a CPU is so much faster than its I/O.

The FAWN paper contributes the idea of using low-power, wimpy nodes and also contributes a proof-of-concept implementation of a key-value store, a very common application to run on a cluster within a large datacenter. This implementation shows that very good power efficiency is achieved if the key-value store is implemented such that sequential and random reads dominate most operations, and if writes are batched into larger blocks.

FAWN's queries/Joule numbers were certainly impressive, and in my opinion a strategy like this makes a lot more sense than a datacenter that shuts off unused machines, given the difficulty of knowing when to shut off a machine and the power needed to start one back up. However, these nodes certainly were wimpy, which showed any time the cluster was not operating under optimal conditions. The only impressive queries/second rate came when the entire working set fit into a FAWN node's cache, which can't have been very big, seeing that in order to service a request from flash in a reasonable amount of time, it needed to look up a key in an in-memory hash. With 256 MB of DRAM, it was shown that it is quite possible for this in-memory hash to get paged to flash. It seems to me that under any actual load, this FAWN-KV service would reduce to each node being bound on accessing flash more than once per request. Also, since these nodes are very wimpy, adding or removing a node took a lot of time for the nodes involved in adjusting the consistent-hashing scheme. With a cluster larger than 20 nodes, handling failures may substantially degrade performance.

Distributed systems are lately designed more with power consumption in mind, and this serves as a great example of what could be the lowest power consumption per node, or even queries/Joule, for a large node system. Knowing these bounds helps designers when they are searching for a happy medium balancing initial cost, running cost, and performance in the face of the desired workloads.

This paper proposes a system design geared toward usage in storage systems along the lines of key-value stores such as Amazon's Dynamo or memcached, which, rather than using conventional powerful servers, employs a larger number of "wimpy" nodes. These nodes offer lower performance per node, but at significantly reduced power consumption (enough so that performance per unit power is significantly higher). The authors implement a prototype FAWN system and evaluate its performance and power consumption as compared to more traditional system designs.

FAWN consists of a number of slightly less-wimpy front-end nodes proxying requests from clients to a larger number of wimpy back-end nodes, which are arranged in a consistent-hashing ring with a set of virtual nodes for each physical node. A physical node stores a log-structured, append-only record of key-value data items in a dedicated file for each virtual node it presents, using a modest amount of local flash storage (the poor performance of flash for small random writes being the primary motivation for the append-only format). The back-end nodes employ chain replication along the hash circle, with front-ends sending puts to the head of the chain and gets to the tail.
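
A toy sketch of that put/get split (my own minimal model, assuming reliable delivery and fail-stop nodes as in the paper):

    class ChainNode:
        """One replica in a replication chain."""

        def __init__(self, successor=None):
            self.store = {}
            self.successor = successor           # None means this is the tail

        def put(self, key, value):
            self.store[key] = value
            if self.successor is not None:       # pass the write down the chain
                return self.successor.put(key, value)
            return "ack"                         # the tail acknowledges the put

        def get(self, key):
            assert self.successor is None, "gets are served by the tail"
            return self.store.get(key)

    # front-end wiring: puts go to the head, gets go to the tail
    tail = ChainNode()
    head = ChainNode(successor=ChainNode(successor=tail))
    head.put("k", "v")
    assert tail.get("k") == "v"

Because a get is answered only by the tail, and a put is acknowledged only after reaching the tail, every read observes every acknowledged write.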

The authors evaluate a prototype FAWN system of 21 nodes, using what sounds sort of like a "whatever odds and ends we happened to have lying around the lab" assortment of hardware. The performance-per-watt achieved by the prototype is well in excess of conventional systems, though they point out that a significant fraction of system-wide power consumption stems from the network switches linking the nodes together, and this fraction would likely only increase with a larger cluster. While they describe this as a "topic of ongoing work" (though I don't know if anything has come of it in the three years since), it seems likely to be a tough problem, given the somewhat "black-box" nature of networking hardware (whereas with commodity PC hardware it's easy for a researcher to piece together a low-power system from readily available individual components, e.g. CPUs, DIMMs, and motherboards).

Though of course it's not necessarily always easy to get the equipment to do so, I would have liked to see the evaluation done on (I think) more appropriate hardware -- the CompactFlash storage used in the FAWN nodes, while cheap and readily available, is quite slow compared to many other forms of flash storage (and though I'm not 100% sure offhand what the power tradeoff would be, I doubt it would cost that much more power for a significant performance boost, as shown in their own Figure 9). Also, I'd have liked to know what the utilization of various system resources was, specifically CPU activity as compared to I/O throughput (i.e. whether the systems ended up being well balanced in that regard, or whether more energy could have been saved on CPU power, say).

FAWN is a cluster computing architecture designed for the specific purpose of low-power, data-intensive computing. It is also intended for a specific workload in which requests are I/O-intensive, require random access over large datasets, and the objects stored are small (less than 1 KB). Its design centers on low-power CPUs and flash storage, the crucial components for its intended purpose. For example, the FAWN data store (FAWN-DS) uses a log-structured key-value store to account for slow random writes. The performance of FAWN is tested with FAWN-KV, a key-value system built on top of FAWN-DS. The experiments cover a wide range of performance measurements for the system. The results show that FAWN can complete significantly more queries with the same energy compared to traditional systems.

I think both the design requirements and the design presented by the paper are practical. Its main contribution must be the results, which show that it is possible to build power-efficient clusters using low-frequency CPUs and flash storage. Although flash storage devices are expensive, the paper shows that they cost less in the long run. In addition, the paper has a thorough evaluation section with measurements for different aspects of the system. It gives information on read/write performance at the SSD level, node level, and cluster level. It also provides power-consumption information, as well as performance during maintenance operations. It even includes some analysis of the situations in which FAWN should be used.

Although the evaluation section is great, it does not include information on how frequently maintenance operations (especially compact) occur. I think this is important because those operations reduce system throughput by half and take quite a long time to complete. Furthermore, the power-consumption calculations do not take the power used during maintenance operations into account. Although this might not change the final result of the comparison with traditional systems, the maintenance operations are necessary for fast writes and should therefore be included in the power-consumption calculations.

The experiments show that the solution works well for the intended data-access pattern. Although the solution is only intended for small random read/write operations, it would be great if the experiments showed system performance for other common data-access patterns. I think a multi-purpose system (in terms of data-access patterns) would be better than a single-purpose one.

As SSD capacities increase and costs decrease, I think the ideas in this paper will become more relevant for cluster systems in the future.

The paper talks about FAWN, which is an attempt to create a distributed system out of not-so-computationally-powerful nodes. The system uses flash disks for I/O and hence is suited to a specific kind of traffic. One idea behind the choice of such small nodes is that less computation means less power. Since the computational power is low, disks come into play to store data that can't fit in memory; hence flash disks were chosen to optimize access time. This system is not built for hugely intensive computation but is designed to provide low-latency responses to requests that don't involve a huge number of bytes being returned. This basically means that random accesses of small data should be efficient. With current hard disks this is a major problem, as it involves seek time, which is very expensive; but using flash blindly isn't the most efficient solution either, as random writes to flash are very slow. As is always the case, to optimize write speed on flash a log-structured file system is used, which means any write is just an append and all the data resides in the log.

One immediate application that comes to mind for a low-latency requirement with random writes and reads is a KV store, and the paper builds that system using a group of Alix nodes with flash disks on them. Data is stored as separate database files on the flash disk, appropriately partitioned as key fragments in the log, and replicated at many locations as well. This idea is similar to the parallel database architecture, where a database is built on top of many shared-nothing, low-computation machines.

I feel one of the major pieces of work in this paper is the detailed performance evaluation study it presents. The advantage of using a log-structured file system for optimizing writes is clearly stressed everywhere with appropriate comparisons, and I feel this is a major contribution of the paper. The evaluation touches almost every aspect of the design, like how ring membership and maintenance operations affect overall query latency.

This system seems a good idea for a storage system, and considering how power-conservative it is, it can be considered. As said earlier, it is only designed for specific kinds of workloads. For example, consider building a multi-tenant cloud DB/storage system: high computation is a must in the nodes in those scenarios, and this can't be achieved with low-computation nodes at these magnitudes. Also, the flash disks may get overloaded in such a case, causing the drives to become unusable at a faster rate. For me, the main point of this paper is the advantage of using flash disks to achieve a certain degree of optimality in both power and performance.

FAWN is a design for random-access data storage that can help reduce power usage and cost under certain uses. It is designed around more power-efficient components that lower energy costs while necessarily giving up some peak performance.

A typical design for writing and accessing data is used for a large range of tasks, which often include sequential data access and random writes. For FAWN, the use case is one where data accesses are random and writes are sequential. A system reliant on hard disks accumulates a large amount of both delay and power loss from seek times for the many random reads. Similarly, most designs use recent CPUs that may provide more performance than the tasks need. CPUs designed for higher performance usually still use comparatively more power than a more efficient CPU when that extra performance is not required, even with power-reducing features such as dynamic power scaling. Two metrics can be used together to measure performance and energy efficiency: 'queries per second' and 'queries per Joule of energy', respectively. The goal is to keep performance reasonable while saving power.
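
The two metrics combine directly, since a watt is a joule per second; queries per Joule is just throughput divided by power draw. A tiny illustration with made-up numbers (not measurements from the paper):

    # queries/Joule = (queries/second) / watts, because 1 W = 1 J/s
    def queries_per_joule(qps: float, watts: float) -> float:
        return qps / watts

    # illustrative only: a wimpy node can win on efficiency while
    # losing badly on raw throughput
    print(queries_per_joule(1_000, 4))       # wimpy node:  250 queries/J
    print(queries_per_joule(50_000, 500))    # big server:  100 queries/J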

FAWN stands for Fast Array of Wimpy Nodes, meaning it still provides good performance but individual nodes in the system use components that are not as high-performance as would usually be seen in a modern cluster design. It is designed for a log-based storage workload where a large quantity of data is stored and read back but the cluster does not need to do a large amount of computation on the data itself. The log is append-only, and the data is spread across the different nodes using consistent hashing. There are multiple front-ends that can receive requests and a configurable number of nodes that store copies of the data. Mappings from keys to locations in flash are kept in memory, but the mapping can occasionally be incorrect, requiring a second read attempt when fetching data. Writes are propagated along a chain of nodes, starting at the head node and continuing until they reach the tail. Once an update reaches the tail, the system knows the write has reached every node that holds a copy of the data. Several methods for handling the addition or removal of machines are included; they copy data across nodes and update the chains. The front-end nodes keep track of which back-end storage nodes are reachable by using heartbeat messages. FAWN was designed for flash memory but has been tested in other configurations as well. Using a hard disk provides the largest amount of storage, while using DRAM provides the best performance. The paper claims that FAWN with SSDs is a good middle ground between these two options.
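
For readers unfamiliar with consistent hashing, a small sketch of a ring with virtual nodes follows (parameters such as vnodes=3 are illustrative, not the paper's values):

    import bisect, hashlib

    def h(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    class Ring:
        """Consistent-hashing ring: each physical node owns several points."""

        def __init__(self, nodes, vnodes=3):
            self.points = sorted((h(f"{n}-{i}"), n)
                                 for n in nodes for i in range(vnodes))

        def owner(self, key: str) -> str:
            hashes = [p for p, _ in self.points]
            i = bisect.bisect(hashes, h(key)) % len(self.points)
            return self.points[i][1]             # first vnode clockwise of key

    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.owner("some-key"))

Because each node appears at several points on the ring, adding or removing one machine only shifts the key ranges adjacent to its virtual nodes instead of rehashing everything.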

The paper presents ideas that work well for a limited set of goals, but it is not clear how useful they would be in other situations or in a design needing more flexibility. The paper addresses a few specific issues, such as detecting failures and having a node hold on to information to send later to the next node in the chain, but does not cover much about what happens when this chain is unexpectedly and irrecoverably broken.

There are limitations imposed by the expected sequential, appending writes, and the performance gains would not be as good if the reads were less random than expected. Using more efficient components when the performance of higher-end components is not needed is a good lesson to remember. Reducing power usage would be a good idea for many designs, provided they can avoid reducing performance too much or increasing complexity beyond a reasonable amount.

Summary:
The paper talks about FAWN, a new cluster architecture for I/O-bound, data-intensive, but computationally simple applications. FAWN uses a large number of small, low-performance wimpy nodes that have limited DRAM and moderate amounts of local flash storage. This combination greatly helps reduce the power consumption of the cluster as well as the CPU-I/O gap. The choice of flash as storage brings significant performance benefits for random reads and energy savings in terms of queries/Joule.

To overcome the challenges the hardware brings up, FAWN makes some key design choices. For the per-node FAWN datastore, it uses an in-memory hash index and a log-structured data log in flash to meet the memory constraints and achieve high-performance sequential writes. A FAWN key-value system is then built on the FAWN datastore. It uses Chord-like consistent hashing for incremental scalability, chain replication for strong consistency, and overlapping replication chains for each key range on the consistent-hashing ring. It also exploits caching to improve read performance and prevent hot-spots (the implicit filesystem cache on back-end nodes and a query cache at the front-end). Maintenance operations (split, merge, compact) are also properly designed; e.g., the two-phase split process enables minimal locking and ensures no existing replicas are affected if a failure happens during the data copy.
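
Of the maintenance operations, compact is the easiest to picture: walk the log, keep only the records the in-memory index still points at, and write them into a fresh log. A toy version over in-memory lists (the real FAWN-DS streams through a file on flash):

    def compact(old_log, index):
        """old_log: list of (offset, key, value); index: key -> live offset."""
        new_log, new_index = [], {}
        for offset, key, value in old_log:
            if index.get(key) == offset:         # latest version of this key?
                new_index[key] = len(new_log)    # survives at a new offset
                new_log.append((len(new_log), key, value))
        return new_log, new_index

Everything not referenced by the index (overwritten or deleted entries) is dropped, which is exactly why the operation temporarily competes with foreground queries for I/O.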

Motivation:
Power has become a primary concern for clusters. I/O is usually the bottleneck of data-intensive applications, and current disk-based and memory-based systems are inefficient in performance or cost. High-speed CPUs consume too much power, and when a CPU is waiting idly, power scaling does not work very well. The problem is how to build a cost-effective cluster for data-intensive applications that is much more energy-efficient while still meeting performance (throughput, latency), capacity, and availability requirements.

Contributions:
+ By combining slower CPUs and flash storage, this novel architecture appears comparable to traditional systems at a much lower cost and higher power efficiency. The general comparison with alternatives is very nice and insightful, and lifts the paper to a new level. It helps one make the right choice based on dataset size and query-rate requirements.

+ The whole system is properly designed with the FAWN hardware characteristics in mind. Due to the limited memory capacity, they choose to store only a fragment of the actual key in the in-memory hash index, so it requires only six bytes per key-value pair. In addition, since flash is slow for random writes, an append-only, log-structured datastore is used to make writes sequential.

Flaws:
- FAWN targets specifically small-object, read-intensive, random-access workloads that do not require complex computation. This assumption may not hold for a bunch of other datacenter workloads, which makes the architecture unappealing for general datacenters. For example, to achieve its SLA while enjoying the low power consumption of embedded CPUs, FAWN uses more, slower nodes in parallel to meet performance goals such as latency requirements. But not all data-intensive applications are amenable to this kind of parallelism; if the computation cannot be parallelized, FAWN may not be able to meet the SLA deadline. Memory-bound applications are also not suitable for FAWN. For instance, some machine-learning algorithms build data structures that require several gigabytes of memory; while this could easily fit in the memory of traditional architectures, it would quickly fill up the DRAM on each FAWN node (256 MB in the paper). Although there may be workarounds to meet tight memory constraints, they may require significant implementation effort.

- The paper does not say much about failures. The authors only consider fail-stop cases and do not touch other situations such as network partitions. I am not sure how they have improved fault tolerance since then.

The paper discusses FAWN, a cluster architecture that supports low-power, data-intensive computation. It is designed to support large-scale applications whose workloads are characterized by heavy I/O, little computation, and frequent random accesses over large datasets. The motivation behind FAWN is the high power consumption and poor seek performance of conventional disk- and memory-based clusters. To reduce power consumption while still meeting the same capacity, availability, throughput, and latency requirements, FAWN pairs low-power, efficient embedded CPUs with flash storage.

An important observation made in the paper is that low-frequency, in-order CPUs execute more instructions per Joule than their faster counterparts. It is also noted that when such fast processors run below full capacity, they do not draw a proportionately lower amount of power. Since the focus here is on data-intensive operations, FAWN avoids this disproportionate power loss by replacing fast processors with low-power wimpy nodes. Another design decision is to replace disk-based storage with flash storage to improve performance. Though flash supports fast random reads, it is very slow for random writes; hence a log structure is adopted to record updates in the data store. FAWN keeps an in-memory hash index in DRAM that stores part of the key and a pointer to the data log where the value is stored. Only part of the key is stored to save space in DRAM, at the cost of extra reads in the few cases where the actual key does not match. The front end maps each key to a specific back-end datastore through consistent hashing, so as to reduce the amount of data transferred during the addition and removal of nodes. At periodic intervals, FAWN-DS checkpoints the index by writing the hash index to disk. Periodic garbage collection also reclaims space in flash that is no longer used. Every node has a set of virtual nodes, and each virtual node has a separate file in the data store. This kind of semi-random write is supported to allow for fast maintenance. Also, to allow for fast querying, the paper describes caching done at the front end to reduce the effect of hot spots.
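
The fragment trick is worth sketching, since it is what keeps the index to roughly six bytes per entry (a 15-bit key fragment plus a 4-byte log offset). This toy version stores one entry per bucket and skips the hash-chaining of the real design:

    import hashlib

    FRAG_BITS = 15

    def key_hash(key: bytes) -> int:
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

    class CompactIndex:
        """Store only a key fragment + offset; verify the full key on flash."""

        def __init__(self, nbuckets, log):
            self.buckets = [None] * nbuckets     # each slot: (fragment, offset)
            self.log = log                       # offset -> (full key, value)

        def _slot(self, key):
            hv = key_hash(key)
            return hv % len(self.buckets), hv & ((1 << FRAG_BITS) - 1)

        def put(self, key, offset):
            i, frag = self._slot(key)
            self.buckets[i] = (frag, offset)

        def get(self, key):
            i, frag = self._slot(key)
            entry = self.buckets[i]
            if entry is None or entry[0] != frag:
                return None
            stored_key, value = self.log[entry[1]]
            # fragments can collide, so the full key in the log is checked
            return value if stored_key == key else None

The rare fragment collision is what causes the "extra reads" mentioned above: the index says "probably here," and the full key stored in the data log is the ground truth.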

Some minor drawbacks of the paper: there is little discussion of how the log structure handles the fragmentation that arises when garbage collection is done. Does it run an entire pass through the log each time, copying all live data to a new location in flash to avoid fragmentation? Wouldn't the entire datastore have to be locked while the in-memory hash map is updated, reducing availability?

Although FAWN doesn't seem to be used by anyone right now, overall the paper makes a significant contribution by proposing solutions to the power-management problem and the need for low latency in today's large data centers, which handle large amounts of data-intensive tasks.

This paper introduces a new cluster architecture, FAWN, A Fast Array of Wimpy Nodes. It consists of low-power embedded CPUs coupled with small amounts of local flash storage, and is designed for large-scale, data-intensive computing on low-power nodes.

Recently, major Internet services have deployed large-scale data-intensive applications. These applications are I/O-intensive and massively parallel, and the objects they store are typically small. However, the CPU-I/O gap has become large. For I/O-bound applications, wimpy processors perform better and reduce I/O-induced idle cycles. CPU power consumption also grows super-linearly with speed, and dynamic power scaling on traditional systems is inefficient. So the challenge arises: how to build a cost-effective cluster for data-intensive workloads that uses less power but still meets the same capacity, availability, throughput, and latency requirements?

This paper addresses the question with the FAWN architecture, which mainly consists of two parts, FAWN-DS and FAWN-KV. FAWN-DS is based on flash storage, which supports fast random reads and efficient I/O with much lower power consumption. However, random writes on flash are much slower.

On top of flash storage, FAWN-DS is designed as a log-structured key-value store. Since it is log-structured, it is an append-only system, which ensures that writes are sequential and that reads require only a single random access. FAWN-DS uses an in-memory hash index to map keys to values stored in the data log. The hash index indexes only a fragment of each key, trading a small number of extra reads for a lower memory requirement. Reconstruction is easy thanks to the log-based structure and checkpoints. Each node also keeps a separate file for each of its virtual IDs, which turns the write workload into semi-random writes that are nearly as fast as a single sequential append.
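
The "separate file per virtual ID" point can be shown in a few lines (a toy model with in-memory lists standing in for files):

    class VNodeStore:
        """One append-only log per virtual ID: writes arrive interleaved
        across virtual nodes but stay strictly sequential within each file."""

        def __init__(self, virtual_ids):
            self.files = {vid: [] for vid in virtual_ids}

        def append(self, vid, key, value):
            self.files[vid].append((key, value))

    store = VNodeStore(["v1", "v2", "v3"])
    for vid, key, value in [("v2", "a", 1), ("v1", "b", 2), ("v2", "c", 3)]:
        store.append(vid, key, value)    # semi-random across files

This is why the paper can claim such writes are nearly as fast as a single sequential append: the device sees a handful of sequential streams rather than scattered random writes.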

FAWN-KV is the key-value system. It consists of multiple front-ends and multiple back-ends. The front-ends send each request to the back-end node that owns the key space for the request, and the back-end node uses FAWN-DS to satisfy it. FAWN-KV uses consistent hashing to partition key ranges across nodes. It uses caches to prevent hot-spots, and uses a replication strategy similar to Dynamo's, in which each node replicates data at the R-1 following virtual IDs. When a new node joins, the process consists of two steps: 1) datastore pre-copy, and 2) chain insertion, log flush, and play-forward.
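
A toy rendering of that two-step join (my own data model, not the paper's code):

    def join_new_node(chain_data, updates_during_copy):
        """Phase 1: bulk pre-copy while the old chain keeps serving.
        Phase 2: insert into the chain, flush the log of updates that
        arrived during the copy, and play them forward on the new node."""
        new_node = dict(chain_data)              # phase 1: datastore pre-copy
        for key, value in updates_during_copy:   # phase 2: log flush and
            new_node[key] = value                #          play-forward
        return new_node

    # a put for "k2" lands while the pre-copy is in flight
    state = join_new_node({"k1": "v1"}, [("k2", "v2")])
    assert state == {"k1": "v1", "k2": "v2"}

The pre-copy does the bulk of the data movement without locking the chain, so the final cut-over only has to replay the short log of recent updates.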

One flaw of this paper is the lack of discussion of failure detection and fault-tolerance strategies for the FAWN-KV system. The discussion focuses only on back-end failures, handled with a heartbeat mechanism. How to deal with front-end node failures, and with the failure of the management node for the front-ends, remains unresolved.

Another flaw concerns the FAWN-DS system. The paper covers normal operations and maintenance strategies, but when the system runs for a long time, the amount of data grows, and deletions and out-of-date data introduce many fragments. There is no discussion of how to deal with them: for example, are they reused or just left alone?

Overall, this is an interesting paper that provides an architecture designed with power consumption in mind. It targets I/O-intensive applications, a major type of today's Internet applications. Since it uses low-cost hardware and consumes less power than traditional architectures, it should be applicable today, especially for new companies that don't face a migration cost for old data.

Victor Bittorf, Seth Pollen, Igor Canadi, David Capel

Energy is a great thing: save the environment, and everything. This paper builds a key-value store on low-power processors to save all the energy possible. It turns out that with some clever tricks (and flash memory) it can be pretty fast. The authors make a cute 'solution space' graph to demonstrate the strengths of various hardware setups. Unsurprisingly, their system covers almost all of the surface area in this plot... hm... A short Google search doesn't yield any results for companies that actually use FAWN, so perhaps the solution-space graph doesn't capture all the relevant factors.

Before we name contributions, we have to address the fact that this is Intel. Yes, Intel makes low-power computers. They go into detail about how they save area on the chip to save power; how awesome and innovative. But if Intel's market dominance has taught us anything, it is that x86 is an ugly instruction set. And it is common knowledge in the systems world that Intel dedicates a large area of the chip to decoding its bloated instruction set into a RISC-like form which is then executed. Now, one has to wonder how much surface area is taken up doing this translation... ARM, on the other hand, doesn't have a bloated instruction set. We conjecture that ARM would beat Intel in queries/Joule because of this. But this is more of a side note; back to the actual ideas in the paper.

This paper does something neat; it shows that low-power processors (we suspect Intel has lost the long-term battle in low-power processors to ARM) are viable in data centers. Granted, it requires a fairly specialized setup, but it can give reasonable performance and is a lot more energy efficient.

We won't go into too many details of the key-value store part, since that isn't really the interesting part. We'll just say that we feel the key-value store was somewhat weakly implemented (just based on how they handle ACKs and chain things; other systems we have studied do cleverer and smarter things). But we don't really focus on this, since this paper is more about the low power than about KV.

Interestingly, FAWN holds the JouleSort record right now (we believe this was not with the key-value store, though...). So it is clear that the research group that produced FAWN is making more contributions than just KV. This demonstrates that FAWN is an active area with a lot of potential, though not as active as other areas: FAWN has 166 references on Google Scholar, compared to 4590 for MapReduce.

Interestingly, our own DB group has something to say about FAWN, specifically “Wimpy node clusters: what about non-wimpy workloads?” As Jignesh puts it,

"Our results show that in most cases, computationally complex queries exhibit disproportionate scaleup characteristics which potentially makes scale-out with low-end nodes an expensive and lower performance solution." [1]

So, from what we have gathered, FAWN requires highly specialized software development (e.g., a very specialized file system on each node) in addition to network protocols for ACKing and chaining requests. Comparatively, MapReduce is a dirt-simple and very generic model. Also, as Jignesh suggests, the FAWN system only works well with specific workloads.

Putting this together, FAWN ends up requiring a lot of specialized hardware that programmers are not familiar with, and it incurs a large development overhead to build all the protocols and system-level optimizations required to make it viable. Basically, it becomes a highly specialized niche system which may draw less power but requires significantly more development time and (per Jignesh) will never be very generic.

I want to be clear that we do not dismiss energy conservation, but this approach seems to have high costs. And this is a non-trivial result: it could be used in a real system, but it is unclear whether it actually is.

We pointed out in our discussions that A) KV stores usually are not a bottleneck, B) it is a lot better for energy, and C) "this is the future!"

But on the flip side, only a big tech company would be able to deploy such a system, given how many detailed low-level systems problems have to be solved. So in order for a large portion of the market to get use out of this, it would probably have to be sold as a turn-key solution. But no one wants to buy a turn-key distributed key-value store (at least, no one who already has the in-house skill to roll their own). So it seems like this could find a home in the cloud, where the huge initial infrastructure and development-time investment could be covered by economies of scale.

Our best guess is that Amazon could build a FAWN and then sell time on it. Using ARM processors, though; sorry, Intel.

[1] Willis Lang, Jignesh M. Patel, and Srinath Shankar. "Wimpy node clusters: what about non-wimpy workloads?" In Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN '10), Anastasia Ailamaki and Peter A. Boncz (Eds.), ACM, 2010, pp. 47-55. DOI=10.1145/1869389.1869396 http://doi.acm.org/10.1145/1869389.1869396

This paper talks about FAWN, a combination of:

- A cluster architecture composed of small, low power nodes built from embedded processors, limited DRAM and flash storage and

- A software stack that runs a KV store highly tuned to the underlying infrastructure, on top of the wimpy nodes.

FAWN is primarily targeted to serve as cost-efficient, low-power datacenter infrastructure that serves small-object, random-access, data-intensive workloads while guaranteeing the throughput, capacity, and latency requirements expected of a datacenter built on any other infrastructure.

The problem the authors are trying to solve is to revamp the datacenter infrastructure (without spending too much) such that the datacenter runs existing workloads but at one-tenth the power.

Contributions:

- The biggest contribution of this paper is perhaps the very idea of using small nodes to reduce the power requirements of datacenters while addressing the workloads very typical of a datacenter. The idea of coupling flash storage with the wimpy nodes seems interesting from a power perspective.

- The various design choices involved in the design of the KV store incorporated the nature of the underlying hardware (the hashing algorithm is well suited for smaller DRAMs, the sequential streaming of updates in chain replication has a nice log-structured feel to it).

- The idea of taking advantage of semi-random writes seemed interesting.

- The idea of using multiple front ends with the front ends also serving as caches is a nice idea.

The paper had a few minor setbacks:

- Building a dedicated log-structured KV store on top of the wimpy nodes and studying the whole system (wimpy nodes + storage stack) seemed a less clean evaluation than running LFS on the wimpy nodes and running common KV stores on top of it. The latter would better evaluate the infrastructure in isolation. The merits of the authors' own KV store could have been reported separately (to clearly distinguish the benefits of the infrastructure from the benefits of the algorithm).

- The usage of multiple front ends and the double mapping of key spaces (once at the front ends and once at the back ends) were not explicitly clear until the paper mentioned caching.

- The paper could have gone into a few more details on how garbage collection and the other management processes interfere (or don’t) with the KV process in light of the underlying infrastructure (a microarchitecture that has relatively fewer hardware threads compared to servers in addition to being a lot slower and a memory hierarchy that doesn’t allow much locality).

The paper is very much applicable to today's datacenter infrastructure. With especially heavy costs going into the power consumption of datacenters, this infrastructure revamp seems like an interesting idea. However, there are conservative opinions about wimpy nodes in datacenters, and thus, for this to be truly accepted as commercial infrastructure, it would be necessary to see how the infrastructure alone works with commercial KV stores at large scale.

The paper describes FAWN, an architecture for compute clusters based on low-power nodes. As the authors envision it, a FAWN cluster would consist of a large number of computers with embedded CPUs. Each computer would have a smallish SSD, a small amount of RAM, and a relatively low-performance CPU. Because of this, they would have a lower TCO, especially in power and cooling costs. The point of this paper is to argue that such an architecture is useful for many real-world workloads. Specifically, the authors target I/O-intensive, non-CPU-intensive workloads that are heavy with random reads.

To this end, the authors created FAWN-KV, a key-value store based on FAWN. In many ways, FAWN-KV is similar to Dynamo. There are two main differences. The first concerns how gets and puts are served. In FAWN-KV, each range of keys corresponds to a number of replicas. Puts are served by the first replica, which is the first successor to the key and at the head of the replica chain. Gets are served by the last replica, at the tail of the replica chain. When a put request is received, it is pushed to the next replica in the chain, which pushes it to its own successor, and so on until the tail is reached. The second main difference is that FAWN-KV seems to crucially rely on FAWN-DS for performance. The latter is a local datastore specially designed to be used by FAWN-KV and to be optimized for usage on flash. It uses a log-structured organization to turn every write sequential.

One flaw of the paper is that the process for a node to join a replica chain seems unnecessarily convoluted. In the authors' design, the joining (virtual) node is not allowed to participate until it has received all of its data. It seems to me that it could join the replica chain early and start serving requests when possible. That is, if it receives a get request, it could check the data it has so far and serve it if possible; if it doesn't have the data, it could pull it from the previous tail concurrently with the pre-copy/flush. It could start serving put requests immediately (this only works for certain kinds of puts, however; FAWN-KV probably supports conditional puts and the like, which could not be served this way). This sort of design would allow more responsive on-demand scaling.

Another flaw of this paper concerns its overall approach of advocating FAWN as a cluster architecture. Although there is no doubt that FAWN could have its place, today's data center is arguably more about cloud computing with virtualization. FAWN clusters wouldn't be good at this, because nodes hosting VMs must have a lot of RAM and be higher performing. Further, the cost analysis at the end of the paper does not take cloud concepts into account: a single high-power, high-cost node could serve as if it were a number of wimpy nodes and might therefore actually be cheaper than the same number of wimpy nodes.

This paper introduces the FAWN cluster architecture, which aims for low power consumption while maintaining comparable performance by using slower but more energy-efficient processors and flash storage. On top of FAWN, FAWN-DS and FAWN-KV are built as a log-structured key-value store to show that FAWN is capable of doing the same amount of work with less energy.

Energy consumption seems not to be a problem for most computer users, since most laptops consume negligible energy. So it is surprising to learn that a datacenter can draw 10-20 MW. At that scale, reducing energy consumption is significant: it lowers the electricity bill and simplifies the infrastructure needed to deliver power, which also costs less money. And of course, energy efficiency matters for the natural environment. FAWN is therefore trying to solve a practical issue.
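
A back-of-envelope calculation shows why this matters (the electricity rate is an assumption of mine, not a figure from the paper):

    # what a 10 MW datacenter spends on electricity alone at $0.10/kWh
    megawatts = 10
    hours_per_year = 24 * 365                    # 8760
    annual_cost = megawatts * 1000 * hours_per_year * 0.10
    print(f"${annual_cost:,.0f} per year")       # about $8,760,000

Cutting power draw by an order of magnitude, as FAWN aims to, would save millions of dollars per year per datacenter before even counting cooling and provisioning.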

There are two major differences between the FAWN architecture and traditional distributed-system architectures: FAWN uses low-power embedded processors, and it uses flash storage rather than disks. I think the success of FAWN is mostly attributable to the hardware. One key observation is that normal processors provide high speed at a disproportionately high energy cost; even frequency scaling doesn't really save much. Considering the ratio of computing speed to energy cost leads us to adopt slower CPUs. To compensate for the slower CPU, FAWN replaces disks with flash. This choice is easy to understand, since a huge CPU-I/O gap exists, and improving I/O speed is likely to compensate for the lost CPU cycles in I/O-intensive workloads.

On the software level, the design must suit flash. The characteristics of flash storage are fast random reads and fast sequential writes, but slow random writes. Compared to disks, the difference is that random reads become fast. So, to match the speed of random reads on flash, FAWN must also make writes fast. A technique previously used on disks to convert random writes to sequential writes is the log structure, which takes advantage of flash. Another interesting technique is chain replication, even though it does not appear to be specially designed for flash storage. Chain replication extends consistent hashing: each write travels along the chain, and each read starts from the tail of the chain, so that every read is guaranteed to see previous writes.

A flaw of this paper is that most of its evaluation is about performance, while only a small amount is about energy consumption. The main goal of the paper, as I see it, is to build a low-energy-consumption system that can use less energy to provide similar computing capability, so I feel there should be more evaluation of power issues. Another problem is the space utilization of flash storage. A bit costs more in flash than on disk. While the log-structured design can convert random writes to sequential writes and improve speed, it requires a certain amount of empty space for log merging. In this sense, wouldn't the log structure cost too much, given that space on flash is more valuable than on disk?

All in all, saving energy should be the right direction for future cluster construction. This paper presents a good solution, at least at the experimental level. It will be interesting to see which company makes the first move to adopt this kind of system.
