TreadMarks: Shared Memory Computing on Networks of Workstations
C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel, TreadMarks: Shared Memory Computing on Networks of Workstations. IEEE Computer, Vol. 29, No. 2, pp. 18-28, February 1996.
Reviews due Thursday 3/31.
Comments
Summary:
This paper presents TreadMarks, a distributed shared memory system supporting parallel computation on networks of workstations. It provides shared memory as a linear array of bytes under a lazy release consistency model. Implemented as a user-space library, it achieves near-linear speedup for large, complex parallel applications.
Problem:
A network of workstations (NOW) offers a parallel processing environment at relatively low cost compared with an expensive dedicated supercomputer. The core challenge is to design and implement an efficient, portable distributed shared memory (DSM) system in such an environment. The DSM should also make it easy for programmers to rewrite applications to take advantage of the NOW.
Contributions:
1. TreadMarks provides a precise API for programmers, illustrated with clear examples.
2. The performance problems of the sequential consistency model are identified (a large number of messages and false sharing).
3. Proposes the lazy release consistency model and a multiple-writer protocol to improve the performance and scalability of DSM.
4. TreadMarks is implemented as a user-level library, making it easy to port across architectures. Solid evaluations are presented, and TreadMarks achieves reasonable near-linear speedup for large, complex applications.
Flaws:
In general, this paper presents a nice problem and an efficient solution. I don't find many serious flaws in it; the following are several minor points.
1. The paper omits a discussion of whether lazy release consistency is appropriate for all applications. Under a relaxed memory model, a read does not necessarily return the last written value.
2. Under the multiple-writer protocol, TreadMarks does not handle write conflicts. In other words, data races are not detected or handled by TreadMarks; it simply assumes they will not occur in a correct application.
3. The evaluation is limited to eight nodes, and a discussion of the limits of scalability is omitted. It would be useful to know at what number of nodes the overhead starts to outweigh the benefits.
4. The paper omits failure handling. What happens if one node fails? Does TreadMarks crash, or does it adapt to the failure?
Applicability:
The idea of DSM still seems useful today, though in different forms. For example, memcached uses distributed memory as a shared caching pool to improve web server performance, operating on objects instead of memory pages. In NUMA architectures, the same problem exists and similar solutions would be useful.
Posted by: Lanyue Lu | March 30, 2011 02:37 PM
Summary: Treadmarks is a distributed shared memory (DSM) system developed at Rice University. This paper describes several improvements it makes upon previous systems, and provides several clear examples of the applicability of their techniques.
Problem: The aim of DSM in general is to provide an abstraction of a shared memory space to allow remotely organized processes to coordinate as if they were sharing physical memory. The developers of Treadmarks recognized several potential optimizations in prior DSM implementations (most notably Ivy) that they could realize by adopting a more relaxed, but still semantically viable, consistency model.
Contributions: The concept of lazy release consistency is at the forefront of TreadMarks' contributions. Underlying this technique is the observation that updates to a synchronized object in shared memory need only be transmitted to the next process to acquire the lock, greatly reducing message propagation. TreadMarks also develops a technique of tracking diffs in order to bypass the typical per-page locking requirement of DSMs and allow multi-writer memory sharing. They are able to do so without sacrificing consistency because, as they observe, concurrent writes to overlapping memory addresses would be inconsistent in a single-processor threading scenario just as in a distributed context.
Flaws: By implementing the TreadMarks system as a user application without dependencies on the underlying hardware, the authors maximized the generality and therefore the portability of their work, but in doing so they committed themselves to a dependence on programmer adoption of their libraries. The advanced nature of the use cases guaranteed a somewhat limited audience. On the other hand, if they had successfully lobbied for their work to be integrated into a successful OS kernel, it may have had greater utilization and impact over time.
Applicability: It seems that the goal of maximizing utilization of heterogeneous computing resources for parallel computation has gained more and more importance over the years since this paper was written, with no leveling-off in sight. Undoubtedly the consistency model described here, if not the specific techniques, persists in any distributed process synchronization scheme.
Posted by: Rich Joiner | March 30, 2011 08:11 PM
Summary:
The paper is about TreadMarks, which is a distributed shared memory system that offers parallel processing at a low cost by using commodity networked workstations.
Description of Problem:
Memory is faster than disk, and supercomputers are expensive. A network can consist of many commodity workstations. If the network is fast, is there a way to share the workstations' memory among all of them in a fast and efficient way that enables parallel processing?
Summary of contributions:
TreadMarks provides a distributed shared memory system with a simple but powerful API that allows programs to create, destroy, and synchronize shared memory. It uses lazy release consistency to reduce the number of messages that must be sent. It allows multiple writers to access different portions of the same page by keeping diffs of the pages, which helps reduce false sharing and overall bandwidth, since diffs are usually small. The implementation is done at the user level, so it is portable. The authors have also built a few applications and benchmarked their results.
Flaws:
The authors mention that they did not implement checking for two processes modifying overlapping portions of a page, but they do not say what they would do to prevent this. They also do not mention how they handle node failure and recovery. There is also no mention of security, or of how much memory a machine is willing to share, unless they were envisioning that the whole network of workstations would be dedicated to the distributed shared memory applications.
Application to real systems:
The idea of distributed shared memory is good. If issues such as false sharing, message overhead, and synchronization can be handled, it seems like a useful system for some applications. Its usefulness also seems to depend on how fast the network communication is.
Posted by: Kong Yang | March 30, 2011 08:30 PM
abstract:
This paper is about issues in the design and implementation of the TreadMarks distributed shared memory system, which allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory.
Problem
How to design a distributed shared memory system that provides the abstraction of a globally shared memory, along with related support for shared memory, including synchronization primitives and consistency guarantees.
Contribution:
a. provides the same programming environment and synchronization primitives as hardware shared-memory multiprocessors, which makes porting programs quite easy.
b. provides a release consistency model to prevent data races. In TreadMarks, synchronization must be present between two conflicting accesses to shared memory.
c. designs a multiple-writer protocol that allows multiple processes to have a writable copy of a page at the same time.
Flaw
I feel that this paper only presents a general idea of TreadMarks and does not describe how some more detailed problems are solved. For example, when running a set of threads, how is the set of nodes to run them on chosen? If nodes have different amounts of memory and run the same set of processes, how is the case handled where one node has no free memory while others still do? And how does the system choose which process's page to page out, given the synchronization state and the overall progress of the computation?
Applicability:
TreadMarks uses commodity computers to obtain powerful computing resources. With TreadMarks, users who need more computing power just need to add more nodes and run more processes.
Posted by: Linhai Song | March 30, 2011 10:24 PM
Summary:
This paper presents a distributed shared memory system, TreadMarks, which virtualizes a network of machines as a single shared-memory multiprocessor. It provides this abstraction by using virtual memory protection mechanisms to track accesses to shared pages. Under the hood, it uses a lazy release consistency model to reduce network traffic and bandwidth usage.
Problem Description:
This paper addresses the problem of constructing a large parallel machine from networked workstations connected via a standard interconnect. The motivation is that building a system from a collection of workstations is cheaper than purchasing a highly specialized machine designed for parallel computing, such as a supercomputer; a solution would make large-scale scientific computation more cost effective. Earlier systems such as Ivy pioneered the idea of using networked machines to construct a shared memory multiprocessor, and TreadMarks refines it for efficiency. It would be interesting to discuss this system in the context of Condor.
Contributions Summary:
The major contribution of TreadMarks is the implementation of a global shared virtual memory address space constructed from networked machines. Programmers gain access to this system using a library interface that provides lock, barrier, and memory management primitives.
The second contribution is using lazy release consistency to implement memory consistency among the networked machines. The lazy characteristic means that memory is synchronized only when the next process performs an acquire operation. The authors argue that in a race-free program a synchronization operation always comes before a memory access that can conflict.
The third contribution of this paper is a mechanism for implementing a multiple-writers protocol, which attempts to limit the effects of false sharing. When a write to a shared page occurs, TreadMarks makes a copy of the page that is later used to create a diff to send to other machines once a synchronization operation is reached. Other machines use this diff to apply the changes made by the writing machine.
Shortcomings:
I take issue with the programmability of TreadMarks, specifically the release consistency model. The authors make the assumption that a TreadMarks program will use synchronization to ensure data-race-free operation. I think this assumption is particularly bold and quite difficult to achieve in practice, especially for problems with complex control parallelism. While sequential consistency does not ensure correctness in the presence of data races, I believe the behavior of data races on a sequentially consistent memory system is more intuitive to debug. I do understand that release consistency was selected to provide scalability and better performance, but I question the development cost imposed by the increased difficulty of programming the system.
Application to real systems:
I think the design of TreadMarks and the lessons learned from it can be applied to memory caching systems such as memcached that are used by large-scale web applications today. Their goals are similar: memcached tries to utilize unused memory on other machines as a cache instead of accessing disk, while TreadMarks uses a distributed memory approach to provide the illusion of a single machine with more memory. Both also implicitly assume that accessing disk is slower than accessing the memory of another machine across the network.
Posted by: Dan McNulty | March 30, 2011 10:25 PM
Summary: TreadMarks provides a distributed shared memory (DSM) system for networks of independent workstations. Applications can make use of the shared memory through a user-level library with a relatively simple API that supports a wide range of parallel computation needs. Some details on making the system efficient are discussed, and several motivating applications are presented.
Problem: Many applications involve trivially parallelizable algorithms, yet when run on a single machine they cannot exploit that parallelism. High-end machines with many cores have been developed for these kinds of applications, but they are expensive and complex to program. To take advantage of the parallel computation capacity of networks of less expensive workstations without introducing programming and communication complexity, we'd like an abstraction that makes the total pool of memory among all the machines appear as one large pool. This would allow processes executing remotely on each machine to coordinate by using shared memory.
Contributions: Prior work on DSMs relied on a strong consistency model in which the last write to the shared memory can be immediately seen by all processes. TreadMarks relaxes this constraint in the "lazy release" consistency model. The key observation is that applications with parallelism must already use synchronization in order to avoid races and thus a process only needs to see consistent data when a lock is acquired for that data. This approach greatly reduces the amount of communication needed to maintain consistency.
The authors propose an interesting optimization for the case of multiple writers to a single page. They note that overlapping writes from multiple processes that are not synchronized indicate a race condition bug in the application, so such overlapping writes need not be addressed by the DSM system. Instead they consider only non-overlapping concurrent writes. In this case each process creates a "twin" of the page; the changes can be merged later by calculating the respective diffs and applying them to the original page.
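The twin/diff merge described above can be sketched in a few lines. This is a simplified Python model, not TreadMarks code: pages are modeled as byte arrays and a diff as a list of (offset, value) pairs.

```python
def make_diff(twin: bytes, modified: bytes):
    """Compare a modified page against its twin; return the changed (offset, value) pairs."""
    return [(i, modified[i]) for i in range(len(twin)) if twin[i] != modified[i]]

def apply_diff(page: bytearray, diff):
    """Apply a diff to a copy of the original page."""
    for offset, value in diff:
        page[offset] = value

# Two processes start from the same page and write to disjoint regions.
original = bytearray(16)
p1 = bytearray(original); p1[0] = 0xAA    # process 1 writes the front of the page
p2 = bytearray(original); p2[15] = 0xBB   # process 2 writes the back of the page

# At synchronization time, each diff (much smaller than a page) is exchanged
# and merged; non-overlapping writes compose cleanly.
merged = bytearray(original)
apply_diff(merged, make_diff(bytes(original), bytes(p1)))
apply_diff(merged, make_diff(bytes(original), bytes(p2)))
assert merged[0] == 0xAA and merged[15] == 0xBB
```

Overlapping writes would make the outcome depend on merge order, which is exactly the data-race case the authors exclude.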
Flaws: The main flaw here is the omission of a discussion of fault handling. For example, at any barrier a failed process could bring the entire application to a grinding halt. This could be acceptable for relatively short lived applications on small networks of workstations. Faults here would be relatively infrequent and applications could be easily restarted. But with scale this would be untenable. Another issue is that some of the applications they suggest seem more high-throughput oriented rather than high-performance. Could these kind of applications be better served by a job batching system like Condor that handles faults more thoroughly?
Applications: The authors pointed out a number of applications for DSMs including genetic linkage analysis, mixed integer programming, large matrix operations, and the classic traveling salesman problem. These kinds of parallel applications have only grown in relevance since this paper. While this work is targeted at networks of workstations, similar techniques could also be applied to more tightly coupled systems such as many cores in a single machine that share non-uniform memories via a high-speed interconnect.
Posted by: Kris Kosmatka | March 30, 2011 10:43 PM
Summary
This paper describes TreadMarks, an abstraction that makes the distinct memories of different systems appear as a single shared memory via network communication. A protocol for doing this efficiently is described, simpler previous approaches are discussed, and finally the system is implemented and evaluated.
Problem
Using distributed systems constructed from commodity hardware is becoming increasingly cost effective relative to high end machines for large computing tasks. Unfortunately, distributed systems are harder to program, and developers often do a lot of tricky programming instead of using abstractions.
Contributions
The idea of a distributed shared memory is not new, but this paper does a great job of describing how it can be implemented efficiently. The naive approach and the eager release consistency are described (the naive approach seems to be functionally equivalent to eager release consistency with a lock around every block access). The approach suggested is lazy because it waits to synchronize until a future thread acquires a lock instead of synchronizing as soon as the current thread finishes locking. The paper also contributes a working system in addition to the theoretical ideas. The ideas are fairly straightforward, so I focus primarily on flaws even though I liked this paper.
Flaws
Correctness. Some applications may use lock free protocols such as RCU and rely on the atomicity of individual machine instructions (e.g., the write of an address to a word). Such techniques would work on a real shared memory with threads, but not between multiple processes operating on the illusion of a shared memory.
Performance. Consistency is handled via mprotect and fetching pages on a fault. This will be slow. Imagine a process is sequentially streaming though a large section of invalidated memory. At each page, the process will block, the page will be fetched, and the process will continue. If the programmer were handling the communication, the data could be fetched in bulk, drastically reducing total latency.
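The streaming scenario above can be put in rough numbers. This is a back-of-the-envelope model with hypothetical parameters (1 ms round trip per request, 10 MB/s bandwidth, 4 KB pages), not measurements from the paper.

```python
# Hypothetical numbers: 1 ms round trip per request, 10 MB/s bandwidth, 4 KB pages.
latency_s = 0.001
bandwidth_bps = 10e6
page_bytes = 4096
total_bytes = 4 * 1024 * 1024   # stream through 4 MB of invalidated memory

# Faulting page by page pays the round trip once per page.
per_page_s = (total_bytes // page_bytes) * latency_s + total_bytes / bandwidth_bps
# A single bulk fetch pays the round trip once.
bulk_s = latency_s + total_bytes / bandwidth_bps

print(f"per-page: {per_page_s:.2f}s  bulk: {bulk_s:.2f}s")
```

The transfer time is identical in both cases; the ~1024 extra round trips are pure overhead, which is the latency cost the paragraph above describes.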
Ease of use. Treadmarks is supposed to be a simplifying abstraction, but its interaction with other abstractions used in distributed systems may not be intuitive. For instance, suppose an application is originally written on a single machine with threads and a shared library. Further suppose that thread 1 has exclusive access to object A, but it requests that thread 2 perform some update on A (the request is via a pipe), blocking until thread 2 is finished. Now let’s say the application is ported to Treadmarks. The threads become processes on different machines, the call between thread 1 and thread 2 becomes an RPC call, and all locks become Tmk locks. Although a lock for the call from thread 1 to thread 2 was not originally necessary, locking is now necessary so that process 2 will invalidate the appropriate pages and pull the latest copy of A before modifying it. Thus, programmers cannot simply forget that the memory is distributed.
Application to Real Systems
In general, people that need the computing power of a distributed system will have significant financial resources and will likely be willing to pay programmers to build a system without abstractions that will outperform a TreadMarks system for the reasons described above. However, TreadMarks would be useful when an ad hoc program needs to be written to solve a problem that only needs to be solved once. In this case, the programming effort of working without the convenience of TreadMarks is probably not worth the performance gains.
Posted by: Tyler Harter | March 31, 2011 12:15 AM
Summary
This paper describes the experience of building a distributed shared memory infrastructure for parallel computing over networks of workstations. To keep data-race-free programs correct while reducing cost, it proposes a lazy release consistency model and a multiple-writer protocol that increase the concurrency of processes and reduce the network communication between them.
Problem
1. Since nodes in a DSM system have their own private memories, data migration and replication are needed to improve data locality. But when data gets updated, nodes must be notified to invalidate their stale copies. When updates are frequent, this network communication is expensive for workstations. A better consistency model is needed to reduce the interactions between nodes.
2. False sharing. Because the page is the minimum unit of sharing, it is common for two unrelated data objects to be placed in the same page. If two processes update those unrelated objects concurrently, they will keep transmitting the page back and forth, which is unnecessary.
Contribution
1. Proposes a release consistency model: given synchronization between two conflicting accesses, processes do not need to inform other processes of their modifications until the synchronization completes, which reduces network traffic. In particular, under lazy release consistency only the next process that acquires the lock is informed of the change, whereas under eager release consistency changes are broadcast. Through barriers and exclusive locks, data is protected just as in a critical section.
2. To solve the false sharing problem, a multiple-writer protocol is proposed. It allows multiple processes to concurrently update different objects in the same page. Conflicts are resolved by exchanging page diffs between the processes when they synchronize.
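The eager/lazy distinction above can be illustrated with a toy message count. This is a simplified model, not the paper's actual protocol: it assumes N processes passing one lock around, with eager release broadcasting invalidations to everyone on each release, while lazy release informs only the next acquirer.

```python
def eager_messages(n_procs: int, n_releases: int) -> int:
    # Eager: every release broadcasts invalidations to all other processes.
    return n_releases * (n_procs - 1)

def lazy_messages(n_procs: int, n_releases: int) -> int:
    # Lazy: only the next acquirer of the lock learns of the changes.
    return n_releases * 1

n, releases = 8, 100
print(eager_messages(n, releases), lazy_messages(n, releases))  # 700 100
```

Under this model the eager scheme sends a factor of (N - 1) more consistency messages, which is the traffic reduction the review describes.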
Flaw
1. Under the multiple-writer protocol, if a page is frequently modified by many processes, the total size of the diffs may grow larger than the page itself. In that case, exchanging diffs may be a bad choice.
2. Programmers need to be familiar with the synchronization primitives and careful in using them, which increases the difficulty of programming.
3. Although the multiple-writer protocol handles false sharing, would it be better to manage data at the record level instead of the page level? Computing a page diff still requires a full scan of the page contents, which is expensive.
4. A node may fail while holding locks; it would be better to use a lease mechanism to ensure that locks are released properly under failure.
Application
Release consistency is a good model for applications that exhibit a lot of false sharing. Multiprocessors use eager release consistency to handle cache coherence, and lazy release consistency is widely used in DSM systems.
Posted by: Weiyan Wang | March 31, 2011 12:29 AM
Summary:
This paper discusses several design points in implementing a DSM system and presents TreadMarks, a DSM system that provides an unstructured memory space and a lazy release consistency model.
Problem:
How can we use networked workstations to realize parallel processing at relatively low cost? Though DSM provides attractive abstractions to the programmer, it is challenging to implement effectively with low communication overhead while still providing appropriate consistency guarantees.
Contributions:
1. The observation that accesses to a shared memory region must be synchronized is very interesting, and the lazy release consistency model fully leverages it to minimize communication overhead (only the next process acquiring the lock is given the modified page).
2. The multi-writer protocol handles concurrent writes to different locations in the same page well, thus alleviating the discrepancy between the machine’s page size and the application’s granularity of sharing.
3. Both of the above ideas are implemented and evaluated in a realistic setting.
Flaws:
1. This paper did not talk about failure handling at all, which is a serious concern for such a distributed system. Especially given that in the face of failure the process could lose part of its address space!
2. In the evaluation section, the authors only discuss how the applications scale compared to sequential implementations. I don’t think it’s fair to compare TreadMarks with a sequential implementation; a comparison between TreadMarks and Ivy (which implements sequential consistency) would have been more informative.
Applicability:
It does not seem like TreadMarks will handle failure well, so this idea is not very applicable to today’s networked workstations. But in an environment with a simpler failure model, say a supercomputer with non-uniform memory access and a private cache for each processor, TreadMarks’ design would be quite applicable.
Posted by: Suli Yang | March 31, 2011 12:32 AM
Summary
This paper proposes TreadMarks, a parallel programming library that provides a shared memory abstraction, for ease of programming on networks of workstations while achieving good performance.
Problem
Parallel programming on networks of workstations is not easy, given that the naive message-passing programming model requires users to decide when a processor needs to communicate, with whom to communicate, and what data to send.
Contribution
- TreadMarks provides simple APIs: Tmk_malloc for allocating a globally shared buffer, and Tmk_barrier and Tmk_lock_acquire/Tmk_lock_release for synchronizing shared memory access.
- TreadMarks uses a lazy release consistency memory model. A system is said to provide lazy release consistency if all write operations by one node are seen by another node after the former releases the object and before the latter acquires it, with all coherence actions delayed until after a subsequent acquire.
- TreadMarks uses a multiple-writer protocol to alleviate the mismatch between the page size and the application's granularity of sharing.
- TreadMarks runs in user space, so it is highly portable.
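The API above implies a characteristic program structure: compute privately, update shared state inside a lock, then meet at a barrier. The following Python sketch mimics that shape with local threads, using threading.Lock and threading.Barrier as stand-ins for Tmk_lock_acquire/Tmk_lock_release and Tmk_barrier (an analogy only; the real TreadMarks is a C library spanning machines).

```python
import threading

shared = {"sum": 0}              # stands in for a Tmk_malloc'd shared region
lock = threading.Lock()          # stands in for a Tmk lock
barrier = threading.Barrier(4)   # stands in for Tmk_barrier

def worker(chunk):
    local = sum(chunk)           # compute on private data first
    with lock:                   # Tmk_lock_acquire ... Tmk_lock_release
        shared["sum"] += local   # touch shared memory only in the critical section
    barrier.wait()               # Tmk_barrier: everyone finishes before anyone proceeds

data = list(range(100))
threads = [threading.Thread(target=worker, args=(data[i::4],)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert shared["sum"] == sum(data)
```

Under lazy release consistency, the lock acquire is also the point at which a worker would learn of the previous holder's modifications.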
Flaw
I think it would be better to address these issues:
- How to deal with deadlock? I guess they shift the burden to application developers.
- How to deal with factors affecting performance, e.g., data on the local machine versus a remote machine? A shared memory model on a network of workstations looks like a NUMA (Non-Uniform Memory Access) system.
- How to deal with failures (especially partial failures)? The paper reads as if it were about a single machine with multiple processors rather than multiple machines connected by a network.
Applicability
The shared memory model is prevalent in multi-processor/multi-core programming, for example OpenMP and CUDA from NVIDIA. However, the shared memory model is not good for the scalability of general-purpose programming, in terms of either the number of machines or the number of processors/cores on one machine. I think for special-purpose programming (with certain data access patterns), a shared memory model can simplify things a lot -- MapReduce would be an example.
Posted by: Wenbin Fang | March 31, 2011 01:29 AM
Summary
This paper presents TreadMarks, a distributed shared memory system. TreadMarks virtualizes a network of machines into a single multiprocessor system, providing access to memory via user-level library calls.
Problem
With the rise of sizable networks of computers in recent years, the idea of combining the machines on a network into one large multiprocessor is appealing. One difficult aspect of this is getting the nodes in the network to share memory across machines in such a way that it is possible to do parallel computing on them.
Contributions
The biggest contribution of this paper is the implementation of a shared memory system which can be accessed via user-level libraries. Their library provides various mechanisms for accessing and allocating memory, as well as a few synchronization primitives.
There are several interesting aspects of their implementation. The biggest are their algorithms for improving consistency models and protocols. Specifically, they use a lazy release consistency model, which takes advantage of the fact that when a process releases a lock, only the next process to acquire the lock needs to know about the changes. This observation allows the developers to greatly reduce the amount of communication between nodes. They also came up with an optimization for multiple-writer protocols, which uses diffs to decrease the frequency with which entire new copies of pages need to be transferred.
Flaws
I thought the biggest problem with this paper was that it did not take into account partial failures. In this system, a single machine crashing would cause significant problems. Any data that was being stored in its memory would be lost, and other nodes would have no way of knowing about this short of attempting to access the memory and failing. Even more importantly, if processes are waiting at a barrier but one has crashed, they will wait forever. In a distributed environment, the system has to be able to tolerate these kinds of failures.
Application
Given the above problems, I think this particular system has limited applicability to real world systems. However, I do think some of the ideas it presents are interesting and could be used elsewhere. The methods it uses for optimizing and improving various forms of consistency seem very useful. I think there are some interesting ways this could be combined with other systems in order to create a useful distributed system. For example, if “threads” were replaced with “jobs,” TreadMarks could do something similar to backup jobs used in MapReduce in order to prevent infinite blocking at a barrier.
Posted by: Ben Farley | March 31, 2011 02:28 AM
Summary
TreadMarks is an experience paper that shares design decisions made in building a distributed shared memory system. The paper focuses particularly on the memory structure and consistency model, and explains how lazy release consistency and a multiple-writer protocol can improve performance.
Problem
Given a set of workstations connected by a network, a programmer normally has to worry about where data is stored and must write interprocess communication code for complex data structures. The TreadMarks DSM strives to let the programmer focus on developing a solution to his problem instead of managing distributed data. The paper explains how memory is structured and how the lazy release consistency protocol works.
Contributions of the paper
TreadMarks provides a set of APIs for process creation, destruction, synchronization, and memory management, transparently to the programmer. The paper illustrates their use with two simple examples: the traveling salesman problem and Jacobi iteration. Although total ordering (sequential consistency) seems natural and intuitive, the overhead it incurs in OS traps, messages exchanged, and state maintained makes it less feasible. False sharing (two different objects updated in the same page) can also waste effort and cause unnecessary invalidations. TreadMarks therefore adopts a lazy release consistency model in which an update need not be made known to other processors until explicit synchronization through the lock APIs. A multiple-writer protocol is also presented in which each processor updates its own local copy; the copies are later synchronized by transferring diffs, which is cheaper. The messaging system is similar to reliable UDP, ensuring message delivery. And since TreadMarks runs as a user process, porting it to various Unix implementations is easy.
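The Jacobi example mentioned above has the classic phase structure that barriers support: every cell is recomputed from the previous iteration's values, so a barrier must separate the read phase from the write phase. A single-process Python sketch of one relaxation step (illustrative only, not the paper's code):

```python
import copy

def jacobi_step(grid):
    """One Jacobi relaxation step: each interior cell becomes the average of its
    four neighbors, all read from the previous iteration's grid. In TreadMarks,
    a Tmk_barrier would separate reading `old` from writing `grid`."""
    old = copy.deepcopy(grid)
    n, m = len(grid), len(grid[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            grid[i][j] = (old[i-1][j] + old[i+1][j] + old[i][j-1] + old[i][j+1]) / 4
    return grid

# Boundary cells hold 1.0; the interior starts at 0.0 and relaxes toward it.
grid = [[1.0] * 4 for _ in range(4)]
for i in (1, 2):
    for j in (1, 2):
        grid[i][j] = 0.0
jacobi_step(grid)
```

In the parallel version each processor owns a band of rows, and updates near band edges are what the multiple-writer diff mechanism handles when two bands share a page.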
Concerns
As applications evolve, a programmer sometimes has to know about the location and partitioning of data to make important decisions for security, performance, and availability reasons. Abstracting this information away from the programmer could be undesirable in such cases. The paper also does not talk about reliability, i.e., what happens when a node crashes while holding a lock on some object.
Applicability
It is a neat concept to provide an API for parallel programming. The lazy release consistency protocol and the multiple-writer protocol could be implemented in some real systems. This paper is somewhat similar to Disco in virtualizing main memory.
Posted by: Karthik Narayan | March 31, 2011 02:44 AM
Summary
The paper highlights several features of the TreadMarks system, a distributed shared memory (DSM) system. It covers some of the design decisions made in TreadMarks, such as the mechanism used to maintain consistency, and the advantages of the TreadMarks implementation over competing DSMs.
Problem
A system of networked workstations may be utilized to improve performance via parallelization. However, such systems increase programmer burden: sharing memory requires the developer to implement data-sharing and coordination facilities. A DSM system provides the abstraction of a globally shared memory and is responsible for handling all memory accesses. This frees the programmer from the burden of retrieving the current value of a given data item.
Contributions
The paper makes several contributions that highlight some of the design decisions in TreadMarks. To begin with, the paper presents the TreadMarks consistency model, lazy release consistency, and the basis for the model's correctness. Lazy release consistency attempts to reduce the heavy communication required by other models used to maintain data consistency, such as sequential consistency and eager release consistency. The lazy release model relies on TreadMarks' synchronization facilities to inform only the process that acquires a lock of any prior changes invalidating its local copies. Correctness of the model rests on the fact that parallel programs must use synchronization to prevent data races and ensure correct behavior. Since synchronization limits access to a data item to a single process at a time, changes made to a page need only become visible at that process.
Another major contribution of the TreadMarks paper is the use of a multiple-writer protocol to increase parallelism as well as reduce the number of messages caused by false page sharing. Each process maintains an unmodified copy of a page alongside its modified copy. On a potential write conflict, the conflicting processes exchange page diffs with each other to determine whether overlapping writes did in fact occur. An additional benefit of the multiple-writer protocol is reduced message size, since a diff will generally be smaller than an entire page.
Flaws
The paper does an excellent job of presenting the motivation for DSM systems as well as the implementation decisions used in TreadMarks. One area where the treatment was rather terse is the replication and migration policies used in TreadMarks. Such policies have an impact on system performance and receive fuller discussion in papers on shared memory models such as ccNUMA.
Applicability to Real Systems
Many of the ideas presented in the paper are applicable to real systems, as shown by the sample applications the authors present. That said, the ideas seem limited to certain environments and problems.
Posted by: Greig Hazell | March 31, 2011 07:35 AM
Summary
“TreadMarks” is an implementation of a distributed shared memory system, usable through simple C bindings. It achieves efficiency through relaxed consistency and allowing multiple writers at the page level. Implementations of various parallel algorithms are discussed, and the performance is reviewed.
Problem
Because of the increasing prevalence of networks of computers, taking advantage of such resources efficiently and easily is an important problem. If parallel programming for shared-memory multiprocessors is a challenge, programming networks of computers is a nightmare. One primary reason for this is that the standard message-passing models are non-intuitive for programmers and error prone. TreadMarks provides the abstraction of a memory space that is globally visible and appears consistent, provided that applications use locks in the expected manner.
Contributions
The authors provide an implementation of a distributed shared memory system with a distinctive consistency model and optimizations. The first major contribution is the "lazy" release consistency model, which propagates consistency information only when a synchronization variable is acquired. This allows the system to send fewer messages in situations where there are updates to heavily shared data.
The other major contribution is the multiple-writer protocol, which allows multiple nodes to update a page at the same time. The motivation is that without this ability, legitimate intra-page sharing (false sharing) would require added synchronization or risk lost updates; basically, arbitrary write patterns can't be guaranteed correct without it. The essential mechanism is to create a copy of the page at the first write request, and diff against that copy at release time. Subsequent read requests then fetch only the portion of the data that was modified.
Limitations
The primary flaw is that the paper has grayed-out, incomprehensible diagrams and text sections, which contain useful information for understanding the system. All joking aside, the paper is fairly cursory in some sections, and only provides a reference for a description of the coherence protocols used. One issue I had is that Figure 7 shows process 3 getting a lock directly from process 2, without having any knowledge of the lock's owner; later, the paper explains that a node will contact all nodes in the system during each request. This is reminiscent of a bus-based protocol, and I wonder whether a directory protocol would be more efficient at larger system sizes.
The other main limitation is programmability. Locks are hard to use properly anyway, and having a heterogeneous address space seems like a recipe for disaster, or at least headaches. Also, can this work with threading?
Application
The issue of intuitive parallel programmability is very important. The most important idea in this paper, the use of a logically shared memory space for programmability, is still prevalent in many systems like the new DARPA HPCS languages.
Posted by: Tony Nowatzki | March 31, 2011 08:02 AM
Summary
Advances in computer networking and processor technology have enabled a wide variety of applications to run on networks of machines instead of a dedicated multiprocessor machine. This paper describes the TreadMarks distributed shared memory system, which provides some level of transparency so that the programmer does not need to worry about complicated message passing or parallelization methods. However, parallelization across multiple machines poses consistency issues that the authors try to address.
Problem Statement
To create an efficient platform for parallel computing using networked commodity hardware.
Contributions
Critique
In release consistency, synchronization is introduced to the program to avoid data races. Provided there are no data races, the program appears to have run sequentially. The multiple-writer protocol also relies on the absence of data races. However, assuming the program is data-race free and building a protocol on that assumption sounds strong to me. It is hard to ensure that a program is data-race free, and thus hard to be sure that the system stays consistent under this mechanism.
It sounds like systems such as this one, which try to achieve parallelization and consistency at the system level rather than the application level, cannot run on just any kind of networked machines. For example, the latency between the machines in this system is on the order of microseconds, whereas if you want to span continents, as in Condor, the latency is on the order of milliseconds. Thus, I believe it is worth noting that TreadMarks is focused on highly localized clusters of machines, where it can achieve performance comparable to a single multiprocessor machine.
Application
Programs that can be relatively easily parallelized and at the same time appropriately synchronized are good candidates for this system. Examples include the mixed integer programming models mentioned in the paper.
Posted by: Fatemah | March 31, 2011 08:26 AM