
TreadMarks: Shared Memory Computing on Networks of Workstations

C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel, TreadMarks: Shared Memory Computing on Networks of Workstations. IEEE Computer, Vol. 29, No. 2, pp. 18-28, February 1996.

Reviews due Tuesday, April 6.

Comments

TreadMarks is a distributed shared memory system that provides the illusion of a single shared memory across multiple independent processing nodes, each with its own local memory, connected by a general-purpose network.

The main motivation behind TreadMarks is to exploit commodity hardware through parallelism while presenting memory as a single contiguous block spanning the nodes.

TreadMarks is implemented entirely in user space. Its globally shared address space across machines on a network distinguishes it from systems that rely on message passing. It provides synchronization primitives such as locks and barriers, and memory consistency is driven by these explicit synchronization operations. The designers weighed the constraints of eager versus lazy release consistency and implemented the lazy variant. TreadMarks also supports a multiple-writer protocol, using virtual memory protection to detect write faults, alongside its event synchronization. At its core it is a page-granularity DSM layered over UDP/IP, and programmers use only a small API for setup and synchronization. It fits the broader trend toward shared memory interfaces that free programmers from managing communication explicitly.
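
To make that "small API" concrete, a toy use of the library looks roughly like the sketch below. Tmk_malloc is named in the paper; the other Tmk_* names, the header name, and Tmk_distribute are my approximations from memory, so treat this as illustrative rather than the real interface.

    #include "Tmk.h"                 /* assumed TreadMarks header name */

    #define N 1024
    int *counts;                     /* will point into DSM shared memory */

    int main(int argc, char **argv)
    {
        Tmk_startup(argc, argv);                 /* join the set of DSM processes */

        if (Tmk_proc_id == 0)                    /* one process allocates shared memory */
            counts = (int *) Tmk_malloc(N * sizeof(int));
        Tmk_distribute(&counts, sizeof(counts)); /* assumed call: share the pointer value */

        Tmk_barrier(0);                          /* everyone now sees the allocation */

        Tmk_lock_acquire(0);                     /* lock-protected update of shared data */
        counts[Tmk_proc_id] += 1;
        Tmk_lock_release(0);

        Tmk_barrier(1);                          /* all updates visible past this point */
        Tmk_exit(0);
        return 0;
    }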

TreadMarks is thus an integrated approach to DSM, providing innovative mechanisms such as lazy release consistency and a multiple-writer protocol. The system makes a group of commodity machines act as a single large, powerful abstract machine.

Before it can be adopted at scale, the system must be evaluated on everything from microbenchmarks to major performance issues. There are doubts about universal adoption of TreadMarks, since its benefit may be only marginal on small networks. The cost of providing consistency and coherence on a distributed platform should also be transparently observable so that system designers can adopt it with confidence.

Summary:
This paper describes the design and implementation of TreadMarks, a distributed shared memory system built as a user-level library on top of Unix.

Problem Description:
The DSM model assumes and provides an interface for globally shared virtual memory among all the nodes, which can make the programmer’s life easier. This paper mainly focuses on two aspects of DSM system design: the memory structure and the consistency model. Previous research has proposed various models that were options for TreadMarks, but each has problems:
The sequential consistency model is simple, but it can cause extensive communication and false sharing problems. The single-writer protocol used by most DSM systems has similar problems. TreadMarks aims to address these problems and achieve better performance without losing simplicity.
Contributions:
TreadMarks is not the first DSM system implemented, but it has some new features:
-the lazy release consistency model: under sequential consistency, every write becomes visible immediately, while under release consistency a write needs to become visible to another process only when a subsequent release becomes visible to it. Lazy release consistency goes further, pulling modifications at acquire time rather than pushing them at release. This greatly decreases message passing; it relies on data races having been eliminated by proper synchronization rather than eliminating them itself.
-Multiple writer protocol: in this protocol a local twin copy of the page is created at the first write, and a diff is later produced by word-by-word comparison against the twin (sketched below). When multiple writers synchronize, their modifications are pulled in as diffs. In the implementation, TreadMarks uses the virtual memory hardware to detect modifications. This protocol addresses the false sharing problem as well as reducing bandwidth usage.
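
A rough sketch of that twin-and-diff step, as I understand it from the description above (plain C, not the TreadMarks source; the (offset, length, data) run encoding is my guess at how "word-by-word comparison" gets packaged):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE  4096
    #define PAGE_WORDS (PAGE_SIZE / sizeof(uint32_t))

    /* One run of modified words: (offset, length, data). */
    typedef struct { uint32_t offset, nwords; uint32_t data[PAGE_WORDS]; } run_t;

    /* At the first write fault: remember a clean copy of the page. */
    uint32_t *make_twin(const uint32_t *page)
    {
        uint32_t *twin = malloc(PAGE_SIZE);
        memcpy(twin, page, PAGE_SIZE);
        return twin;
    }

    /* At release/barrier time: word-by-word comparison against the twin
       yields the runs that actually travel over the network. */
    size_t make_diff(const uint32_t *page, const uint32_t *twin,
                     run_t *runs, size_t max_runs)
    {
        size_t nruns = 0;
        for (size_t i = 0; i < PAGE_WORDS && nruns < max_runs; ) {
            if (page[i] == twin[i]) { i++; continue; }
            run_t *r = &runs[nruns++];
            r->offset = (uint32_t) i;
            r->nwords = 0;
            while (i < PAGE_WORDS && page[i] != twin[i]) {
                r->data[r->nwords++] = page[i];
                i++;
            }
        }
        return nruns;
    }
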
Applications:
TreadMarks has fulfilled its goal of addressing problems related to consistency models. Experimental results on benchmarks are good. It is implemented entirely as a user-level library on top of Unix and is highly portable. It does not discuss failure detection and recovery, but I guess there might be further work to make TreadMarks a more complete and reliable system.

Summary:
TreadMarks is a user-level distributed shared memory (DSM) system with a relaxed consistency model.

Problem:
The authors use the example of Ivy, the first DSM, to illustrate some of the issues related to sharing memory across a network transparently. The first is sequential consistency, which is similar to synchronous writes across all nodes. This is the most intuitive approach; however, it leads to excessive message passing between nodes, which hurts performance significantly. The other problem is “false sharing”, where two processes attempt to write to different parts of the same page. At each write, the page appears invalid and must be transferred across the network, causing what the authors call “the ping pong effect.”
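
A concrete picture of that ping-pong effect (hypothetical code, my own): two processes that never touch the same variable still fight over the page that holds both.

    /* Hypothetical shared layout: the two counters fall on the same page, so
       under a single-writer protocol every increment migrates the whole page. */
    struct stats {
        long hits_proc0;    /* only ever written by process 0 */
        long hits_proc1;    /* only ever written by process 1 */
    };

    void count_hits(struct stats *s, int proc_id)
    {
        for (int i = 0; i < 1000000; i++) {
            if (proc_id == 0)
                s->hits_proc0++;   /* pulls the page to node 0 ... */
            else
                s->hits_proc1++;   /* ... and right back to node 1 */
        }
    }
    /* A multiple-writer protocol instead lets each node write its own copy
       and merge the non-overlapping diffs at the next synchronization. */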

Contributions:
The authors experimented with two solutions to the Ivy issues: the lazy release consistency model and a multiple-writer protocol. Release consistency essentially works by only allowing processes to access parts of memory after a synchronization operation, which means processes do not need to be informed about modifications until that synchronization operation. This significantly reduces the amount of message passing required by the DSM. Lazy release consistency improves on this by only informing the process acquiring the lock about changes instead of sending a message to all other processes. The second improvement is a multiple-writer protocol, which allows pages (or copies of pages) to be writable by multiple processes concurrently, addressing the problem of false sharing. Processes pass around diffs of their modified pages, which reduces traffic. If two processes modify the same parts of a page concurrently, this is considered a program bug, since it is essentially a data race.

Application/Thoughts:
A DSM works basically the same way as virtual memory: it gives the application programmer an abstract view of a global memory space without requiring him/her to worry about the boundaries between nodes. This is very useful for computationally intensive applications that need multiple workstations to complete in reasonable time but also must share data seamlessly between processes. The authors give multiple examples. However, the time costs of TreadMarks seem quite high, which would probably cause application performance to be bound by the time required to access pages on remote machines. This seems to reduce the worth of a DSM.

-Ryan Johnson

Summary and Description:
The paper presents a distributed, shared memory implemented on top of a network of workstations. The system abstracts the physical memories possessed by each processor in the workstation into a single global virtual address space. As a result, it enables us to do parallel programming without explicit messages between the processors.

However, some distributed shared memory systems, although easier to program, involve a lot of overhead. This is because those systems implemented sequential consistency among the shared data. The paper keeps the ease of programming without introducing much overhead by adopting a form of consistency called release consistency. Intuitively, under release consistency, when accesses to all shared blocks of data are synchronized, sequential consistency is observed.

Contribution:
1. The main contribution of the paper is introducing a distributed shared memory system in which release consistency is observed (instead of sequential consistency).
2. When false sharing is present between two processors, release consistency will pose some unneeded overhead between the two processors. The paper introduces a "multiple writer protocol" to overcome this.

Applicability:
In effect, the paper presents another programming model, equivalent to the MPI model, for writing parallel programs in a networked multi-processor environment. The model can be adopted by any software in which shared memory eases programming. However, I do not understand why the authors feel that it would be very hard to use an MPI-like model if the program involves complex data structures.

This paper presents Treadmarks, a distributed shared memory system. Here, a bunch of workstations share an address space created from the pieces of underlying memory of different machines. The article mainly discusses the design rationale of Treadmarks, and its implementation.

Memory operations in Treadmarks follow release consistency: When concurrent programs are running, they explicitly acquire and release locks to access shared memory locations. The same page may be located on many processors, and the new version of the page is brought in when an acquire is performed. In other words, a write is propagated when the writer releases the lock and the new processor acquires the lock. This model requires far less message traffic for synchronization compared to stricter models like sequential consistency. Treadmarks provides locking and barrier primitives for synchronization. Treadmarks also supports simultaneous writes to the same page. When two independent writers synchronize, their copies are invalidated, and diffs are exchanged between the processors, and the pages are made consistent. Thus, false sharing is effectively handled.

The authors show some interesting data-intensive parallel applications that can be run on TreadMarks. They show some performance improvement against sequential execution, which is bound to happen when you parallelize computation. What is not clear is how it compares against other parallel execution platforms. Overall, the paper sounds convincing that DSM can be used to support parallel programs. Their solution requires quite a bit of effort from the programmer: writing tailored parallel programs that correctly lock shared pages. If everything is done correctly it seems to work, which seems fair enough. One thing that was missing is a discussion of what happens when a processor or communication channel fails.

Summary:
This paper describes how to build a distributed shared memory platform(TreadMarks) on top of a network of workstations to ease parallel programming.

Problem Description:
Conventional wisdom is that it is easier to write shared-memory parallel programs than message-passing programs. When we have lots of cheap workstations, how can we connect all these machines and present programmers with a shared memory interface to use? Naively, one could enforce a consistent view of global memory, but the cost would be too high.


Contribution:
(1) In order to reduce message traffic and improve performance, TreadMarks adopts a release consistency model. That is, an update is not made available to all processors simultaneously, but only to the processor that next accesses the data through a synchronization operation. This is based on the assumption that the strategy always works when there is no data race in the program (racy access to shared data should always be protected by synchronization) and that data races are always harmful. That is a little debatable, though. A “while flag” is an intentional data race and a very common idiom for shared memory programmers, and it will cause problems on this platform (see the sketch after this list). But generally speaking, I think it is still a smart choice and a good assumption for scientific computing code, where absolute correctness is required (and therefore data-race freedom is desired).

(2) In order to reduce message traffic, TreadMarks uses "diffs" to encode what has changed on a page and moves this encoding between processors so they can update dirty pages. In a DSM the sharing granularity is a page rather than a cache line, so the false sharing problem could be very severe, and the "diff" method minimizes the traffic for updating dirty pages.
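
To make the "while flag" concern in point (1) concrete, here is a hypothetical sketch of my own (using assumed Tmk_* lock calls): the racy idiom that is harmless on a hardware SMP can spin forever under lazy release consistency, because the producer's write only propagates through a synchronization operation.

    /* assumes the TreadMarks-style API sketched earlier in these comments */
    int *ready;                      /* lives in Tmk_malloc'ed shared memory */

    /* Racy idiom that often works on an SMP but may hang here:
         producer:  *ready = 1;
         consumer:  while (*ready == 0) ;                                   */

    /* A TreadMarks-friendly version brackets both sides with the same lock,
       so the consumer's acquire pulls in the producer's update.            */
    void announce_ready(void)
    {
        Tmk_lock_acquire(1);
        *ready = 1;
        Tmk_lock_release(1);
    }

    void wait_for_ready(void)
    {
        int seen = 0;
        while (!seen) {
            Tmk_lock_acquire(1);     /* acquire makes the prior release's writes visible */
            seen = *ready;
            Tmk_lock_release(1);
        }
    }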


Applicability:
(1) The benchmarks tested in this paper are all scientific computing codes. In particular, to achieve good speedup it is desirable that the computation-to-communication ratio be high. For the two benchmarks the paper showcased, the authors claimed to achieve linear speedup. But in the MIPLIB case, it seems to me that the speedup is linear, yet the slope doesn't look like 1. Given their implementation, it seems that synchronization on the task queue is their bottleneck (a sketch of that lock-protected queue pattern appears after this list). Synchronization on the current best solution seems not to be, because their problem formalization is a single-objective-function problem, so the current best solution is only a single number. I guess it would make sense to chop up the queue first so that every processor works on its local queue while updating the shared current best solution, and when the local queue runs out, it can get more work from neighbors that still have some. I believe that would achieve better speedup. However, such a "work-stealing" algorithm would be a pain to program on TreadMarks (or any shared-memory platform), while I think message passing might be easier to program with.
(2) I am curious about the authors' claim that "it is a daunting task for programs with complex data structures and sophisticated parallelization strategies." It is true that with shared memory it is extremely easy to share a complicated data structure, but as long as the system provides some serialization/marshalling method for the data structure (just as Java provides a Serializable interface), I don't think programmers really need to worry about how to encode/decode the data structure. With regard to "sophisticated strategy", it seems that none of the cases presented in the paper (Jacobi, TSP, MIP, and ILINK) used any sophisticated strategy, and their MPI counterparts wouldn't be daunting to program. My take is that the shared-memory model's advantage lies in embarrassingly parallel programs, where the transition from sequential to parallel code is very easy. With a "sophisticated strategy", it could be a hotbed for bugs. And since a major chunk of scientific code is embarrassingly parallel, DSM is laudable, if it can scale well.
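
A minimal sketch of the centralized, lock-protected task queue pattern the branch-and-bound programs rely on, which is the serialization point suspected in (1). Types, sizes, and the lock id are hypothetical; only the Tmk-style lock calls are meant to echo the paper's API.

    #define QUEUE_LOCK 0
    #define QUEUE_CAP  4096

    typedef struct { int node_id; int bound; } task_t;   /* hypothetical task */

    typedef struct {
        task_t items[QUEUE_CAP];
        int    head, tail;
        int    best_so_far;          /* current best objective value */
    } work_queue_t;

    work_queue_t *q;                 /* allocated once in shared memory */

    int get_task(task_t *out)
    {
        int ok = 0;
        Tmk_lock_acquire(QUEUE_LOCK);        /* every dequeue is serialized here, */
        if (q->head != q->tail) {            /* which is the suspected bottleneck  */
            *out = q->items[q->head];
            q->head = (q->head + 1) % QUEUE_CAP;
            ok = 1;
        }
        Tmk_lock_release(QUEUE_LOCK);
        return ok;
    }

Chopping the queue into per-process pieces, as suggested above, would replace QUEUE_LOCK with per-queue locks taken only when stealing from a neighbor.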

Summary
-------
The paper presents the design of an efficient DSM system (TreadMarks) using lazy release consistency and a multiple-writer protocol. The DSM abstraction eases the life of a programmer writing parallel applications compared to the message passing abstraction.

TreadMarks (TMK) provides 'lazy release consistency' compared to the 'sequential consistency' of previous systems. Sequential consistency has the drawback of requiring a lot of network traffic. It also leads to a "ping-pong" effect caused by false sharing in a page-based DSM. DSM systems also have to choose a memory structure. TMK uses page-based mapping instead of an object-store kind of model: it makes the DSM part of the virtual address space. To avoid the effects of false sharing, TMK allows a multiple-writer protocol and uses diffs (run lengths) to synchronize the different versions.

Problem Statement
-----------------
Design of an efficient DSM system avoiding the drawbacks of high coherence traffic and false sharing.

Contributions
-------------
1) Reasoning of the advantages of the lazy release consistency model (tied to the lock acquisition time).
2) Multiple-writer protocol to avoid false sharing in page-based DSM.

Comments
---------
1) I would have liked to see a little more description of the page location service and the locking service in TMK.
2) Also, I believe they should be using some kind of version numbering to identify that the lock-acquiring process has a stale page (a hedged sketch of one such scheme follows).
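
On comment 2: as I recall, the fuller TreadMarks papers describe per-process interval counters, essentially a vector timestamp, consulted at acquire time. The sketch below is my own illustration of that idea, not the actual TreadMarks internals.

    #define NPROCS 8

    typedef struct { unsigned v[NPROCS]; } vtime_t;   /* one counter per process */

    /* At lock acquire: any interval the releaser has seen but the acquirer has
       not corresponds to write notices (and hence stale pages) the acquirer
       must process before touching shared data. */
    int intervals_missing(const vtime_t *acquirer, const vtime_t *releaser,
                          unsigned first_unseen[NPROCS])
    {
        int any = 0;
        for (int p = 0; p < NPROCS; p++) {
            if (releaser->v[p] > acquirer->v[p]) {
                first_unseen[p] = acquirer->v[p] + 1;
                any = 1;
            }
        }
        return any;
    }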

I believe the DSM abstraction makes life very easy for parallel programmers, but it has the usual problem of enforcing a restricted interface on them. Though it makes life easy, advanced system designers may want a little more control and may want to tailor the behaviour based on the application.

Problem Description
In a parallel computing environment involving multiple machines, data is shared among several physically distinct processors. When each processor gets its own copy of the shared data, maintaining data consistency becomes a major problem. This paper develops a few ideas towards this goal.

Summary:
DSM implements the required consistency model in software. It runs as a user-level library. It gives the physically separated processes an abstraction of shared global memory as a sequence of bytes rather than as objects or tuples. DSM here chose a form of relaxed consistency: release consistency. This model uses the synchronisation mechanisms to regulate access to shared resources: only the next process to acquire the lock gets the updated copy, thereby reducing the communication overhead required for sequential consistency. It also proposes a multiple-writer protocol where each process maintains a writable copy of the page. Synchronising the differences occurs when the processes hit a barrier and exchange diffs with respect to the original unmodified copy.

Contributions/Applicability :
1. Using the release consistency and multiple writer protocols to reduce the communication overhead

I am not sure where the locks themselves are maintained. Does each process get its own local copy and communicate with the others? Or is there a central component for storing the locks?

This paper talks about the use of distributed shared memory (DSM), implemented in software, for distributed computing across a network of workstations. Software-based schemes give a lot of flexibility in organising the memory: page-, object-, or tuple-based. Treadmarks is a user-level library that provides the application a view of a large, unified memory. A number of challenges arise when physical memory is shared among multiple workstations: maintaining the consistency of data among the multiple copies, the granularity at which memory is shared, data races, and so on. Treadmarks addresses these problems in an interesting way.

Treadmarks provides the application with the view of a large memory, which is a combination of the memories shared by workstations on a network. Treadmarks uses virtual memory protection to track accesses to data and handles the resulting faults. Because the granularity of sharing is large, this leads to false sharing issues. It uses lazy release consistency instead of simple sequential consistency in an attempt to reduce the number of messages exchanged between machines. This is another example of using a pull-based model over the conventional push-based model: pages are invalidated or updated only when a processor tries to access the particular page. This approach works because Treadmarks uses explicit locks for synchronisation and to prevent data races. It also supports multiple simultaneous writers through a diff mechanism: each processor maintains a diff between the page it acquired and the changes it made, and the diffs of the processors in question are consolidated at the time the page is used. This considerably reduces the bandwidth used for update propagation and also addresses the issue of false sharing by allowing multiple processors to modify non-overlapping portions of the same page.
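
A bare-bones illustration of the user-level virtual memory trick described above (my own sketch, assuming Unix mprotect/SIGSEGV and a page-aligned shared region; real TreadMarks does much more in its handler):

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096
    #define NPAGES    1024                  /* illustrative fixed-size region */

    static char *shared_base;               /* start of the shared region */
    static char *twin[NPAGES];              /* clean copy per written page */

    /* First write to a protected page faults; keep a twin, then allow writes. */
    static void on_write_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void) sig; (void) ctx;
        char *page = (char *)((uintptr_t) si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));
        size_t idx = (size_t)(page - shared_base) / PAGE_SIZE;

        twin[idx] = malloc(PAGE_SIZE);      /* fine for a sketch; real code is
                                               more careful inside a handler */
        memcpy(twin[idx], page, PAGE_SIZE);
        mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);   /* retry the write */
    }

    void install_write_detection(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_write_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(shared_base, NPAGES * PAGE_SIZE, PROT_READ); /* writes now fault */
    }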

The key contribution of Treadmarks is the attempt to use shared memory for distributed computing when most systems approach the same problem using message passing. It tries to show that memory sharing can be done in an efficient way. It also has an interesting diff mechanism to allow multiple writers. Some points are not clear to me: How is memory addressed? How is the mapping between a page and the machines currently holding it maintained? What is the motivation behind Treadmarks choosing byte-based sharing over object-based sharing? The paper was an interesting and different read from the papers we have reviewed so far. It presented a new way of looking at communication in distributed systems.

Problem Description:
This paper discusses the principles used by Treadmarks in providing an abstraction of a global shared memory built from the memory of a network of workstations, each having its own physical memory.

Summary:
Treadmarks is a distributed shared memory system that provides the abstraction of a shared memory as a linear array of bytes, and it uses a replicated relaxed memory model called release consistency to avoid the excessive communication overhead of the sequential consistency model. Treadmarks provides synchronization APIs to avoid data races in this model. It also uses a lazy release consistency model where only the next processor acquiring the lock is informed of changes to a memory location, thereby reducing traffic. Treadmarks uses a multiple-writer protocol rather than a single-writer protocol, further reducing traffic. In this protocol, when multiple processes write to the same page, they exchange diffs and consolidate the changes when they arrive at a synchronization barrier.

Contribution:
1) The lazy release consistency model that reduces the message traffic
2) Using a multiple-writer protocol, which is hard to implement.
3) The diff mechanism for resolving conflicts when two processes access locations in the same page.

Comments and Applicability:
The DSM technique is applicable to processes requiring large amounts of memory. Though the ideas provided in the paper are good, I am not sure whether they will be practically applicable to a wide range of systems. I feel there is a chance that the user mixes up the Treadmarks API and the normal API, in which case the issue becomes complex to debug. I am also not sure whether the diff mechanism will successfully resolve the conflict in all cases.

The paper presents a distributed system middleware that simulates shared memory across all the individual nodes in the network. The paper also explains the reasons for some of the design decisions like a relaxed consistency model and provides some performance results of an initial implementation of the system.

The main goal behind the work presented is to combine the advantages of a shared memory system (primarily the ease of programming in such an environment where lower level communication and message passing need not be handled by application developers) with the performance, scalability and lower costs (greater performance as a result of scalability) of a networked distributed system.

In summary, the paper presents the obvious way of achieving the above goals by overlapping virtual memory on all the nodes. The paper does not talk about requiring address resolution, so the goal is not to increase the amount of virtual memory, but only to map the virtual addresses of all the processes in this system onto the same set of logical pages. This is the most general implementation possible, allowing multiple pools of such processes (i.e. multiple applications, each controlling a pool of identical processes running on different nodes) to coexist. The majority of the paper focuses on efficient ways of implementing this mapping between the virtual addresses and the data, since the data can now exist on multiple nodes simultaneously. Starting from the very stringent sequential consistency model and moving to the lazy release model, the pros and cons are discussed: the former is easy to implement but performs poorly due to high bandwidth requirements and higher latency for many operations, while the latter, enhanced with the use of diffs, is the most efficient in terms of the amount of data that needs to be exchanged.

The key point that allows a lazy release policy to work is the observation that in programs where it is not sufficient, there is a race condition present and the program would fail even on a traditional shared memory system.

I feel the ideas presented in the paper are very interesting although it is the first time I have heard of this system. I am curious to know if this memory model is being used in some special type of distributed systems like modern cluster-based supercomputers. I would guess that this memory model allows problems that have relatively lower amounts of parallelism inherent in them to be parallelized to an extent. But for applications like MR or many large database queries like joins, this is not very useful.

This paper talks about the distributed shared memory implemented by using the physical memory of machines connected over network.

The distributed shared memory can also be viewed as an additional layer of cache added to the system: the physical memory of the other systems can be treated as a layer beneath the native physical memory. There are many challenges in building this type of system, such as maintaining the consistency of data across processors, deciding on the granularity of sharing (which plays an important role in false sharing), and minimizing the number of messages exchanged between processors (which determines latency).

Treadmarks is the distributed shared memory system built here. It provides a set of APIs with which the developer can make use of the shared memory and make synchronized accesses to it. The consistency model followed by Treadmarks is lazy release consistency, an on-demand scheme: consistency maintenance happens at lock acquisition. The protocol also allows multiple writers to make modifications to the same shared page, and it maintains consistency by exchanging diffs created among the processors. This greatly helps in avoiding false sharing. The consistency protocol is driven mainly by the synchronization primitives of Treadmarks.

Concerns/Questions
1. What must be the virtual address of the shared memory in each of the processors?
2. This is a common problem with distributed systems: how and where is the lock structure maintained?
3. How does a page fetch happen when a page fault occurs? Does the system incurring the fault broadcast its request to all nodes?

Summarization:
This paper describes the design and implementation of Treadmarks, a distributed shared memory system which transparently provides each processor the ability to access any data. Treadmarks adopts a lazy release consistency model, where consistency is guaranteed only at acquire time. This helps increase performance without imposing too many limitations. The application benchmarks show promising results for such a system.

Problem Description:
The problem this paper tries to solve is to design and build a distributed shared memory system. Treadmarks is not the first DSM system, but actually the design choice is impressive. It does not conform to the strict sequential consistency model, but rather creates a practical lazy consistency model with high performance. Also, unlike other systems adopting single-writer protocol, Treadmarks creates a multiple-writer protocol reducing unnecessary synchronization and communication. These two aspects make Treadmarks an important research work at that time.

Contributions:
First, the idea of the lazy release consistency model is important. The lazy consistency model does not guarantee consistency at all times, but only at synchronization acquire time. This is a clever idea, in that it assumes a “correct” multiprocessor program should and must use synchronization primitives when accessing shared memory, so consistency is necessary only at these key points. Lazy consistency cannot handle an incorrectly written program, but an incorrectly written program will produce flawed results anyway, so trading performance for correctness on “incorrect” programs is not necessary at all.

Second, the multiple-writer protocol is also an important idea in this paper. Again, it assumes a “correct” program as its foundation. A correctly written multiprocessor program will always use synchronization primitives for shared memory; on the contrary, memory that is visited by multiple processes without synchronization primitives is simply not meaningfully shared, and hence can be accessed in arbitrary order. The paper uses binary diff comparison, and these diffs are exchanged between machines. Such a method works well, with high performance, for well-written programs.

Real world Application/Questions:
Treadmarks is a distributed shared memory system; therefore, it could be applied to applications requiring a large amount of memory. It provides a uniform representation of a globally shared memory resource, so the programmer does not have to worry about explicit memory replication and distribution. Porting existing programs to Treadmarks is easy, so it is also suitable for running existing programs on large amounts of data. The paper itself presents two examples (Jacobi iteration and TSP) and two applications (MIP and genetic linkage analysis), showing promise especially in the scientific computing realm.
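
The Jacobi example from the paper is short enough to sketch; the version below is my reconstruction with assumed Tmk_* globals and barrier ids, not the paper's listing verbatim. Rows are split among processes, and two barriers per iteration keep the read and write phases apart.

    #define M 1024
    extern float **grid, **scratch;       /* allocated in shared memory at startup */

    void jacobi(int iterations)
    {
        int rows = (M - 2) / Tmk_nprocs;              /* interior rows per process */
        int lo = 1 + Tmk_proc_id * rows;
        int hi = (Tmk_proc_id == Tmk_nprocs - 1) ? M - 1 : lo + rows;

        for (int it = 0; it < iterations; it++) {
            for (int i = lo; i < hi; i++)             /* read grid, write scratch  */
                for (int j = 1; j < M - 1; j++)
                    scratch[i][j] = 0.25f * (grid[i-1][j] + grid[i+1][j] +
                                             grid[i][j-1] + grid[i][j+1]);
            Tmk_barrier(0);                           /* all scratch writes done   */

            for (int i = lo; i < hi; i++)             /* copy my rows back         */
                for (int j = 1; j < M - 1; j++)
                    grid[i][j] = scratch[i][j];
            Tmk_barrier(1);                           /* grid consistent for next pass */
        }
    }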

However, I still have several questions regarding Treadmarks. First, I am not an expert in parallel programming, but is there any situation where a correct shared-memory program does not need synchronization primitives? This is a little odd, but I do remember the instructor discussing some false-positive examples reported by a race detector in CS736. If so, will those programs have any problem running on Treadmarks? Second, the network latency of a shared memory system is very large: the paper mentions that the minimum time to perform an 8-processor barrier is over 2 seconds. So I still have some doubt about its practical application in the real world.

This paper discusses TreadMarks, a distributed shared memory system model realized on a set of interconnected machines. DSM provides a globally shared virtual memory even though the nodes physically share no memory. Treadmarks also provides a set of APIs for shared memory programming, consisting of synchronization primitives. The shared memory abstraction is achieved by replicating data at various nodes and providing a mechanism for memory consistency using locks and barriers. Treadmarks is implemented as a user-level library on top of Unix. The release consistency model implemented in Treadmarks is lazy in terms of propagating updates: an invalidation is sent for an updated page, which is then fetched on demand.

The contribution of this paper is the extension of the SMP shared memory model to suit a system of interconnected independent nodes with no actual shared memory. Various optimizations, such as the multiple-writer protocol, are suggested in this paper to improve the performance of the system on such an architecture.

The key challenge would be the parallelization of code to engage the multiple nodes in the system. Applications with little parallelism in the code cannot benefit from a model such as Treadmarks, as it requires explicit specification of parallelism by the user. The system only provides the synchronization primitives to achieve consistent and well-defined operations; it is left to the user to derive the parallelism, and in the absence of such parallel code the utilization of the nodes would be minimal. The system also does not exploit the advantage of moving computation closer to the data; although most of the explanation in the paper is about sharing memory, such an approach could prove beneficial even for in-memory data. Nevertheless, the Treadmarks system provides a uniform view of the address space across processes and hides the message passing implementation from user programs.

The main gist of TreadMarks is that in order to use shared memory in a distributed system, it is necessary to adapt the synchronization machinery to fit the new paradigm. As such, this paper introduces a system of barriers and exclusive locks to address the distributed memory synchronization problems. This includes lazy release consistency and allowing multiple writers in the system. Overall the system allows for efficient implementation of many helpful parallel algorithms.

The focus in presenting this system is the applicability of different algorithms to commodity distributed systems. Allowing someone to run a parallelized version of the travelling salesman problem on a variety of Dell workstations or whatnot seems to be the goal of much of the paper. They do a good job of presenting these applications and where TreadMarks would benefit the problem. Also of use is the presentation of the different problems that exist when switching to the new shared memory paradigm.

Implementation takes up a good portion of the paper, and the interface and details are worth reading for those interested. Showing how to implement different applications adds to the case that this system supports good parallel applications. Clearly showing that regular locks and semaphores are inadequate and implementing new locking techniques is a great addition to this idea. Also at the crux of this paper is the diff creation, which seems to efficiently arrive at a consistent view of what the system has done. These ideas make up the contributions in this area.

As a paper, I wish there was more here, but I cannot find fault in anything presented. Tying together low cost operation with high efficiency allows for great gains in the distributed arena. While the majority of the paper seems to me to be catering to algorithms which can run easily under the system, the beauty of the design is still there. Applicability is clearly there, though implementing something more complex under TreadMarks may prove to be too much. I don't see the future in this, but I don't deny the viability as well.

This paper describes TreadMarks, a user-level library that implements a distributed shared memory (DSM) system.

In previous DSM systems, the consistency level and false sharing are the main concerns. Strong consistency is too expensive for most applications. TreadMarks needs to keep some level of consistency while reducing as much unnecessary communication as possible, so it relaxes the demand for consistency to improve performance. It uses “diffs” to detect modifications to pages and pass the updates around. This works for most updates. But in some less common situations, such as one process modifying one part of a page while another deletes the same part, this approach doesn't seem to work, so I would prefer to keep both versions and let the users make the final decision.

The release memory consistency model is what mainly distinguishes TreadMarks from previous DSM systems. Two synchronization operations are used in TreadMarks: locks ensure that only one processor can enter a critical section, and barriers are a global synchronization where a process waits for all other processes to arrive. If two processors access different variables that happen to be located in the same page, the unnecessary communication generated by this “false sharing” is extra cost beyond the real need. TreadMarks handles this using “diffs”: at a release, dirty pages are compared with previously saved clean copies (twin pages) to detect the changes made since the original page, and only the updates are passed around. This model allows multiple concurrent writers, and TreadMarks handles merging all the modifications that have been made. This works well in most conditions and improves performance considerably, especially under light contention, compared to previous models with stronger consistency levels.

TreadMarks sacrifices consistency for performance. Its multiple-writer model improves the degree of concurrency, and it uses locks and barriers to handle data races between processes well. I prefer shared-nothing systems to shared memory systems; however, I still like the ideas of this paper. It identifies the problems of DSM systems and tries to get better performance while guaranteeing a sufficient level of consistency.

Summary:
TreadMarks is a user-level library for sharing memory in a distributed environment. It relies on the lazy release consistency model and provides synchronization primitives so the user can avoid data races. It also uses a multiple-writer protocol to reduce traffic.

Problem:
In a distributed environment, memory is spread across all the processors. Shared memory enables processes to use memory as a global resource and share data much more easily. TreadMarks implements this shared memory approach on a network of workstations and provides the abstraction of a globally shared memory, in which each processor can access any data without special design effort from the programmer.

Contributions:
TreadMarks uses the lazy release consistency model to notify other processors of updates. This significantly reduces communication traffic while still providing the appearance of sequential consistency to data-race-free programs.
TreadMarks supports multiple writers with page diffs, which lets different processors have their own writable copy of a page and reconcile the differences at synchronization.

Questions/Comments:
I am just wondering whether there is a significant performance loss if there are some hot pages that are frequently accessed and updated by different processors.
I think shared memory is an efficient method of sharing data in distributed environments. Can we discuss more about other methods (not just the one in this paper) for memory coherence and compare their differences?

Summary:
Treadmarks argues that communication in a distributed environment should be abstracted away from the developer: the developer should focus on application logic rather than the underlying architecture and message passing. The advantage of this approach is that most data structures from sequential processing can be retained; simply adding synchronization yields efficient programs. Typically, DSM is implemented using data replication, and this can lead to a lot of network traffic. To overcome these implementation challenges, Treadmarks proposes innovative techniques. The first technique is to use lazy release consistency instead of eager release consistency. When a node changes a page and releases the lock on it, eager consistency invalidates the data on all the other nodes. Lazy consistency avoids this and invalidates data only on the node that next tries to access data modified by another node. This saves a lot of network traffic. The second technique is support for multiple writers, because single-writer systems are inefficient largely due to false sharing and unnecessary invalidations. When two processes modify the same page, synchronization is achieved by exchanging diffs instead of overwriting each other's copy of the page. The system is evaluated on parallelized programs, and speedup is observed to depend on the communication-to-computation ratio.

Contributions:
Lazy release consistency: invalidating data on other replicas only when required, decreasing network traffic.
Using diffs instead of full page contents for synchronization, and support for a multiple-writer protocol.

Applicability:
The optimizations presented in the paper definitely improve the performance of systems based on DSM. However, I do not agree that DSM is a solution for all types of problems. It may be useful for maintaining complex data structures and from a reusability perspective, but abstracting network communication away from the developer is not flexible enough, and handling communication and synchronization at the application level could give better performance. Also, partitions and fault tolerance are not described in the paper. What happens if a node tries to acquire a write lock and finds that another node, which is not reachable now, has the latest copy? How does the node trying to acquire the lock come to know that another node has recently written to that page? Is this information also stored in shared memory, and is it accessed at kernel level?

Summary:
Treadmarks is a user-level distributed shared memory library. They provide synchronization APIs to the user and aim to transparently implement distributed shared memory, thus making it easy to use.

Problem:
Given a network of workstations, is it possible to implement a distributed shared memory system on top of them, which is easy to use and easily portable?

Contributions:
The paper's main contribution is selling the idea of distributed shared memory. It explains the concept very well, and why it would be good to give DSM a try.

Other contributions include the nice diff scheme, and the concept of lazy release consistency. If this was the first DSM system, it was reasonably thought out.

Application:
The paper gives us a nice view of what kinds of problems run well on distributed shared memory systems, what contributes to good scale-up, and where the time is usually spent. Even if their entire system is out of date in today's world, I would hold that these observations are still applicable and make the paper a useful read.

Comments/Questions:
Given that any user level library will suffer a performance hit when compared to the kernel level version of it, it would have been interesting to see how it compares against a kernel level distributed shared memory scheme.

They mention a comparison with the message passing version of the program, and yet offer no hard performance numbers in relation with it.

I wonder how bringing TCP/IP into the equation will mess things up. I think latency will shoot up by at least 3 times.

Overall, even though they left out a lot of details, I like this paper. I am gradually awakening to the fact that research papers are more marketing than anything else, and if I had never heard of distributed shared memory before, this paper would have done a good job of explaining what it is and why I should have given it a try.

Problem :-
Implement a distributed shared memory over a network of workstations while keeping in mind the consistency issues and communication overhead of building shared memory over a collection of separate workstations.

Summary :-
The paper presents TreadMarks, a portable global distributed shared memory (DSM) abstraction implemented over a network of workstations, each having its own physical memory. The system leverages advances in network and processor capabilities to provide a powerful platform for the execution of parallel programs. TreadMarks uses the lazy release consistency model to provide the programmer with enough synchronization to prevent data race conditions in the code. It uses a multiple-writer protocol with diffs to reduce bandwidth consumption.

Contributions :-
The paper describes a system that supports running parallel programs across multiple workstations by providing a global shared memory abstraction on a network of workstations. It describes two synchronization primitives (barriers and locks) that enable programmers to prevent race conditions while using TreadMarks. The paper chooses lazy release consistency instead of sequential consistency or eager release consistency, as it provides the desired amount of synchronization with less communication overhead than the other choices. TreadMarks uses diffs to implement the multiple-writer protocol, which mitigates the false sharing problem and makes communication much more efficient since the entire page need not be sent. It uses application-level protocols to ensure reliable message delivery.

Applicability to Real Systems :-

TreadMarks can be deployed on existing LANs with minimal infrastructure cost, leading to easier adoption. It can be used to efficiently run programs that are inherently parallel, with large memory requirements and some memory dependencies (which can be handled using barriers). Given that TreadMarks is a distributed system, there are associated issues such as dealing with process failures and network partitions/failures; such issues need to be discussed when building a real system. I am also not sure how comfortable programmers will be writing complicated code on this platform, as the onus is on the programmer to prevent race conditions.

The paper introduces TreadMarks, a system with Distributed Shared Memory over networked machines. The motivation is that with fast networks, communication is less costly, and there may already be machines around that could run DSM applications without needing to purchase additional hardware -- for example, in the case of temporarily idle machines, as in Condor.

The problem that the paper deals with is how to keep memory consistent across the different machines. The paper points out that it is only at synchronization operations that there will be a problem; otherwise, if multiple machines are writing to the same location, there is a race and the results are not defined in any case. TreadMarks uses a relaxed consistency model rather than sequential consistency. When a read occurs, the page is replicated among the processors, allowing read-only data to be shared.

This paper contributes the idea of lazy release consistency. In this case, when a lock is acquired, if the page was modified, the system also sends an invalidate of the page to the process acquiring the lock. This is in contrast to eager release consistency, where when a lock is released and the page was modified, all other processes are notified. The advantage of lazy release consistency is that there is less communication, and since any process writing to a shared location should acquire the lock to avoid data races anyway, it is not necessary to invalidate the copies cached by other processes. TreadMarks also has the concept of a multiple-writer protocol, in contrast to the usual multiple-reader single-writer protocols. The advantage of the multiple-writer protocol is that it reduces the effects of false sharing. In the case where two processes really are modifying the same location rather than different locations in the same page, the paper argues that there should have been a lock or other form of synchronization, so the timing-dependent outcome is okay -- it would be timing dependent anyway.

The authors tested TreadMarks on several different problems, but they did not mention as many details as they could have. They also imply that the system would work well on temporarily idle machines, but they do not give any details about how machines joining and leaving would be handled (or for that matter, how the system would be made reliable and fault-tolerant). It would be interesting to hear how they expected to be able to handle it.

Summary
The authors describe a user level system, TreadMarks, for implementing distributed shared memory across a network of standard workstations. TreadMarks exports a simple, user level API to allocate and synchronize shared memory across multiple workstations.

Problem
A network of workstations can provide a multiprocessor environment with performance rivaling supercomputers to speed the execution for many applications. Unfortunately, explicitly programming message passing or global communication detracts from the more important problem of developing parallel algorithms.

Contributions
TreadMarks abstracts away the inter workstation communication and provides a user-level API for allocating and synchronizing distributed shared memory. It uses a lazy release consistency model, which provides sequential consistency in data-race free executions (determined by the supplied TreadMarks synchronization calls). Multiple writers are supported with page diffs, which allow different processors to update different portions of a page at the same time and reconcile differences at synchronization.

Applicability
Overall, I like the idea. It reminds me of an MPI execution environment with pthreads-like synchronization primitives instead of message passing. Given that this is a distributed system, there doesn’t seem to be any treatment of faults. What happens when a node goes down? The API appears to block indefinitely on barrier/lock acquires if a node or link fails (unless the TreadMarks system kills all executions on any failure?).

Summary:
This paper introduces TreadMarks, which provides a shared memory abstraction through a user-level library on top of Unix. The main purpose of TreadMarks is to provide efficient synchronization on top of a shared memory system.

Problem:
There are many problems with the consistency models of previous DSMs:
- Unnecessary communication cost: given that sending a message is expensive because it involves the OS kernel, it is worth eliminating unnecessary invalidation notifications among processors. In other words, when updating a variable, it suffices to send an invalidation only to the next processor acquiring the lock instead of broadcasting it to all processors.
- False sharing: since pages are large, it is very likely that multiple processors share the same page. Writes from different processors then cause the “ping-pong effect”.

Contribution:
- Lazy Release Consistency: TreadMarks uses the lazy release consistency model to notify other processors of updates. The idea is to delay update notifications until they are truly necessary, since another processor will not access the data until a synchronization operation has been executed.
- Multiple-Writer Protocol: the multiple-writer protocol is introduced to solve the message traffic and false sharing problems of the single-writer protocol. It creates a local twin copy at the first write, compares the page word by word against the twin, and creates a diff at the barrier. Pages are updated exclusively by applying diffs; no complete new copies are needed (sketched below).
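
A small sketch of that last step, applying a received diff to a stale copy of the page. This is my own illustration using an assumed (offset, length, data) run format, not the TreadMarks wire format.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint32_t offset, nwords; const uint32_t *data; } run_view_t;

    /* The acquiring processor patches only the modified words into its copy;
       runs from different writers touch disjoint words in a race-free program. */
    void apply_diff(uint32_t *page, const run_view_t *runs, size_t nruns)
    {
        for (size_t r = 0; r < nruns; r++)
            for (uint32_t k = 0; k < runs[r].nwords; k++)
                page[runs[r].offset + k] = runs[r].data[k];
    }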

Comments:
TreadMarks provides a very efficient solution for data consistency, but there is no discussion of its fault tolerance and reliability. Since it uses UDP/IP for communication, how does it detect the failure of a remote process? What if a remote process dies while holding a lock? How does the system recover from such a failure?

Problem addressed:
=============
This paper discusses an implementation of Distributed Shared Memory (DSM) over a network of workstations. It also talks about the different design trade-offs of these systems.

Summary:
=======
DSM provides processes with a globally shared view of memory implemented on top of the physically separate memories of many nodes. A software implementation of DSM gives programmers the ease of the shared memory programming model while avoiding the complexity of implementing shared memory in hardware. This allows one to logically build a large shared memory system on top of commodity processors, and is thus likely to be cost effective, with the performance penalty of managing shared memory in software. Software-based distributed shared memories are primarily implemented using virtual memory page protection to give the user a globally consistent view of the shared memory. There are two major design issues in implementing DSM: the level of consistency and the problem of false sharing. Providing strong consistency guarantees (e.g. sequential consistency) can be pretty expensive in a software-based DSM, as the consistency of shared memory is maintained in software through OS virtual memory page protection, which may result in frequent traps to the kernel. For better performance, DSM systems generally implement weaker consistency models like release consistency, which makes communication among processors much less frequent but puts the burden on programmers to make sure their programs are data-race free. To overcome the problem of false sharing due to the large page size at which the system is kept coherent, the system proposes to use "diffs", which essentially has the flavor of an update-based protocol, albeit with the difference that only the data that was modified is passed around.

Short summary:
===========
This paper discusses different aspects of implementing software-based distributed shared memory on top of a network of workstations.

Relevance/Concerns:
==============
I have a couple of concerns about the claims in the paper. For example, it claims that DSM provides the same programming environment as hardware shared memory multiprocessors, but it uses implementation-specific library calls, e.g. Tmk_malloc(), to allocate shared memory. So the user needs to distinguish between shared and private memory allocation, unlike on hardware shared memory systems. Secondly, by implementing a lazy release consistency model with multiple writers, it assumes that all programs will be data-race free, which complicates programming. Moreover, a programmer may not care about data races in certain circumstances, such as statistics gathering. Given the performance overhead of DSM and its rather weak memory consistency semantics, I have doubts about the wide applicability of DSM machines.


Summary:
TreadMarks is a software library implementing distributed shared memory, an abstraction that tries to maintain most of the useful features of local shared memory while allowing operation across multiple networked machines with no special hardware support. TreadMarks is designed to support any binary capable of linking against a C library.

Contributions:
The authors' goal in implementing TreadMarks is to provide a simpler distributed programming model when compared to standard message passing, the usual method of communicating among loosely coupled machines. It appears that this is by no means the first implementation of software shared memory, but it does have some unique features.

To combat the performance losses associated with sharing at page granularity, the authors use a combination of relaxed consistency and diff-based compression. Full pages are never transferred (disregarding the trivial case of all bytes modified); rather, diffs between the last known state and the current state are transferred.

One might ask how conflicts are resolved. In short, they are ignored. The authors use a technique common in optimizing compilers: they do not define behavior for incorrect or erroneous programs. In this case, the release memory consistency model takes advantage of the fact that order among concurrent reads and writes is only enforced by explicit synchronization. If a conflict occurs when diffs are sent, program behavior on a real shared memory machine would have been timing-dependent as well. Therefore a simple last-writer-wins policy implemented at page level suffices for programs with no data races. Multiple writers are allowed, but in a sensible program they had better synchronize their actions.

Questions/Comments:
The similarities with STM are striking, but implementing transactions would require much stronger ordering and communication than currently exists.

It would also be nice to hear about reliability. Since DSM is latency sensitive and likely to be used among closely coupled machines, I will accept that node failures can probably be handled at a higher level of software, but I would like to know more about recovery from transient network errors (dropped/corrupted packets).

The TreadMarks distributed shared memory system provides a shared memory mechanism for commodity machines on a network. The primary difference between TreadMarks and prior systems is its use of a relaxed memory model: release consistency.

RPC protocols are often complicated to orchestrate when implementing sophisticated parallel algorithms. TreadMarks caters to application developers because it abstracts away the details of interprocessor communication.

The contributions of this work include:
1) Use of an optimistic memory consistency model: release consistency.
2) An API for C, C++, and Fortran that is comparable with standard systems of the day. It requires no modifications to the hardware, kernel, or compiler, and runs entirely in user mode (see the sketch after this list).
3) A writing process makes a copy of a page (a twin) on the first write, keeps writing to the page, and later diffs the page against the twin. The diffs are then transferred across the network.
4) The multiple-writer protocol allows multiple concurrent writers and then merges their diffs.
5) Two synchronization mechanisms are used to coordinate processes: locks and barriers.
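Below is a rough, hypothetical sketch of what a program written against this kind of API looks like. Only Tmk_malloc() is mentioned elsewhere in these reviews; the other Tmk_* names and signatures are my recollection of the paper's conventions and should be treated as assumptions, not a verified interface.

    /* Hypothetical TreadMarks-style program: a shared counter guarded by a
     * lock, followed by a barrier. Names other than Tmk_malloc are assumed. */
    #include <stdio.h>

    /* In a real program these declarations would come from the library's header. */
    extern void     Tmk_startup(int argc, char **argv);
    extern void    *Tmk_malloc(unsigned size);
    extern void     Tmk_distribute(char *addr, unsigned size);   /* assumed name */
    extern void     Tmk_lock_acquire(unsigned id);
    extern void     Tmk_lock_release(unsigned id);
    extern void     Tmk_barrier(unsigned id);
    extern void     Tmk_exit(int status);
    extern unsigned Tmk_proc_id, Tmk_nprocs;

    int main(int argc, char **argv) {
        int *counter = NULL;

        Tmk_startup(argc, argv);                     /* join the DSM computation */

        if (Tmk_proc_id == 0) {
            counter = Tmk_malloc(sizeof *counter);   /* allocate in shared memory */
            *counter = 0;
            /* make the shared pointer known to the other processes (assumed call) */
            Tmk_distribute((char *)&counter, sizeof counter);
        }
        Tmk_barrier(0);                              /* wait for the allocation */

        Tmk_lock_acquire(0);                         /* acquire pulls others' updates */
        *counter += 1;
        Tmk_lock_release(0);                         /* release makes our diff available */

        Tmk_barrier(1);
        if (Tmk_proc_id == 0)
            printf("final count: %d\n", *counter);   /* expect Tmk_nprocs */

        Tmk_exit(0);
        return 0;
    }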

This system appears to be more efficient than its predecessor (IVY) due to the multiple-writer protocol. TreadMarks is more conservative in how much data is transferred between machines. However, since it is an optimistically concurrent system, there are still opportunities for data races. A good developer would know how to use the locks and barriers to protect the data from races, but this pushes responsibility for the consistency of the system into the hands of the developer.

TreadMarks

Problem

The goal of TreadMarks is to provide a distributed shared memory library over networks of commodity workstations. DSM inherently involves communication between nodes; minimizing that communication while maximizing parallel execution is the key to higher performance.

Summary

The paper presents a shared memory system with page-level granularity, which is far coarser than that of hardware shared memory, so the cost of false sharing is far greater. To address this, the paper introduces a release consistency model in which memory is only guaranteed to be consistent at synchronization points, and within that model it allows multiple writers to the same page. The trick is to create a twin of a page when the application first writes to it, so TreadMarks can keep track of what has been changed. When there are multiple writers to the same page, each writer's page is compared against its twin word by word, and the resulting diffs are merged. If there is no data race, the word-by-word comparison successfully merges the several versions of the page into a single version.
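A minimal sketch of the twin-and-diff idea (my own illustration, not TreadMarks' code): each writer's modified page is compared with its pristine twin word by word, and only the changed words are recorded and merged.

    /* Toy twin/diff mechanism. A diff records (offset, value) pairs for
     * words that differ from the twin; merging applies both writers' diffs. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_WORDS 1024

    typedef struct { unsigned offset; unsigned value; } DiffEntry;

    /* Compare a page against its twin word by word; return the diff length. */
    static size_t make_diff(const unsigned *twin, const unsigned *page,
                            DiffEntry *out) {
        size_t n = 0;
        for (unsigned i = 0; i < PAGE_WORDS; i++)
            if (page[i] != twin[i])
                out[n++] = (DiffEntry){ i, page[i] };
        return n;
    }

    /* Apply a diff (e.g., one received from another writer) to a page. */
    static void apply_diff(unsigned *page, const DiffEntry *diff, size_t n) {
        for (size_t i = 0; i < n; i++)
            page[diff[i].offset] = diff[i].value;
    }

    int main(void) {
        unsigned twin[PAGE_WORDS]   = {0};
        unsigned page_a[PAGE_WORDS] = {0};   /* writer A's copy */
        unsigned page_b[PAGE_WORDS] = {0};   /* writer B's copy */

        page_a[3]   = 7;                     /* A and B touch disjoint words: */
        page_b[900] = 9;                     /* false sharing, but no data race */

        DiffEntry diff_a[PAGE_WORDS], diff_b[PAGE_WORDS];
        size_t na = make_diff(twin, page_a, diff_a);
        size_t nb = make_diff(twin, page_b, diff_b);

        /* Merging both diffs into one page reconciles the concurrent writers. */
        unsigned merged[PAGE_WORDS];
        memcpy(merged, twin, sizeof merged);
        apply_diff(merged, diff_a, na);
        apply_diff(merged, diff_b, nb);

        printf("merged[3]=%u merged[900]=%u (diffs: %zu and %zu words)\n",
               merged[3], merged[900], na, nb);
        return 0;
    }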

Contribution/Discussion

The idea of this paper resembles Software Transactional Memory. While STM works on a single machine with word-level granularity, TreadMarks works across multiple machines with page-level granularity. Having a twin page is similar to STM's write log. However, a key difference is that TreadMarks creates a twin page to resolve false sharing, while STM creates a write log for conflict resolution.

The paper lacks discussion of how DSM can cooperate with hardware shared memory. Treating each processor in a single machine as a separate DSM node would surely make the system significantly slower.

Application

TreadMarks could be used in distributed systems where the computation-to-communication ratio is high. Most branch-and-bound problems could be solved easily. It also offers more flexibility than map-reduce-style programming: map-reduce permits only a static data flow and lacks the freedom to configure the pipeline, while a pure branch-and-bound program can have any kind of data flow.

DSM could be extended to a distributed STM without incurring any significant additional performance penalty. The argument is similar to the observation that the existing cache-coherence protocol in an SMP system can be used to implement hardware STM; release consistency and twin pages play a role much like the cache-coherence protocol in an SMP system.

Summary:
TreadMarks is an implementation of Distributed Shared memory. The two main features of TreadMarks are 1) lazy release consistency and 2) multiple-writers. The authors use two simple examples throughout the paper to demonstrate the properties of the system.

Problem:
Shared memory allows processes to view memory as a globally shared resource. In the context of distributed systems, memory is distributed across processors. Systems like TreadMarks allow distributed shared memory to still be viewed as a single shared resource, thus hiding the complexity of accessing a physically distributed memory.

Contributions:
TreadMarks implements lazy release consistency and enables multiple writers by using diffs to merge concurrent modifications to a page. Both of these features significantly boost performance compared to a traditional sequential consistency protocol at page-level granularity.

Comments:
I think this paper didn't impress me very much because it describes something that is now taken for granted; I suppose in 1996 it may have been novel. I am also skeptical about the complexity of their implementation: it may seem that a more relaxed consistency model is easier to implement, but in practice it is frequently much harder. Finally, I am not sure it is always possible or even feasible to write race-free programs, which makes release consistency even less viable.

Summary:
This paper gives an overview of TreadMarks, a distributed shared memory (DSM) system for networks of workstations. The functionality of TreadMarks is demonstrated using two simple problems, Jacobi iteration and the Traveling Salesman Problem. The paper then describes the challenges of implementing TreadMarks, the consistency model used, and how TreadMarks allows multiple writers to shared pages. Finally, the authors describe their experiences running two large, recently implemented applications, mixed integer programming and genetic linkage analysis.

Problem:
As network speeds and processor performance improve, it becomes increasingly viable to run parallel programs on networks of workstations, rather than just on supercomputers and multiprocessor machines. TreadMarks is an implementation of an efficient DSM system that uses a relaxed consistency model and allows multiple processes to update shared pages simultaneously. TreadMarks relies on the programmer to ensure that their program is data-race free; in the presence of data races, TreadMarks allows conflicting updates to occur, potentially creating inconsistencies.

Contributions:
The paper provides a compelling argument for DSM systems with relaxed consistency semantics. Many important optimizations fall out of the lazy release consistency model used by TreadMarks, including drastically reduced network traffic and the ability to allow multiple writers to a page, all without kernel, compiler, or language changes.

Applicability:
The applicability of DSM systems is still under debate. DSM makes sense when network latencies are low, data structures span multiple machines, and synchronization must be frequent. However, modern networks are still slow relative to CPU speeds (although this may be changing), and data structures rarely need to span multiple machines. In a distributed environment, synchronization and shared data accesses are generally discouraged for performance, correctness, and dependability reasons. DSM takes some control over locality away from the programmer and would only encourage shared data accesses, which may be a bad thing. The future of DSM is undecided.

TREADMARKS: Shared Memory Computing on Networks of Workstations
===============================================================

- Summary

This paper discusses the design and implementation of TreadMarks, a shared memory computing system for networks of workstations, which uses a lazy release consistency model and a multi-writer protocol to reduce communication overhead and increase performance.

- Problem

Building a shared memory system on a network of workstations is not a trivial problem; the distributed nature of the system makes it hard. Since each workstation has its own local memory and communicates with the others only via the network, constructing a virtual, globally shared memory faces the major challenge of maintaining consistency. The usual implementation approach is to replicate data. This provides a performance gain but again raises consistency problems.

- Contribution

In my opinion, there are three major contributions in this paper.
+ The first contribution is *the synchronization API* (i.e., locks and synchronization barriers). These APIs make writing parallel programs easier, as illustrated by the Jacobi and TSP examples.
+ The second contribution is *lazy release consistency*. The basic idea is to enforce consistency at acquire time, which eliminates many unnecessary messages and, in turn, improves performance compared to the traditional sequential consistency model.
+ The final contribution of this paper is the *multiple-writer protocol*. The traditional approach uses a single-writer protocol, which is simple but suffers from expensive message traffic and false sharing. In the multiple-writer protocol, multiple processes are allowed to have writable copies of the same page. Virtual memory hardware is used to detect modifications, and the differences among the modifications are resolved using diffs. This dramatically reduces the communication overhead during synchronization.

- Flaw/Comment/Question

The weakness of TreadMarks is that it does not solve the data races that occur when multiple processes modify the same part of a page. It leaves the task of ensuring there are no data races to the programmer, which seems unreasonable to me.

Goal:
TreadMarks is a DSM system that improves DSM performance by using relaxed memory consistency to reduce the amount of communication traffic. Additionally, it reduces the effect of false sharing by transferring only the changes within a memory page.

Problem:
DSM allows a network of workstations to act like a large shared memory system. However, such a system has to reduce communication overhead by minimizing traffic and delay. Additionally, it needs to implement synchronization primitives efficiently.

Contribution:
A simple DSM would use a protocol similar to the cache-coherence protocol found in multiprocessor systems to manage memory among workstations. However, this method generates a lot of unnecessary traffic due to false sharing and strict sequential consistency.

The main idea behind TreadMarks is the assumption that a valid parallel program protects shared memory accesses with synchronization primitives. Thus, memory invalidation can be delayed until synchronization is encountered, and lazy release consistency only propagates invalidation messages at synchronization points such as barriers and lock acquires. "Lazy" also refers to the fact that updates are obtained on demand, which minimizes traffic because the system does not need to update memory on processes that do not require it.
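To make the invalidate-at-acquire, fetch-on-demand behavior concrete, here is a toy single-file simulation. It is entirely my own sketch of the idea; TreadMarks itself uses write notices, intervals, and real network messages.

    /* Toy simulation of lazy release consistency: invalidate at acquire,
     * fetch data only when an invalidated page is actually touched. */
    #include <stdio.h>

    #define NPROCS 2
    #define NPAGES 4

    static int valid[NPROCS][NPAGES];   /* is each page up to date locally?  */
    static int dirty[NPROCS][NPAGES];   /* pages written since the last sync */

    static void write_page(int p, int pg) {
        dirty[p][pg] = 1;                       /* remember a "write notice" */
        printf("P%d writes page %d\n", p, pg);
    }

    /* At acquire time the acquirer only learns *which* pages the releaser
     * modified and invalidates them; no page data moves yet. */
    static void acquire_from(int acquirer, int releaser) {
        for (int pg = 0; pg < NPAGES; pg++)
            if (dirty[releaser][pg]) {
                valid[acquirer][pg] = 0;
                printf("P%d invalidates page %d at acquire\n", acquirer, pg);
            }
    }

    /* Data is pulled lazily, only when an invalid page is actually accessed. */
    static void read_page(int p, int pg) {
        if (!valid[p][pg]) {
            printf("P%d faults on page %d and fetches its diff\n", p, pg);
            valid[p][pg] = 1;
        }
        printf("P%d reads page %d\n", p, pg);
    }

    int main(void) {
        for (int p = 0; p < NPROCS; p++)
            for (int pg = 0; pg < NPAGES; pg++)
                valid[p][pg] = 1;

        write_page(0, 1);        /* P0 writes pages 1 and 3 inside a lock        */
        write_page(0, 3);
        acquire_from(1, 0);      /* P1 acquires the lock: pages 1,3 invalidated  */
        read_page(1, 3);         /* only page 3 is actually fetched              */
        return 0;
    }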

To minimize the impact of false sharing when processes access different data located in the same page, TreadMarks creates a clone (twin) of the shared page and later computes a diff against it. During synchronization, only the diff is transferred between the cooperating processes.

Applications:
DSM provides a platform on which sequential programs can be ported easily into parallel applications that run on clusters of networked machines. Especially with the current trend toward multi-core systems, multi-threaded applications can be ported with minimal effort onto clusters because the programs are already parallelized.

Flaws:
There is another approach that might reduce the overhead of false sharing: have the programmer treat memory as objects instead of blocks. In that case the system would know which data should be shared and would transfer only the memory within an object's boundaries.

Additionally, the paper mentions that idle time is the main overhead of the current system. This is because it uses only lazy sharing, which means that data is transferred only when it is needed and a process must wait out the communication delay. If the system instead interleaved communication with computation by transferring shared data during computation, idle time might be reduced. This could improve overall performance at the cost of more communication overhead, because some data might be transferred unnecessarily.

TreadMarks is an improvement over previous processor-pool systems in its ease of use. One difficulty with Condor is that a process only runs on a single machine, and developers must either write the job against a message passing interface or else dispatch jobs and hope they get run in a timely manner. (It's notable that this paper appeared around the same time as the early MPI standards.) TreadMarks allows developers to write the process much as they would with standard threading libraries, and memory is passed around and updated transparently. I have little experience with MPI, but I would imagine porting a threaded, single-machine implementation to TreadMarks would be much simpler than porting it to an MPI library.

It isn't totally transparent, though. As their experience with genetic linkage analysis showed, the communication overhead can be too large if the problem sizes are too small. As you'd expect, the system works much better when you write programs *for* it: make problem sizes large, and do partitioned computation over a set of data.

The system is shown to scale up to at least eight nodes; for both sets of problems it is roughly 88% efficient (i.e., a speedup of about 7 on eight nodes). They should have presented their results in terms of efficiency; ideally speedup would be linear, so this is nothing to brag about. It also isn't enough to conclude that communication would not become prohibitive at hundreds of machines, which would matter if computations over a single set of data were performed.

I find it odd that the only two synchronization primitives are locks and barriers. Locks are far less useful without condition variables, an odd omission given that condition variables date back to Hoare's monitors around 1974. Read-write locks may be difficult to implement in a distributed manner, which could explain their exclusion. Barriers can easily be simulated with a condition variable and a mutex, but perhaps barriers were easier to implement than condition variables; some rationale would have been good to see. Perhaps the API is meant to fit only two common models: producer-consumer (hence locks) and partitioned computation (hence barriers).
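For reference, the single-machine construction alluded to above, a barrier built from a mutex and a condition variable, looks roughly like this in ordinary pthreads code (not TreadMarks code):

    /* A counting barrier built from a mutex and a condition variable. */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  all_here;
        int count;          /* threads that have arrived in this round */
        int nthreads;
        int generation;     /* lets the barrier be reused safely       */
    } barrier_t;

    static void barrier_init(barrier_t *b, int nthreads) {
        pthread_mutex_init(&b->lock, NULL);
        pthread_cond_init(&b->all_here, NULL);
        b->count = 0;
        b->nthreads = nthreads;
        b->generation = 0;
    }

    static void barrier_wait(barrier_t *b) {
        pthread_mutex_lock(&b->lock);
        int my_gen = b->generation;
        if (++b->count == b->nthreads) {        /* last arrival releases everyone */
            b->count = 0;
            b->generation++;
            pthread_cond_broadcast(&b->all_here);
        } else {
            while (my_gen == b->generation)     /* wait for this round to finish  */
                pthread_cond_wait(&b->all_here, &b->lock);
        }
        pthread_mutex_unlock(&b->lock);
    }

    static barrier_t bar;

    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld before barrier\n", id);
        barrier_wait(&bar);
        printf("thread %ld after barrier\n", id);
        return NULL;
    }

    int main(void) {
        enum { N = 4 };
        pthread_t t[N];
        barrier_init(&bar, N);
        for (long i = 0; i < N; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < N; i++)
            pthread_join(t[i], NULL);
        return 0;
    }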

TreadMarks ignores both security and fault tolerance. Ignoring security is fine, since it just means you need a trusted network. It is unrealistic to presume that all systems in the TreadMarks network will remain up and continuously available, though. TreadMarks seems to share all of the reliability problems of token-ring networks, yet the authors do not mention this. Who really keeps track of the locks, or of how many processes have hit a barrier?

(Who scanned this and thought: 'looks good to me'?)
