Paper Discussion - Fall 2008 - CS 736: TxLinux: Using and Managing Transactional Memory in an Operating System

« Experiences with Processes and Monitors in Mesa | Main | Resource Containers: A New Facility for Resource Management in Server Systems »

TxLinux: Using and Managing Transactional Memory in an Operating System

Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, Emmett Witchel. TxLinux: Using and Managing Transactional Memory in an Operating System. in Proc. of the 21st Symposium on Operating Systems Principles (SOSP), Oct. 2007.

Reviews due for this or Mesa Monitors on Tuesday, 10/7.

Posted by Michael Swift on September 29, 2008 12:18 PM | Permalink

Comments

Summary: TxLinux is a Linux variant that uses HTM (MetaTM) for synchronization; it also uses HTM feedback to mitigate priority scheduling anomalies. TxLinux’ main TM interface is the versatile cooperative transactional spinlock (cxspinlock) allowing transactions to be restarted non-transactionally with locks.

Problems: TM is an attractive alternative for programming systems with an increasing number of CMPs. However, in addition to existing OS issues (such as priority inversion), TM brings its new set of issues: handling (irreversible) I/O in transactions; issues when using both locks and transactions; and transitioning path from non-TM.

Contributions: The paper provides two mechanisms that greatly contribute to solving the above problems:
1)Use of TM status information provided by the HW, so that OS has better heuristics and implement cxspinlocks.
2)Introduction of cxspinlocks as a mechanism which dynamically selects the use of transactions, locks, or restarting a transaction with locks. Moreover, cxspinlocks can coexist with many non-transactional locks.
They also propose MetaTM, a TM implementation that cleanly exemplify their primitives; and they offer TxLinux as a proof of concepts for their methods.

Implementation: In general, TM provides HW support to execute transactions atomically and to revert the effects of failed transactions. MetaTM uses a flat nesting TM (nested transactions are flattened), with eager conflict detection (early restarting of conflicting transactions) and version management (optimistically updating of memory). Xbegin, the start of a transaction, returns the transaction status word which can be set by Xrestart and tells the reason for (re)starting. Xgettxid can probe if in a transaction. Such primitives are used to implement Cxspinlocks whichs are kernel routines mostly intended to replace spinlocks, but more general than just that. Cxspinlocks are crafted so that they can be executed by threads that are transactional as well as coexisting non-transactional threads. A transactional thread that enters in a Cxspinlock proceedes transactionally, concurrently with other non-conflicting transactional threads; an I/O operation makes it back off and re-execute exclusively the Cxspinlocks as non-transactional thread, using a lock argument. A contention manager (the HW/SW TM part deciding who restarts in case of conflict) favors non-TM threads.

Easy: By allowing coexistence of different paradigms, Cxspinlocks eases the transition to TM. It is easy to alleviate priority inversion by having the contention manager use process priority and TM statistics (passed in os_prio) to favor higher priority processes.

Hard: The flat nesting nature of MetaTM introduces the possibility of new deadlocks that need to be well thought off. Code that passes ownership of locks from one thread to another cannot be easily converted to TM. Cxspinlocks should be the only interface to TM, but this is unrealistic yet.

Weakness: 1) The nature of the newly introduced class of errors can be very subtle.
2) Cxspinlock don’t coexist well with transactions not using Cxspinlocks
3)Kernel and User mode transactions don’t coexist well. It is unacceptable to kill a transacting user process that eventually requires I/O.
4)TM work well if there are few contentions; however the number of contentions may be architecture dependent and there may be performance-portability issues.
5)HW for TM in general may consume more power, cost more and be more complex than in non-TM systems.

Posted by: Daniel Luchaup | October 9, 2008 09:38 AM

Summary
This paper gives a model of hardware transactional memory as a simple, well performing alternative to locks. It introduces a cooperative transactional spinlock that allows the system to dynamically choose between locks and transactions as necessary.

Problem
Given that multi-core processors are now widely used, how do we scale the performance of synchronization without introducing significant complexity?

Techniques/Contributions
Transactional memory has been used to simplify reasoning about concurrency in user-level programs. The authors wish to obtain performance benefits from the use of transactions in the kernel without incurring additional complexity. The authors present MetaTM, a model of hardware transactional memory. The model allows a critical section to proceed but restart execution if there is a contention or if an I/O operation is attempted. Notably, upon beginning a transaction a transaction status word is returned indicating whether it is the first execution of a transaction or if it has restarted and may indicate that the section needs to use mutual exclusion. A compare and swap instruction is provided that allows threads using exclusion to be subjected to contention manager policy, reducing types of priority/policy inversion. The cooperative transactional spinlock (cxspinlock) provides an exclusive lock for non-transactional threads that uses MetaTM’s compare and set instruction and an optimistic method for transactional threads that attempts to use a transaction but may revert to locking if code in the critical section requires exclusion. The cxspinlock was found to be a safe replacement for most locks in the Linux kernel, but a significant amount of work was involved in finding such locks in order to retrofit the kernel. The authors also modify the scheduler to take into account various information of transactions (existence, size, restarts, …) to use dynamic priorities and descheduling of threads likely to be restarted due to conflict. A tradeoff is made in that the authors are willing to deal with wasted time due to restarts in order to decrease the amount of time wasted spinning.

Flaws
There was an odd statement that opportunity to improve performance with synchronization primitives is limited at the scale of 32 cores since the kernel spends less than 12% of its time synchronizing. Although the performance benchmarks seem to suggest some issues with scalability, they also seem to suggest that even less time is spent synchronizing with 16 cores.

Posted by: Sam Javner | October 9, 2008 08:03 AM

Summary

The paper desribes TXLinux, an OS that uses Hardware Transactional Memory (HTM) as a synchronization primitive, and measures its performance. The main focus of the paper is two-fold. One is to describe in detail the cooperation between locks and transactions that achieves the benefits of both the approaches and the other is the integration of transactions with OS scheduler.

Problem attempted

The objective of the paper is to demonstrate the use of transactional memory to reduce the complexity of parallel programming without losing the benefits offered by locks.

Contributions

1. The paper observes that synchronization is one of the chief sources of bugs in the Linux code. Locks are non-modular and demand a lot of programmer reasoning and thinking while implementing them. Transactions are more modular and also improve performance by allowing multiple non-interfering threads to concurrently execute in a critical section. Besides, large critical regions bring down kernel response latency if they are protected by locks. But large critical regions are easier to reason about as they reduce the number of program states.

2. The paper proposes MetaTM as its HTM model. Conflict detection in MetaTM is such that the first memory access that causes a conflict also causes one of the transactions to restart. The prominent instructions added to the ISA are xbegin (to begin transaction), xend (to end transaction), xrestart, xpush ( save transaction state), xpop (restore previously saved transaction state)

3. The main draw-back of HTM is that they can only roll-back processor states and contents of physical memory and not I/O operations. The paper proposes to overcome this problem by proposing cooperative transactional spin-locks (Cxspinlocks) that allow different executions of a single critical section to be synchronized either with locks or with transactions.

4. Cxspinlocks are acquired using two functions: cx_exclusive and cx_optimistic. Non-transactional threads call cx_exclusive function to exclude other fucntions other threads from entering critical section. Any active transaction is made exclusive. After this, the cx_exclusive function uses xcas instruction to obtain the spinlock without being unfair to a transactional thread i.e. the contention manager decides which thread succeeds. Transactional threads call the cx_optimistic function. This function determines if the transaction has restarted because exclusion was necessary. If so, it calls cx_exclusive. Else, the thread waits for spin-lock to be unlocked and adds it to its read-set so that it is indicated later if it requires mutual exclusion because of a non-transactional thread.

5. Decoupling I/O from system calls is proposed as a method for the kernel to provide transactional programming support to even user-level critical regions that change the state of I/O devices. This is done by the OS freeing sufficient system resources to perform the user-initiated system calls.

Flaws

Flat-nesting of transactions exposes the system to "other-wise avoidable" deadlocks

What it makes easier

1. The use of contention manager in the cx_exclusive function has eliminated the problem of a non-transactional thread unfairly kicking out a transactional thread while acquiring a lock. The contention manager can now set policy for this lock. Besides, contention manager can eliminate priority inversion by resolving a conflict in favor of a thread with a higher priority.

2. The xtest instruction ensures that a locked lock veriable is not added to the read-set of a transaction thus avoiding an unnecessary restart of the transaction while the lock is unlocked.

What it makes harder

The number of instructions and memory references in the cx_exclusive or cx_optimistic function is much higher than in the traditional low-overhead spin-lock function

Posted by: Balasubramanian Sivan | October 9, 2008 07:49 AM

Summary

In TxLinux: Using and Managing Transactional Memory in an Operating System, Rossbach et al. give an overview of approaches to mutual exclusion in TxLinux, a modified Linux (2.6) kernel with support for the MetaTM hardware transactional memory (HTM) model.

The Problem

Writing efficient multi-threaded programs that read and write to a shared resource is often difficult. In theory, transactions can considerably simplify code by decoupling application logic from fine-grained locking on critical sections in a parallel application.

With this in mind, the authors set out to replace as many mutual exclusion primitives within the Linux kernel as possible, an ambitious endeavor due to the high number of I/O-centric routines there.

That said, the authors decide to avoid tackling the issue of rolling back transactions that perform I/O (restarting the transaction is apparently a novel solution) such as flushing data to a HDD or sending a packet to another computer. They also do not attempt to virtualize transactions (handle overflows in transaction log size).

The Solution

MetaTM checks if a thread's writes to memory interfere with reads or writes of another running thread. The hardware rolls back the state of a thread and the memory it has written to if a transaction within it fails. This is an eager process: the moment a write is made to data being used by a concurrent process, the transaction fails.

Transactions can be nested within transactions; however, if the innermost transaction fails all the transactions it is encapsulated by fail (the "flat-nested" model).

When a transaction fails, it is rescheduled for strict mutually-exclusive execution later using a spinlock instead of a transactional approach. This prevents the transaction from being scheduled and rolled back over and over again. The rescheduled code may be assigned a higher priority than before since the kernel scheduler is HTM-aware.

Possible Flaws and Notable Omissions

It is worth noting that TxLinux takes less overall synchronization time than vanilla Linux in 5 of the 6 test applications under 16 cores. That said, the 32 core results are considerably less promising, vanilla Linux winning in 2 out of 4 comparisons, with 2 comparisons missing entirely.

The authors also explain that ~ 2000 lines of code changed in the kernel, and ~ 5500 were added. Though line-count is certainly a crude heuristic for system complexity, one would expect a reduction in the codebase size after implementing transactions.

Trade-offs

Since transactions within MetaTM are eager, both in terms of writing to memory immediately and events that interrupt transactions, the penalty for rolling back (restoring the old memory values) is high compared to a lazy model (where a memo would simply be discarded). In this respect, this system is biased toward expecting low contention for resources between processes. This optimism may be why it appears to scale poorly.

It's also worth nothing that since both transactional and non-transactional threads interact with each other on this system, fairness is difficult to preserve. Any writes a non-transactional thread accomplishes that conflict with a transactional thread will be carried out, requiring the transactional thread to restart.

Posted by: James Jolly | October 9, 2008 05:35 AM

Summary
The paper discusses the use of cooperative transactional spinlocks, that can be called by both Transactional and Non-Transactional Threads, to migrate an Operating System Kernel to use Hardware Transactional Memory (HTM) and thereby achieve higher concurrency. The paper also talks about some innovations to avoid priority and policy inversion during scheduling in most HTMs.

Description of the problem being solved
The problem is to come up with a method of switching from conventional lock primitives to HTMs in order to exploit the concurrency present in huge parallel programs easily ( from development and maintenance perspective ). An obvious sub problem is to solve issues like I/O within transactions that come up while solving main problem.

Contributions of the paper

1) cxspinlock that can act as both transaction primitive and lock primitive depending conflicts in the critical sections that they protect is an innovation that made migration of huge parallel applications easy. For example, the authors says that using cxspinlock reduced development efforts from 5 man years to 2 man months.

2) Handling I/O within Transactions by cxspinlocks automatically restarting the transaction is a nice approach which again eases adoption.

3) Innovative contributions have been made to solve the problems of priority inversion and policy inversion. For e.g., making the contention manager resolve a conflict in favor of a higher OS scheduling priority process to avoid priority inversion.

4) The detailed explanation of the approach and experiences in first ever migration of an entire operating system kernel from traditional locks to HTM is a valuable contribution.

Flaws in the paper

1) Decision to have cxspinlocks act as both locks and transaction primitives has kept the problem of deadlocks from the lock world still alive withing the HTM world.

2) Transaction restarts with cxspinlocks during conflicts has performance overheads.

Techniques used to achieve performance

Use of Transactional Memory to replace the traditional lock based synchronization with Trasactions thereby increasing concurrent optimistic execution of the critical section which increases throughput and latency.

Tradeoff made

A tradeoff is made between ease of programability and performance overhead. Automatic switching from transaction to lock based synchrony eases adoption but implied performance overheads due to frequent restarts in highly contented shared data access patterns.

Another part of OS where this technique could be applied

Transaction Memories (TMs) can be used in Databases and other system software where transactions are natural. However, it must be noted that TMs target concurrency whereas conventional transactions target reliability.

Posted by: Leo Prasath Arulraj | October 9, 2008 05:16 AM

Summary
The paper describes a technique to increase parallelism within the os kernel by using transactions instead of locks. Hardware transactional memory is used to implement transactions, and where transactions turn out infeasible, failover to traditional locks happen. The authors term this technique as cxspinlocks.

Problem addressed
There are two problems which the authors are trying to solve here. First of all, there are the various problems with locks which make locks a bottleneck to parallelism because locks take a pessimistic approach. If there's a chance of overlapping memory accesses etc, a lock is used and exclusive access is provided, even if it need not be a case that this overlap always happens. Secondly, with multiprocessor architectures, the problem is compounded because, a lock essentially means only one processor can execute the code within a critical section. The authors propose using HTM as an optimistic approach, allowing transactions to happen simultaneously but reverting/restarting all but one in case of a conflict.

Contributionsthe key contribution of the paper is the cxspinlock which attempts a transactional model initially, and if that fails, reverts to a conventional lock. This allows one to replace a greater fraction of existing conventional locks with the cxspinlocks, as failover to the traditional locks is possible.
the paper uses the transaction profile to assist in os scheduling. This helps improve performance by scheduling threads which have repeatedly restarted in the past less frequently etc.

FlawsPerformance would tend to suffer with high contention. This is because with high contention the transaction model will restart frequently, and because the whole transaction is restarted, bus contention will also be high.
The paper was very difficult to understand ! Could possibly be because I don't have much of an idea about hardware transactional memory and the HTM primer section of the paper doesn't quite do it's job !

Technique used
Traditionally, locks were used to enforce exclusive access for each thread accessing a critical section. Instead, here transactions are allowed to progress simultaneously, but when they conflict, one of the wins and is allowed to continue where as the others are rolled back and restarted. This way, we can have greater parallelism because more than one transaction can be ongoing, and in the best case of no conflict, all of them (or atleast more than one) succeed improving performance.

Tradeoff
The locking is a pessimistic approach to ensuring atomicity. Even if contention is low, you ensure that you're the only one before you do anything. The TM is an optimistic approach where we hope things work since contention is low, and if it doesn't, we pay a high cost of rollback. Hence the tradeoff here is the amount of parellelism that can be exploited vs. the cost of rollbacks.

Applicability
Transactional memory can be used to improve parallelism where locks are presently used, so long as it is possible to revert a transaction when it fails. And major deciding factors would be things like whether hardware implementation would be possible, allowing higher performance than locks.

Posted by: Varghese Mathew | October 9, 2008 03:04 AM

Summary:
This paper presents TxLinux (a Linux variant), which uses HTM as a synchronization primitive. Apart from providing HTM-aware scheduling and cooperative spin locks, it presents a solution to a long-standing problem of IO in transactions. The TxLinux system eases the conversion from locking primitives to transactions significantly without major performance hit.

Problem:
This paper studies the problem of using HTM as a synchronization primitive in an OS (a variant of Linux) by managing it in the scheduler itself. TxLinux provides cooperative spin locks allowing both locks and transactions to protect data and roll back to locks to perform I/O. Using HTM, solves some key problems such as reducing synchronization bugs (adding more modularity), allowing performance scalability with multiple threads executing concurrently in a critical transaction and eliminating priority-inversion.

Contributions:
One of the major contribution of TxLinux is cooperative synchronization paradigm between transactions and locks (cxspinlocks). Further, cxspinlocks can be called from transactional and non-transactional threads, thereby increasing the parallelism. IO is handled within transactions by reverting to the traditional approach of conventional locks. This is done by roll-backing the executing transaction.

Further, HTM eliminates priority inversion and is free from deadlocks and livelocks. This is achieved by using a contention-manager to handle read-write conflicts.

Transaction-aware scheduling helps to assign dynamic priority to account the impact of transaction state on system througput.

Flaws:
One of the biggest flaws or tradeoffs with using such new hardware paradigms (HTM) is figuring out the right hardware support. This is really a big bottleneck for adopting them and thus for the success of HTM. The authors themselve claim this to be the second most time-consuming part in implementing TxLinux.
Another and perhaps the most important drawback for systems like TxLinux, is with legacy code compatibility. It is important to think about the changes needed to be incorporated to the existing OS, programming languages and even applications such as databases. The authors themselve admit the fact while describing cx_optimistic primitive that no synchronization primitive is so powerful to make high-performance parallel programming easier. On a semi-related note, HTM may not provide significant performance impact for workloads with normal contention profiles (benefits are mainly for read-modify-write).

Techniques:
Major techniques used are HTM, HTM-aware scheduling, cooperative synchronization primitives (cxspinlocks).

- Tradeoffs
The authors explore the trade offs between the two synchronization primitives - locks and transactions. They use a hybrid or cooperative approach for TxLinux. For handling IO, they revert back to the traditional locks and for improving system throughput, they use transactions to allow concurrent threads in the same critical region.

- Alternate uses:
Transactions, in general, are well known for decades, major applications being in database concurrency control. Further TM (particularly software TM) can be used to provide transactional interfaces for persistent storage using non-volatile devices which are getting popular today.

Posted by: Mohit Saxena | October 9, 2008 02:22 AM

In this paper, the authors present TxLinux, an operating system that uses hardware transactional memory as a synchronization primitive. They describe the features of the transactional spinlock , a primitive that allows locks and transactions to work together and then present scheduling techniques that use HTM to improve throughput.

The use of transactional memory makes concurrent programming easier. Regions of code that access shared data are executed atomically and in isolation. The problem is that transactions cannot replace locks in many cases (like performing I/O), so converting locks into transactions is sometimes impossible. The previous transactional models required every execution of a critical section to be protected by either a lock or transaction. The authors created cxspinlock, a primitive that gives more flexibility as it allows locks and transactions to work together.

In my opinion the main contribution of this paper is the creation of cxspinlock. The authors present an optimistic approach for the management of transactions. Conventional spinlocks prevent multiple transactional threads from executing a critical region in the same time, even if it is safe to do so. On the contrary, cxspinlocks allow multiple transactional threads in critical section. In this way, fine grained parallelization can be achieved. When a problem occurs, e. g a code path in critical section requires mutual exclusion, the transaction restarts and acquires the lock exclusively. This approach reminds me the optimistic concurrency control in database systems, where many transactions are executed simultaneously and when a conflict happens, some of them are restarted. This mechanism helps also handling I/O. Instead of locking the whole critical section from the beginning, we use transactions and when there’s an I/O the procedure is restarted and a lock is acquired. Another contribution of the paper is the use of HTM, for scheduling. We can achieve greater throughput by using information from the HTM to make scheduling decisions. Very important is also the fact that in this system priority inversion can be eliminated by the contention manager.

One flaw of the system is that in some cases the combination of transactions and spinlocks may lead to deadlock. On the contrary, if we only used transactions, deadlock would be avoided.

As we can see in the evaluation section, performance problems can exist. More specifically, much time is spent for the restart of transactions and back-off. As the number of CPUs increases the performance of the cxspinlock in TxLinux becomes worse compared to that of Linux (32 processors). One advantage of the cxspinlock, compared to conventional spinlock, is that it allows more concurrency when the average transaction sizes are not small. However, in general, we can see that the performance of Linux is not dramatically worse than that of TxLinux.

As far as it concerns the usage of transactional spinlocks to other parts of an operating system, I believe that they can be used in any part that needs locking and doesn’t have high contention.

Posted by: Avrilia Floratou | October 9, 2008 01:34 AM

Summary:
This paper describes TxLinux, a variant of Linux, firstly using hardware transactional memory (HTM) as synchronization primitive, and firstly managing HTM in the scheduler. And two innovations of TxLinux including cooperation between locks and transactions, and the integration of transactions with the OS scheduler are also discussed in this paper. Cxspinlock (cooperative transactional spinlock) which enables a novel way of managing I/O within a transaction is also introduced in this paper.
The problem the paper was trying to deal with:
The main goal of this paper is to develop TxLinux and to make a description on this new operating system that uses HTM as a synchronization primitive, and presents innovative techniques for HTM-aware scheduling and cooperation between locks and transactions.
Contributions
1． The paper introduces a new operating system, TxLinux, which is a variant of Linux that is the first operating system to use hardware transactional memory as synchronization primitive, and the first to manage hardware transactional memory in the scheduler.
2． The paper proposes cooperative transactional spinlock (cxspinlock) which allows cooperation between locks and transactions.
3． The paper proposes hardware transactional memory mechanism which can nearly eliminate priority inversion and it also introduce OS scheduling techniques using information from HTM to increase system throughput.
4． The paper proposes a novel mechanism for handling I/O within transactions that allows a transaction that performs I/O to automatically restart execution and acquire a conventional lock.
Flaws:
(1) There are some problems with cxspinlocks which cannot avoid the possibility of deadlock. (1) Although transactional memory technique is claimed to reduce the parallel programming complexity, it is still difficult.
What it makes easier:
(1) Transactional memory is a programming model that makes concurrent programming easier. (1) Modern HTM designs do not place limits on the size of the data contained in a transaction, then make programming with transactions much easier.
What it makes harder:
(1) Transactional memory is supposed to make the programmer’s life easier, but by allowing transactions to cooperate with locks, it appears to be making the programmer’s life more difficult. (2) Proposed HTM designs have limitations that prohibit transactions in certain scenarios such as performing I/O, which make converting every instance of locking to use transactions be hard.

Posted by: Tao Wu | October 9, 2008 12:57 AM

Summary:
Paper describes the TxLinux Operating System that uses Hardware Transaction Memory (HTM) as synchronization primitive. Paper talkS about integration transaction with OS scheduler. Later part of paper compare measurement result of Linux and TxLinux for synchronization overhead and degree of concurrency.

Problem:
Arrival of multi-core system poses new programming challenge in parallel programming field while developing fine-grained parallel programs. Achieving scalable operating system performance with locks and current synchronization is very costly. Novel cxspinlock based HTM has been proposed that allows critical section to be protected either by lock or by transaction.

Contribution/Work Summary:
1. Transaction Memory is programming model that makes concurrent programming easy by letting user define the critical regions. System executes these critical regions atomically and in Isolation. In case of the violation of isolation critical section will be restarted.
2. TM is optimistic Concurrency model, which perform well when critical sections do not interfere with each other. As locking is used to avoid the worst case scenario of multiple thread accessing same data simultaneously, in same way optimistic concurrency will restart the transaction only in worst case. So overhead of locking in normal case is avoided in such model.
3. MetaTM uses eager conflict detection mechanism, which means first memory access that causes the conflict also cause one of the transactions to restart. To determine the conflict read-set and write-set of transactions are compared.
4. I/O operations are difficult to rollback so such operation breaks Isolation and Atomicity of transaction system. So to perform I/O operation exclusive lock is required.
5. Transactional and non-transactional thread can be synchronized using cx_exclusive and cx_optimistic lock type of cxspinlock. This allows multiple transactional thread to be in critical section simultaneously and provide exclusive access to non-transaction thread.
6. Contention manager of MetaTM almost eradicate priority inversion problem caused by locks, which cause higher priority thread waiting for lower priority thread. This is implemented by providing a interface through which OS communicate scheduling priority and policy to contention manager.

Flaws:
1. MetaTM cannot support a transaction whose updates are large than available memory.
2. Small transaction might cause big transaction to restart, which will cause lot of overhead,. From paper it is not clear that conflict resolution can be intelligent enough to restart small transaction instead of restarting big transaction.

Tradeoff:
1. Tradeoff is done between lock latency and restarting of transaction. This model avoids lock latency in less contented environment but needs to restart transaction with in case of high contention. So it’s a optimistic trade-off thinking transaction doesn’t overlap so frequently in the system.
2. By allowing locks with transaction this model reintroduce some of the problem like deadlock, which was the initial intention of model. So to support critical section with I/O model makes some compromise.

Where Technique can be applied:
1. Cxspinlock API allows current OS kernel to be modified quickly to use cs_optimistic call instead of spinlock.

Posted by: Nikhil Teletia | October 9, 2008 12:49 AM

The TxLinux paper treats operating systems as large parallel programs. Parallel programs are difficult to write, and more difficult to write correctly (the authors of Linux agree). Transactional memory is a new concept which addresses this difficulty, and the authors' goal with the work in this paper is to make use of them in an operating system.

Several known problems with lock-based synchronization are solved by transactions: Composition is less problematic, fine-grain parallelism is more easily achieved, and execution is more parallel by default. Transactions do not perform well in all situations, however (high contention means many aborted transactions), and in others they make the behavior of the system inconsistent (the "output commit problem"). Furthermore, transactions interact badly with non-transaction-aware locks. The operating system is critical and fragile enough where none of these caveats are acceptable, so it is in this that the bulk of the work is to be done. The authors of TxLinux propose a hybrid lock/transaction primitive (the cx_spinlock) which offers the option of either locking optimistically (with transactional memory) or locking pessimistically (with a spinlock). Each operation may be applied to the same lock, and so the benefits of transactional memory are offered without removing the programmer's ability to handle high-contention cases. They further this with a policy for transactions performing I/O, where such a transaction is restarted at most once before it owns an exclusive lock. Because this primitive behaves externally in a way similar to the lock primitives already used in systems, it can easily be applied.

The performance of the cx lock primitives should be at least as good as locking, but it is unclear that this should be optimal. In particular, different executing threads might both perform I/O without actually stepping upon one another's toes, so their default acquisition of an exclusive lock for I/O forbids the exploitation of concurrency. Cx_spinlocks do not have any properties which make this better, and in fact the programmer would have to revert to fine-grain locking to regain this parallelism, in which case the entire point is called to question.

In general, the authors make a very strong statement here in favor of transactional memory, which has yet to be released on commodity hardware. It's particularly critical that they identify an OS kernel as a place for transactions, because (as OS's are complex) it indicates a strong confidence in the technology. Cx_spinlocks are probably not the optimal abstraction on their own because of their ignorant behavior in situations like I/O. As a compromise between locks and transactions, however, they do make a solid middle ground which eliminates several downsides compared to an all-or-nothing solution.

Posted by: Tack | October 9, 2008 12:30 AM

Summary
The authors have built a Linux variant, TxLinux that uses hardware transactional memory(HTM) to provide concurrent access to shared resources. They provide a primitive, cxspinlock that allows locks and transactions to co-exist while limiting the concurrency only when there are conflicts.

Problem
The current systems are designed to use locks for any kind of shared access. Locks are an added overhead even if there is no contention and requires great care by the programmer to ensure small critical sections and avoid deadlocks. The authors believe that transactions provide better performance than locks by ensuring concurrent accesses in critical sections, without any priority inversion.

Contributions
- Transactions generally have the advantage over locks that they allow multiple processes to execute a critical section concurrently as long as they don't have a conflict. So instead of writing many small critical sections protected by locks, we can have one long transaction without much loss in performance and code simplification.
- Their cxspinlock primitive allows normal locks to co-exist with transactions. This avoids a complete overhaul of the system and can handle cases that are impossible without transaction(e.g. device i/o)
- They even modified the operating system scheduling policy to account for the transactional history of a process. This is very useful and to some extent required to prevent excessive restarts and wasted cycles due to a single process.
- Conflicts are resolved by a contention manager, which ensures that there is no priority inversion and make more intelligent decisions on which threads to restart(smaller in size/newest).

Flaws
- Since they allow co-existence of locks and transactions, transaction threads get penalized when they conflict with a non-transactional thread(not in control of the contention manager).
- Also because of flat nesting even if the nested transaction ends, the changes are not committed. Since the user is unaware of this, it can lead to more deadlocks.
- Any transaction based system, has to buffer all the operations of the transactions which is an added overhead. Also restarts on conflicts can be very frequent for shared data structures with high contention which also showed in the bonnie++ workload.
- Linux is an open source OS that has been optimized over last 30 years, because of which all their critical sections are already the shortest possible. As a result the impact of a transaction system here may not be worth the five man years spent on it.

Techniques
They use hardware transactional memory to allow atomic transactions on shared resources. Locks are still used for device i/o(like network access) which cant be buffered. They sacrifice overhead of maintaining the transaction buffers and lost restart cycles at the cost of allowing parallel execution of critical sections. Although transactions may have a bigger impact on OS that are still not as mature as Linux/Windows.

Posted by: Tushar Khot | October 9, 2008 12:23 AM

Summary
This paper describes TxLinux, the first operating system to use hardware transactional memory as synchronization primitive, and the first to manage HTM in the scheduler. It also talks about the cooperation of locks with transactions, and the integrating locks with the scheduler.

Problems
With all processor manufacturers planning to scale large number of cores on a chip, it’s a challenge to develop a technique to reduce parallel programming complexity while maintaining the performance of fine grained parallelism. Usage of locks and synchronization options, with systems having thousand of cores will require complex programming and performance cost. Hardware transaction memory can help achieve this parallelism at high performance and reduced code complexity. HTM is a hardware implementation of transaction memory appropriate for OS usage.

Contributions
• Converting the Operating System to use Hardware transactional memory as synchronization primitive. Using transaction primitives, several threads can execute in a critical section simultaneously, unless they access same memory address. A violation leads to transaction restart. Thus, leading to high concurrency and minimal lock need.
• HTM mechanism nearly eliminates the priority inversion and policy inversion by using os_prio policy. This policy prioritized threads on the basis of scheduling priority, size of read write set and timestamp.
• Mechanism for handling I/O within transactions to automatically restart those transactions and acquire a conventional lock.
• Decoupling of I/O operations from system calls provides full transactional programming model even to user-level critical regions that may modify device state.
• A co-operative transactional spinlock (cxspinlock), which can be called from transactional or non-transactional thread and it, favors parallelism started by transactions.
• TxLinux scheduler maintains a per-thread transactional profile, which tracks restarts and backoff cycles. This enables to dynamically prioritize the threads and schedule them according to the number of restarts/failures.

Flaws
The behavior of transactional and non-transactional locks has not been evaluated properly through experimentation. Introduction of cxspinlock doesn’t guarantee deadlock free system.
To provide full transactional modeling at user level, the I/O is decoupled from the system calls. This decoupling requires buffering the I/O operation into memory till the commit time, which may require huge memory requirements. If there are not enough memory resources, the process is killed. This behavior is not ideal.

The techniques used to achieve performance and tradeoff.
The hardware transactional memory uses the technique of removing all locks and using transactions as a synchronizing tool. Threads can simultaneously access same data block, but an access to same memory leads to violations and thus, causing failure of transactions, except one. But transactions cannot simply eliminate or replace locks completely for several reasons, including I/O operations. Thus cxspinlocks primitive is used that allows locks and transactions to work together to protect same data. Introducing locks, tradeoffs the performance of transactional memory but provides reliable system.
To achieve this kind of performance, they had to do efforts in changing Linux to TxLinux. The process of identifying locks, converting and testing them was time-consuming and difficult. There is an addition of the complexity to the underlying system for higher performance and reduced complexity at user level.

Posted by: Rachita Dhawan | October 9, 2008 12:15 AM

Introduction:
This paper introduces TxLinux, a variant of Linux which uses Hardware Transactional Memory as a synchronization primitive, instead of traditional locks. The paper points out the problems with pure transactional behavior, and proposes a system where locks and transactions can co-exist. The paper also introduces a transaction-aware scheduler which avoids priority inversions.

What were they trying to solve:
Concurrent access control in OS is traditionally pessimistic: Any resource which might potentially be used concurrently needs to explicitly locked to maintain consistency. This leads to decrease in concurrency, especially in the case of core kernel objects which are frequently accessed concurrently. Transactional memory systems offer an optimistic alternative to this: a resource can be concurrently accessed as long as the different threads don't interfere with each other. But locks cannot be completely eliminated in a real OS, since certain operations like I/O are irreversible. The author's intention is to develop a lock primitive which can co-operate with transactional memory.

Contributions:
The authors introduce the new synchronization primitive cxspinlock, which has the advantage of higher concurrency while retaining the capacity for exclusive access if required. Processes initially obtain an optimistic lock, which is non-exclusive and starts a transaction. When a conflict is detected, the transaction is restarted, and the lock is upgraded to an exclusive one.
Irreversible operations like I/O are handled in the same way as conflicts: the transaction is restarted in exclusive mode. This ensures that I/O operations always happen in a mutually exclusive and therefore consistent setting.
A common problem in traditional systems is Priority inversion: A low-priority thread obtains a lock and makes the high priority process wait. With the absence of locks, such concurrent access would result in a conflict, which is handled by the conflict manager, which can take priority onto account to restart the correct process. The authors generalize this to handle policy inversions also.
A scheduler which takes into consideration the transactional state of the thread while making scheduling decisions. This has a significant impact on the concurrency of the system, since it can reduce the scope for conflicts.

Flaws:
The idea I get is that MetaTM is implemented as a module in the Simics system simulator. I'm not sure if real hardware transactional memory won't result in a significant performance penalty. Also, the size of the write logs might have in impact in the cost of maintaining and restarting transactions.
The problem with optimistic concurrency systems is that the cost of restarts is always large. In pathological cases, a large transaction may repeatedly have to be restarted many times.
The lack of true nesting in transactions can cause subtle errors and deadlocks. Even if a code blocks commits the transaction, it may still be holding on to the lock, if it is part of an outer transaction.

Techniques used to improve performance
The main aim of TxLinux isn't to drastically improve performance but to ensure concurrency.
Tradeoffs:
TxLinux tries to achieve a compromise between locking(which maintains strict consistency, at the expense of concurrency) and transactional model(which improves concurrency when contention is low, with a large performance penalty when contention is high).
another part of the OS where the technique could be applied:
Any resource which is traditionally protected by locks can be converted to transaction based locks, if concurrent non-interfering access make logical sense. The authors mention that files could be treated in the same manner, with possibly an intermediate secondary storage (NVRAM,FLASH) to achieve transaction behavior.

Posted by: priyananda | October 9, 2008 12:15 AM

Summary
In “TxLinux: Using and Managing Hardware Transactional Memory in an Operating System,” Rossbach et. al describe their modification of the Linux kernel in order to implement transactions as a means of synchronization instead of using spinlocks.

Problem
Using transactions instead of other locking techniques such as semaphores makes safe concurrent access to shared data easier to program (ever more important as multiprocessor machines become increasingly available) since a programmer does not need to worry about when data must be locked and in what order, instead simply specifying what section of code must be executed atomically. This simplicity helps reduce bugs and makes for more quickly understandable code. While transactions have been implemented in various user programs, they had not been made use of in OS kernels, so Rossbach et al. went about modifying the Linux kernel to make use of transactions.

Contributions
• Showing the feasibility of implementing transaction in the Linux kernel by comparing the modified transaction-based kernel against the standard kernel and showing that it both functioned and obtained respectable performance.
• Creating a system in which standard locks could function alongside transactions, in this case negating the need for completely rewriting all lock-based code.

Flaws
• Certain benchmarks severely degraded performance as compared with the standard locking-based sharing methods; in code such as the kernel of an operating system, such a degradation of performance may not be acceptable.
• Certain issues of the transaction based scheme were not addressed. For example, if the hardware cannot support the size of the transaction, the transaction is fully stopped and restarted in a software virtualized manner, resulting in duplicated work.

Performance Techniques
In general, this system was willing to sacrifice performance in order to obtain more readable (and thus presumably more likely to be correct) code. This being the case, efforts were still made to ensure good performance. As the various performance measurements show, the actual performance of the transactional Linux kernel was not always significantly worse than that of the standard kernel. Among the reasons for this is the fact that the transactions worked under the assumption that they would be successful, only rolling back and restarting when there was actual contention. Under sections of code that didn’t face high contention, this scheme worked well. However, when load become heavier (when 32 instead of 16 processors were running), it became more likely that a section of code would face contention, that a roll-back would be needed, and that performance would suffer.

Posted by: Mark Sieklucki | October 8, 2008 11:24 PM

Summary: The authors of this paper present hardware transaction memory(HTM) in a mechanism that can be used to increase parallelism on multicore machines running a single operating system. They introduce MetaTM as their protocol for accessing transactional memory, and Cxspinlocks as the way to be able to use transactions and normal locks at the same time.

Problem to Solve: Researchers in this project are using HTM to make parallelization of processes easier for the user to utilize as multicore systems become more of the norm, but locks on memory typically prevent this fine grained parallelization from occurring.

Contributions: First, the authors introduce Cxspinlocks as a way to use both transactions and locks within the same body of code. This makes access to critical section more easily parallelizable when possible and allows for traditional locks which can’t be replaced to remain. Secondly, they introduce MetaTM as the protocol they use to access transactional memory. Even though MetaTM is a variation on some other protocols, it improves upon some of these others through the implementation of cxspinlocks. Also through the use of MetaTM schuduling policy, they are able to reduce priority and policy inversion, which helps ensure that real-time processes get served first.

Flaws: One flaw of the paper is that the cxspinlock proposal actually introduces deadlocks in some cases where locks would not occur. They state that static checking tools could help with this problem, but they don’t describe how this would help or what program code would need to be statically checked.

What tradeoff is made: For a typical configuration of 16 CPUs this approach wastes significantly less time synchronizing, but for 32 or larger CPU configuration more time is wasted. Also, the cxspinlock protocol allows locks and transactions to coexist, but if both are activated at once, priority always goes to the traditional lock, thus starving the transaction.

Where else could this be applied: Another place a technique such as cxspinlocks could be applied is in large databases where typically no changes are committed. In this case instead of locking a record when a user queried for it and open it, it could be assumed to typically be just the user wanted to look at it. This many users could look at it in parallel and if a user modified something it would later be pushed to the databases and conflicting entries would be caught.

Posted by: Holly Esquivel | October 8, 2008 11:08 PM

Summary
TxLinux incorporates into Linux the idea of hardware transactional memory (HTM) to simplify parallel programming and lock contention on multiprocessor systems. TxLinux combines locks and transactional spin locks for cooperative spin lock which allows parallel execution of locked sections and restarts when I/O or address writing isolation is encountered. In addition scheduling is modified to track restarts and backoff cycles to reduce transactional conflicts.

Problem trying to Solve
Programming in a multiprogramming environment is complex with synchronization challenges and difficulty in writing for parallel execution. In addition, locks are valuable for isolating code sections, but leads to increased serialization of code execution, while transactions assist in parallel execution of protected sections, but lack the ability to protect against I/O calls.

Contributions
-A dynamic lock mechanism cxspinlock, which allows for both lock and transactional synchronization and the ability to utilize exclusive locking if I/O or write operations are encountered.
-Implementation of a hardware transactional memory system for examining the performance and implementation hurdles.

Flaws
The introduction of the paper states that it is attempting to create a technique for reducing parallel complexity and maintaining performance for scalability to possibly thousands of nodes; however, from the evaluation of TxLinux, the performance does not seem to scale much better even with 32 processors in comparison to linux with 32 processors.

Techniques
TxLinux takes the idea of transactional memory to another step with a dynamic approach. To achieve performance, TxLinux utilizes a hardware implementation and new instructions for transactional memory efficiency. In addition complexity is added to the contention manager for priority scheduling and backoffs to reduce restarts. The addition of these performance enhancements adds complexity to the underlying system for additional overhead. In addition the implementation requires all of the transaction memory state fit in physical memory imposing a limit in the ability to handle high contention and complex systems. Added complexity and overhead is traded for increased parallelism and simplified programming. This technique targets use in all areas of the OS. The transactional technique with state information could be usefully integrated into debugging utilities for state analysis. In addition it could possibly be used to spin off threads since the state is stored in transactional memory, multiple parallel executions could be spawned from the beginning of a transactional section.

Posted by: Cory Casper | October 8, 2008 10:53 PM

PROBLEM DESCRIPTION

The authors propose TxLinux, a modified Linux kernel which exposes hardware transactional memory as a synchronization primitive and integrate transactional code execution into the system scheduler.

Transactional memory is a new technique that aims to simplify the programmability, maintenance and reasoning of parallel programming. Hardware transactional memory systems implement basic hardware support for native execution of transactional code blocks and can achieve high performance. However, existing no operating system has been designed to take advantage of transactional memory hardware. Converting the code from a lock-based style to a transactional style is a challenging task and the authors show some of the trade-offs of this approach.

CONTRIBUTIONS

The authors propose cxspinlock which allows locks and transactions to cooperate. A nice idea is the interplay between the kernel and the hardware transactional memory system: The kernel can learn the state of the transactions in each process. In this way, they can avoid context switching a process that is executing a high-contention critical section, thus avoiding convoys. Finally, the authors show an experimental performance evaluation of their kernel modifications under different workloads.

FLAWS

I believe that the authors present an oversimplified solution about I/O inside transaction segments. They argue that making I/O at commit is a simple and effective idea, but they fail to address important semantic ambiguities about the notion of I/O inside a transaction. For example, suppose that the transactional code calls select() to do I/O multiplexing. There are two problems with this approach:

(1) If I speculatively execute the select() call, then I have to revert back to the original state if the transaction fails, and I will also have to notify the subsequent callers (who possibly execute code outside transactional code segments), not to rely upon the results they had seen in the past. This is impossible to enforce, unless every select() call is within a transaction or I protect the select() code with a mutual exclusion mechanism, therefore defeating the purpose of allowing I/O in transactional code in the first place.

(2) If I postpone execution of the select call until the commit time, it is unclear how to deal with code inside the transaction which relies upon the return of the select() call.

Those two issues are fundamentally different than transactions in the database literature, where everything is a transaction (and thus everything can be aborted) and real-world actions are limited and isolated by nature (eg. whether the ATM actually gives money or not doesn't change the state of the database -- a human has to intervene to compensate for errors of real-world actions).

Another important issue is that their implementation of cxspinlock requires a single global object to work correctly. As the number of CPU cores increases, coherence traffic for the global lock object will severely limit the performance of their approach.

Finally, recent research [1] has revealed some pathologies when legacy locking code and transactional memory code overlap. Although TxLinux gives a single primitive, cxspinlock, to programmers, it is not clear to me how TxLinux deals with these pathologies.

EASIER/HARDER

Transactional memory can make complicated systems easier to understand, program and maintain. However, locks have been studied extensively and various strategies have been proposed to address the problems of locking, like (a) MCS locks for better scalability, (b) deadlock avoidance, detection and resolution algorithms for deadlocks and (c) static analysis tools for bugs when isolation or atomicity is breached. However, transactional memory is still in the research stage and while some locking problems are solved, a new set of performance pathologies --to which no solutions yet exist-- have been introduced. This has serious implications in the adoption of the transactional memory paradigm for production systems.

REFERENCES

[1] Haris Volos, Neelam Goyal, and Michael M. Swift, "Pathological Interaction of Locks with Transactional Memory", 3rd ACM SIGPLAN Workshop on Transactional Computing, February 2008.

Posted by: Spyros Blanas | October 8, 2008 07:24 PM

Paper Discussion - Fall 2008 - CS 736

TxLinux: Using and Managing Transactional Memory in an Operating System

Comments

Post a comment