Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
# Thomas Anderson, Brian Bershad, Edward Lazowska, and Henry Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Trans. on Computer Systems 10(1), February 1992, pp. 53-79.
Reviews due Tuesday, 3/13.
Comments
Summary:
The paper describes the design, implementation, and performance of a kernel interface and a user-level thread package that together combine the performance of user-level threads (better performance in the common case) with the functionality of kernel threads (correct behavior in the infrequent case when the kernel must be involved). Each application's thread-level scheduler has a virtualized view of processors that the kernel maps onto physical processors, and "scheduler activations" act as the glue between the kernel interface and the user-level thread package.
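As a rough illustration of that virtualized view, here is a minimal sketch of the per-address-space state a user-level thread package might keep; all names and fields are my own assumptions for illustration, not anything from the paper, and the kernel decides which physical processors back these entries.

```c
/* Hypothetical per-address-space view of the "virtual multiprocessor".
 * All names and fields are assumptions for illustration only. */
#define MAX_VPROCS 16

struct thread;                    /* user-level thread control block */

struct vproc {
    int            granted;      /* has the kernel currently granted this one? */
    struct thread *running;      /* thread the user-level scheduler dispatched */
};

struct vmultiprocessor {
    int          nprocs;          /* current allocation from the kernel */
    struct vproc vp[MAX_VPROCS];  /* which physical CPU backs each entry is
                                     the kernel's business, not ours */
};
```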
Problem:
Kernel threads are not a successful model for supporting user-level parallelism. If kernel threads are used directly to express parallelism, they offer poor performance and flexibility. If user-level threads are built on top of kernel threads, the result is poor integration with system services.
Traditional user-level threads gain performance at the expense of functionality and, in some cases, correctness: I/O operations and page faults trigger scheduling decisions beyond the control of the user-level thread package.
Contributions:
Kernel threads are presented as inherently the wrong abstraction for application threads.
Identification of the common case, in which the user-level thread package can manage scheduling effectively because it has complete information; communicating this information to the kernel so that kernel thread scheduling could accommodate it would be expensive.
The idea is loosely a precursor to guided virtualization, in which processor availability is what varies and the thread package accounts for that variation.
The critical-section optimization, which uses a copy of the critical-section code, keeps the common case of locking inexpensive; a sketch of this check follows.
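To make that optimization concrete, here is a minimal sketch, assuming a table of marked critical sections; it is my own construction, not the authors' code, showing how a preempted thread's program counter might be mapped into the relocated copy so the section can run to completion before the thread is requeued.

```c
/* Hypothetical sketch of the program-counter check behind the
 * critical-section copy; names and layout are assumptions. */
#include <stdint.h>

/* One entry per critical section: the original code range and the start
 * of a relocated copy whose exit path yields back to the scheduler. */
struct crit_region {
    uintptr_t start;        /* first instruction of the critical section */
    uintptr_t end;          /* one past the last instruction */
    uintptr_t copy_start;   /* start of the relocated copy */
};

extern struct crit_region crit_table[];
extern int crit_table_len;

/* If the preempted PC lies inside a marked critical section, return the
 * corresponding PC in the copy, so the thread can be continued there
 * until the section ends and only then be placed on the ready list. */
uintptr_t resume_pc_for(uintptr_t preempted_pc)
{
    for (int i = 0; i < crit_table_len; i++) {
        const struct crit_region *r = &crit_table[i];
        if (preempted_pc >= r->start && preempted_pc < r->end)
            return r->copy_start + (preempted_pc - r->start);
    }
    return preempted_pc;    /* common case: not in a critical section */
}
```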
Flaws:
I felt the paper was very convincing, although the authors should have considered the pathological case where the thread package itself gets interrupted while it is looking for a thread to schedule. If many scheduler activations arrive one after another, the processing overhead could be serious. This is an infrequent case in general, but for an I/O-intensive application it may matter more. In general, the interruption of the thread package itself needed more discussion.
Performance:
The paper's core concern is performance. The idea is to optimize for the common case, in which the user-level thread package has complete information about which threads to schedule (and their priorities). Scheduler activations can also be used for better scheduling policies, such as exploiting the processor cache through LIFO ordering of thread scheduling. I/O blocking, which is problematic for user-level threads, is mitigated because the incoming activation lets another user-level thread be scheduled. The critical-section optimization and the reuse of scheduler activation records further enhance performance.
Posted by: Archit Gupta | March 13, 2007 11:42 AM
Summary
In this paper, the authors first discuss the problems involved in implementing multithreaded code using kernel-level and user-level thread support. They then propose their own multithreaded model that uses user-level threads for efficiency and kernel-level support for reliability.
Problem Description
The authors claim that kernel-level thread constructs are inherently slow because they are too heavyweight. User-level thread constructs, though quite efficient, exhibit poor performance or even incorrect behaviour in the presence of I/O or page faults. To solve this problem, the authors suggest providing each application with a virtual multiprocessor and using scheduler activations as the means of communication between user-level and kernel-level threads.
Contributions.
1. The idea of scheduler activations is quite interesting. The address-space thread scheduler is notified of kernel events using scheduler activations. As a result, information can be efficiently communicated between the kernel-level and user-level thread schedulers.
2. If a user-level thread is preempted while executing in a critical section, poor performance can result. The authors solve this problem by checking whether the thread being preempted was executing in a critical section. To optimize this check, they suggest making a copy of each critical section, delimited by special assembly constructs. This idea seems quite interesting and improves the efficiency of their model.
Flaws.
1. The authors note that there is some overhead involved during the initialization phase of their threading model, but they do not provide sufficient details about it.
2. The authors measured the performance of ULTRIX on a uniprocessor system but do not explain why.
Performance.
The authors are mainly interested in improving the performance of multithreaded programs by doing thread scheduling at user level. Secondly, they are interested in making the system reliable by using kernel-level information when making thread scheduling decisions.
Posted by: Atif Hashmi | March 13, 2007 10:55 AM
SUMMARY
"Scheduler activations: effective kernel support for the user-level management of parallelism" by Anderson et al. argues against kernel threads and for user-space threads with an interface to the kernel.
PROBLEM
Thread implementation in user space is fast, but may not have enough information about the state of the hardware to "integrate" well with the system (e.g., the momentarily optimal number of threads depends on the resources available at that moment, information that user-space applications are not privy to).
Kernel threads, on the other hand, are not privy to information about an application's parallelism model, and may have higher overhead since a protection boundary needs to be crossed to manage kernel threads.
CONTRIBUTIONS
* Identification of pros and cons of different threading models.
* Design and evaluation of a system that combines the best properties of user-space and kernel-space threads.
* Design and implementation of "scheduler activation" interface for managing threads without compromising kernel control.
* Hiding threading complexity from programmer
* Virtual processor abstraction
* Creating an altered critical section copy is, if nothing else, hardcore.
FLAWS
* Worst paper yet.
* Failure to clearly demonstrate why "thread integration" is something I should care about. They present some concepts but no concrete examples or studies. One should not have to google to become convinced that an article isn't an utter waste of trees.
* The value of the optimizations the authors propose is unclear, since they do not show how common the uncommon case is, and they admit that the optimizations do not fundamentally improve the common case. The source of the performance gain they did get is not clearly presented.
* Failure to demonstrate that kernel involvement is necessary at all for thread management (I had to google to learn why that is so!). I would argue that in the absence of concrete specifications, effectively controlling thread performance from user space is just a small matter of programming.
* The claim that the performance of kernel threads is *inherently* worse than that of user-level threads is not well justified. The assumptions behind that claim were not clearly stated, which made it appear to be merely a sensationalist outburst.
* The authors seem to be overly fond of using important-sounding expressions like "scheduler activation", "vectored event", and "vessel or execution context" for concepts that in my opinion neither deserve proper names nor aid comprehension.
* How could a 27-page paper not contain a half-decent background section?
PERFORMANCE
Scalability, memory consumption, time-to-completion, scheduling overhead, ease-of-use.
Posted by: Vladimir Brik | March 13, 2007 09:12 AM
Summary:
This paper proposes a high-performance, flexible user-level thread management mechanism built on a modified kernel interface called "Scheduler Activations". The Scheduler Activation mechanism provides a virtualized multiprocessor and well-defined notifications to the user level, making parallel programming at the user level easy, efficient, flexible, and fast.
Problem:
Existing methods for managing parallel programs required choosing between the flexibility of user-level threads and the functionality of kernel threads. Existing user-level thread implementations provided a flexible management interface but had problems dealing with I/O events, because a traditional kernel did not give the thread package enough information. Existing kernel thread implementations had performance problems due to the overhead of every thread operation.
Contributions:
The Scheduler Activation approach both lowers the cost of managing parallelism, by implementing threads at user level, and adds a notification mechanism so that the user level receives enough information to manage parallelism efficiently even when I/O events occur.
The idea of allocating multiple virtualized processors to each process and improving the kernel interface that supports the user level seemed like a clean, new, and powerful approach.
An effective notification mechanism in both directions between kernel and user space (managed per address space) enables low-cost, high-performance concurrency management.
Flaws:
To show how safe this approach is, I thought it would have been nice to have a discussion of how the notification mechanism could affect kernel-level scheduling and protection, but it seems this will be fine as long as the kernel's processor allocator enforces fairness and has a penalizing mechanism.
Performance:
Scheduler Activations effectively utilize the multiprocessor by using low-cost, easy-to-manage user-level threads. Like the exokernel and other lower-level abstraction approaches, they give programmers and compilers freedom of design, while still retaining the convenience of user-level threads and simple interfaces. The performance improvements, such as in throughput, are clearly described with experiments.
Posted by: Hidetoshi Tokuda | March 13, 2007 08:48 AM
Summary
The main focus of this paper was the design and implementation of a new threading model based upon scheduler activations. Scheduler activations can be likened to having the kernel provide information to user-level threads to help influence scheduling and resource usage.
Problem
The problem presented in this paper is that there seems to be no middle ground between user-level and kernel threads. User-level threads are fast and flexible. Kernel threads are deeply integrated with the underlying system. However, user-level threading models exhibit problems when dealing with blocking I/O requests, page faults, and multiprogramming in general. Kernel threading has its own problems--mostly along the lines of being slow.
The system detailed in this paper attempts to create the middle ground. When a change occurs in the system (page fault, blocking, etc), the kernel informs the user-level code that a change has happened, and that the user-level code must react to this change. This approach combines the deep integration of kernel threads while preserving the speed and flexibility of user-level threads.
Contributions
The virtual multiprocessor is an elegant idea. It lets the kernel easily control processor allocation while allowing the user-level thread system to behave as though it has the run of however many processors it is granted and to schedule onto them as effectively as possible.
Scheduler activations and "upcalls" are the methods used to accomplish the above. Perhaps my analogy is too simplistic, but I've been likening upcalls to traditional Unix signals like SIGINT and SIGSTOP. Upcalls notify the user-level code when it should execute runnable user-level threads, when a scheduler activation has blocked or unblocked, and when a user-level thread should be returned to the ready list because its processor has been preempted.
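For reference, the kernel-to-user notifications the paper describes can be sketched as C handler prototypes; the event list is paraphrased from the paper, but the names and signatures below are my own assumptions, not its actual interface.

```c
/* Hedged sketch of the kernel-to-user upcall points; types and names
 * are illustrative assumptions, not the paper's real interface. */
typedef int activation_id;

struct machine_state;    /* saved registers of a stopped user-level thread */

/* "Add this processor": run a user-level thread on the new processor. */
void upcall_add_processor(int processor_number);

/* "Processor has been preempted": the preempted activation's user-level
 * thread state is handed back so it can go on the ready list. */
void upcall_processor_preempted(activation_id preempted,
                                struct machine_state *state);

/* "Scheduler activation has blocked": its thread is stuck in the kernel
 * (e.g. on I/O or a page fault); this upcall arrives on a fresh
 * activation so another user-level thread can be scheduled meanwhile. */
void upcall_activation_blocked(activation_id blocked);

/* "Scheduler activation has unblocked": the blocked thread can continue;
 * its state is returned for the user-level scheduler to resume or queue. */
void upcall_activation_unblocked(activation_id unblocked,
                                 struct machine_state *state);
```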
Flaws
There is much reliance upon user-level threads to make the right decisions.
The authors' kernel implementation appears to be written in Modula-2. They freely admit that existing optimized assembly code would see performance benefits. This detracts from the reliability of their measurements.
Performance/Evaluation
Well, the second flaw that I noted makes me hesitant to make claims about the performance of the new implementation. However, the authors' goal was at least to approach the speed and efficiency of user-level threads, and the given results indicate that they were able to exceed the existing user-level thread implementation by a small margin.
Posted by: Jon Beavers | March 13, 2007 08:34 AM
Summary:
The paper describes "scheduler activations", a kernel interface used to build high-performance user-space thread packages.
Problem:
Threads can be created in user or kernel space. While user-space threads can be built without any modification to or support from the kernel, they suffer from the fact that they cannot consider global processor requirements; for example, a process with 5 threads of execution may get about the same processor time as a process with just one thread. Kernel threads, on the other hand, can make thread scheduling decisions with knowledge of the global requirements, but suffer from being heavyweight and from having to cross protection boundaries for every thread operation. "Scheduler activations" were an attempt to get the best of both worlds.
Contributions:
* Processor allocation is done by the kernel (based on global demand) and scheduling by the user-level packages (with different strategies), another instance of separating policy and mechanism. The kernel just provides the means of allocating and releasing processor time; the user-space package decides what to do with it (see the sketch after this list).
* Provides a clean solution that allows efficiency (best processor utilization) and flexibility (application-specific thread scheduling strategies), while not compromising on performance
* Adaptability to changing requirements (programs being added to and removed from the system), enabled by notifications between kernel and user space about the demand for and supply of processor time.
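The sketch referred to above: the paper also describes calls in the other direction, from the address space to the kernel, about processor demand. The C names, signatures, and helper functions here are my own assumptions, not the real API.

```c
/* Hedged sketch of the user-to-kernel notifications about processor
 * demand; every name below is an assumption for illustration. */

/* "Add more processors": there are more runnable threads than processors. */
void sys_add_more_processors(int additional_needed);

/* "This processor is idle": the kernel may reallocate it elsewhere. */
void sys_this_processor_is_idle(void);

/* Assumed helpers from a hypothetical user-level thread package. */
int ready_queue_length(void);
int processors_held(void);

/* Example of how a user-level scheduler might use the two calls. */
void report_processor_needs(void)
{
    int runnable = ready_queue_length();
    int procs    = processors_held();

    if (runnable > procs)
        sys_add_more_processors(runnable - procs);
    else if (runnable == 0)
        sys_this_processor_is_idle();
    /* otherwise the allocation matches the demand; nothing to report */
}
```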
Flaws:
- There is a special requirement on the compiler to mark critical sections and handle the preemption/deadlock scenario. Another solution could have been for user space to maintain a data structure recording the locks, their states, and the thread identifiers that hold or are waiting on them; the temporary continuation of a preempted lock holder could then be decided by a quick iteration through this data structure (see the sketch after this list).
- It wasn't clear how exactly "scheduler activations" are implemented, represented, stored, and pooled. That was probably too much detail for this paper, but it would have been nice to know.
- It would have been a good idea to implement the system from scratch without worrying about backward compatibility. Backward compatibility could have been achieved with another user-space package that simulates kernel thread behaviour.
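The lock-table alternative suggested in the first flaw above might look like the following sketch. This is purely the reviewer's hypothetical design, not anything from the paper, and all names here are invented.

```c
/* Sketch of the proposed alternative: a user-space table of locks and
 * their holders, consulted when a preemption is reported.
 * Everything here is hypothetical. */
#include <stdbool.h>

#define MAX_LOCKS 64

typedef int thread_id;

struct lock_record {
    bool      held;      /* is this lock currently taken? */
    thread_id holder;    /* user-level thread holding it */
};

static struct lock_record lock_table[MAX_LOCKS];

/* True if the preempted thread holds any lock, in which case the
 * user-level scheduler should briefly continue it until it releases. */
bool should_continue_preempted(thread_id preempted)
{
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (lock_table[i].held && lock_table[i].holder == preempted)
            return true;
    }
    return false;
}
```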
Relevance:
In this age of multi-core systems, multithreaded programming and thread scheduling strategies are becoming ever more important. The paper was an excellent introduction to the issues in thread scheduling and the implementation of thread libraries. "Scheduler activations" seem to be a good solution to the problem they were trying to solve.
Posted by: Base Paul | March 13, 2007 07:59 AM
Summary
This paper presents a proposed solution to the design decision of whether to support threads in the kernel or at the user-library level. Their solution is a new abstraction at the kernel interface for the management of parallelism that does not interfere with the speed advantages of user-level threads and provides the "same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism." Specifically, they give each application a "virtual multiprocessor" and allow the application to decide which virtual processor each user-level thread runs on.
Problem
The fundamental problem the authors are trying to solve is that there are different cases in which user-level and kernel-level threads each perform better: user-level threads are in large part faster, but they have problems dealing with I/O-blocked threads and cannot reclaim the kernel thread corresponding to a blocked user-level thread. Kernel threads, on the other hand, solve these problems but exhibit slower performance across the board.
Contributions
The primary contribution of this paper is the approach of giving each application a virtual multiprocessor to work on, together with an interface for taking processors away from this virtual machine and for an application to request more processors or to tell the kernel that it has fewer runnable threads than processors. This means that the problem of how to allocate resources can be handled separately from scheduling, which is done at the user level and can be done on the virtual multiprocessor however the application desires.
Flaw
The proposed model seems very sound and the paper is well evaluated. There are possible nitpicks, but I don't believe they detract from the contribution of this paper.
Performance/Evaluation
The evaluation in this paper is done in terms of speedup with multiple processors under other threading models and the model presented, as well as performance under decreasing amounts of available memory. In both cases the proposed system outperforms the competition.
Relevance
With the increasing popularity of an increasing number of cores in a personal computer, effective management of parallelism will become a topic of increasing import.
Posted by: Aaron Bryden | March 13, 2007 07:04 AM
Summary
Safe, efficient, and flexible user-level multiprocessor threading is accomplished by passing messages between the kernel and the user-level system at critical events.
Problem
Kernel-based thread management is not efficient, and a safe user-level threading solution had not yet been devised. The goal is to implement user-level threading that is both safe and flexible.
Contributions
A user-level thread scheduler allows for fast and safe thread execution. Each address space contains a user-level thread system and is allocated a number of processors by the kernel. The user level tells the kernel how many processors the address space requires and selects which threads to execute. The kernel tells the user-level system when the address space's processor allocation has changed, when a thread blocks, and when a thread wakes up.
A significant contribution of the article is a system that uses activations assigned to the address space to deliver kernel notifications to the user-level system. A running thread is preempted to process the notification; the user-level system can then restart the preempted thread and possibly a thread that is no longer blocked. Scheduling policies are implemented through which threads the user-level system decides to execute.
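A minimal sketch of that notification-processing path, assuming a ready list and a context-switch primitive in the user-level package; the names are my own, not the paper's.

```c
/* Hypothetical handler for a "processor preempted" notification: requeue
 * the stopped thread(s), then dispatch the best runnable thread. */
struct thread;                                 /* thread control block */

/* Assumed helpers provided by the user-level thread package. */
void ready_list_insert(struct thread *t);
struct thread *ready_list_pop_highest_priority(void);
void switch_to(struct thread *t);              /* does not return */

void on_processor_preempted(struct thread *preempted,
                            struct thread *unblocked /* may be NULL */)
{
    /* Put the thread(s) whose activations were stopped back on the list. */
    ready_list_insert(preempted);
    if (unblocked)
        ready_list_insert(unblocked);

    /* Run the highest-priority runnable thread on this processor. */
    switch_to(ready_list_pop_highest_priority());
}
```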
Using a shadow copy of the compiled code, along with explicit marking of critical sections, allows the scheduler to avoid swapping out a thread currently in a critical section. Since overhead is introduced only in the uncommon case of preemption during a critical section, very little overhead is added overall.
Possible Improvements
After discussing working sets for memory it is obvious that the user-level code must be very sensitive to memory requirements of threads. In a CPU bound application this is not much of an issue, but running multiple I/O bound threads might quickly send the system into thrashing. For example as one thread waits for a scheduled I/O to return, ten other threads might have evicted its pages from memory. This could repeat for each of the ten other threads also. It would have been interesting to see how the authors dealt with memory management in their implementation.
The idea of avoiding evicting a thread that is in a critical section is a valid one, but left unbounded this feature could be exploited by a malicious user. A user could mark their code as part of a critical section and never leave it. Once such code begins running on a thread, the kernel would have no way of evicting the malicious thread. In the worst case all processors could be commandeered during an idle period.
Performance
The focus of the paper was on scale-out with respect to increases in processors. Throughput and efficiency are also increased.
Posted by: Kevin Springborn | March 12, 2007 10:13 PM
Summary
This paper presents the problems associated with the use of traditional threads for implementing concurrency within applications and presents a new architecture that attempts to solve many of the mentioned problems.
Problems Addressed
Threads can be implemented primarily in two places, either in the kernel or in user space. Threads implemented in user space are fast and very flexible but behave badly when blocking on I/O. Kernel threads, on the other hand, deal well with I/O but suffer performance degradation due to the need to call into the kernel whenever a thread context is switched. The authors see the problem mainly as a communication breakdown between the kernel and user space and propose a solution where a user-level thread package can communicate with the kernel and vice versa, so that both the user-level and kernel schedulers can make good scheduling and allocation decisions.
Contributions
The kernel is in charge of managing the physical processors in the system and creating virtual processors that are assigned to address spaces for user-level use. The number of processors assigned to any address space can be changed by the kernel. The kernel alerts the user-level thread package whenever it changes the processor allocation or completes an I/O operation. In this way the user-level thread package has complete control over which threads are running and which threads are waiting to be scheduled. This communication is done by the kernel using what the authors call "scheduler activations", which are essentially kernel threads that become the execution contexts for threads in user space. User-level threads can be removed from an activation at any time, and the activation can be reused for other threads. Scheduler activations run on the virtual processors assigned to an address space or sit idle in the kernel.
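To make the reuse mentioned above concrete, here is a hypothetical sketch of an activation record and a free pool; the fields and functions are assumptions for illustration, not details from the paper.

```c
/* Hypothetical activation record plus a free pool for reuse. */
#include <stddef.h>

struct activation {
    int                id;            /* kernel identifier */
    void              *kernel_stack;  /* used while running in the kernel */
    void              *user_stack;    /* stack on which upcalls are delivered */
    struct activation *next;          /* link in the free pool */
};

static struct activation *free_pool = NULL;

/* Reuse a discarded activation when possible instead of creating a new
 * one for every kernel event. */
struct activation *activation_get(void)
{
    if (free_pool != NULL) {
        struct activation *a = free_pool;
        free_pool = a->next;
        return a;
    }
    return NULL;    /* caller would allocate and initialize a fresh one */
}

void activation_put(struct activation *a)
{
    a->next = free_pool;
    free_pool = a;
}
```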
Flaw
The work assumes that the kernel has a virtual multiprocessor to allocate to each user-level thread package; however, it is not clear how the virtual processors are mapped to physical processors or what scheduling policy is used to run them on the physical hardware.
Performance
Maintaining fairness and providing for scalability were two main objectives of this work. This was accomplished by providing a mechanism for communication between the kernel and thread-level schedulers. Both the kernel and user-level space then have enough information to make fair scheduling decisions, while at the same time more applications can run, since resources are fully utilized as long as there is work to be done.
Posted by: Nuri Eady | March 12, 2007 09:31 PM
Paper Review: Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism [Anderson, et al.]
Summary:
This paper presents a new kernel interface, the "scheduler activation", that is used to provide an application address space (basically a process) with a "virtual multiprocessor". The user process can schedule threads however it sees fit, to provide better performance and/or application-specific scheduling, especially for parallel computing on multiprocessor hardware.
Problem:
The problem, as the authors state it, is that existing thread models are lacking in performance (kernel threads) or reliability/convenience (user threads within a traditional process). They propose that kernel threads are the "wrong abstraction" because they limit performance when it could be about an order of magnitude better.
Contributions:
The proposed interface separates processor allocation from scheduling, enabling the use of user threads rather than kernel threads for better performance in parallel computing applications.
The proposed solution is elegant, requiring only six call operations to provide transparency between the kernel's processor allocation and the user process's scheduling desires. As they claim, it seems the sort of rare change that improves both flexibility and performance. It is flexible in that it allows the application to use any scheduling/concurrency technique it chooses, rather than being tied to just the ones implemented in the kernel.
Flaws:
While not a flaw per se, the proposal does require code changes to mark the boundaries of critical sections. This is probably reasonable given the performance advantage that might be realized by identifying these sections so that the program counter value can be tested to determine when threads should not be interrupted.
There is some overhead (storage and processing) for the kernel-managed pool of free "scheduler activations" and also for the copying of critical-section code to a special area for the aforementioned program counter tests.
I found it odd that they talk of communication between the kernel and an "address space" (which is inanimate) rather than between the kernel and an application scheduler or application thread system.
Performance Impact and Relevance:
This thread technique could improve throughput, responsiveness, and time to completion. It also provides a general alternative to asynchronous I/O, by using threads with existing calls rather than introducing new APIs.
It is interesting that it introduces notification of a changing number of processors, which today could be useful to build into commodity operating systems for running them in virtual machine environments.
Posted by: Dave Plonka | March 12, 2007 06:49 PM
Summary:
The paper describes the idea of implementing user-level concurrency with something the authors call scheduler activations. These are notifications of important events that are sent to the application to allow it to make more informed scheduling decisions amongst user threads.
Problem:
The problem they were trying to solve is that both user and kernel threads are in some sense insufficient. Kernel threads incur high overheads from needing to trap to the kernel and be processed by a general-purpose scheduler, but existing user-level thread libraries didn't have enough information to make intelligent scheduling decisions. (For instance, a library might schedule a low-priority user thread on a running kernel thread/virtual processor/LWP while another, higher-priority user thread is scheduled on a non-running kernel thread. This arises because the user-level library doesn't know which kernel threads are scheduled and which aren't.)
Contributions:
* The idea of presenting a lower-level abstraction than was previously available. It smells very much like what I would envision the CPU multiplexing would look like in the Exokernel, but was a couple years before the first Exokernel paper. (In fact, I was curious, so I looked for Engler's thesis to see if it cites this paper. It does.)
* A demonstration that this solution seems to be viable; even with the constraint of maintaining binary compatibility with programs that use true kernel threads, they came up with a seemingly pretty reasonable implementation.
* A description of the problems that accompany the choice between user and kernel threads, which people trying to design OSs for highly parallel systems have to contend with.
Flaws:
The main thing that I didn't like was that they implemented this on a system which they said was not very up-to-date. I know that they were implementing it for other systems at the time of writing, but it can't help but make you wonder what would change. (For instance, they simply blocked for some time instead of going to disk.)
These are just nits, but they are annoying enough that I'll mention them. First, the first graph (Fig. 2) is in terms of speedup (higher is better) while the second is in terms of execution time (lower is better), for no discernible reason I could find; this threw me off for a couple of seconds. Second, they had a whole paragraph in section 4.3 about how they implemented an optimization "that imposes no overhead", and then went on to describe how the optimization imposes both a space and a time overhead (albeit not on the path they are referring to, and rather less than the alternative).
Performance:
The authors were sort of trying to improve the multithreadedness of programs, but not really; what they were trying to do was make it more effective to use threads. This would improve time-to-completion, throughput, and utilization, and could lead to higher degrees of parallelism if the overheads were the limiting factor before. They are also worried about fairness: one of their criteria is that the highest-priority threads within a process should be scheduled before lower-priority threads. They did this by creating a new interface to the kernel that exposes the information needed to do this scheduling at user level, thus getting most of the benefits of kernel threads (async I/O, knowing which processors are scheduled) without most of the overhead.
Posted by: Evan Driscoll | March 12, 2007 05:08 PM