Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
Thomas Anderson, Brian Bershad, Edward Lazowska, and Henry Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Transactions on Computer Systems 10(1), February 1992, pp. 53-79.
Review due for this or Resource Containers on Tuesday, 10/14.
Comments
Summary:
The paper describes the design, implementation, and performance of a kernel interface and a user-level thread package that together combine the performance of user-level threads with the functionality of kernel threads. The main kernel mechanism, called a scheduler activation, provides a virtualized processor and well-defined notifications to the user level, making parallel programming at the user level easier and more flexible.
The problem the paper was trying to deal with:
The paper describes the disadvantages of supporting threads either in the operating system kernel or in user-level library code in the application's address space. To avoid these disadvantages, the paper suggests providing each application with a virtual multiprocessor and using scheduler activations as the method of communication between the kernel and the user-level thread system.
Contributions
1. The paper presents a novel mechanism called scheduler activations, through which the address-space thread scheduler is notified of kernel events. Consequently, information can be communicated efficiently between the user-level and kernel-level schedulers.
2. The virtual multiprocessor idea presented in this paper is very good. It allows the kernel to easily control processor allocation and to schedule processors as effectively as possible.
3. The paper proposes separating policy from mechanism: processor allocation is done by the kernel, but thread scheduling is done by the user-level package.
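The separation in point 3 can be sketched as a minimal simulation (all class and method names here are my own illustration, not from the paper): the kernel decides how many processors an address space gets, while the user-level scheduler decides which threads run on them.

```python
# Illustrative sketch of the policy/mechanism split: the kernel allocates
# and preempts processors; the user-level scheduler owns the ready queue.
from collections import deque

class UserLevelScheduler:
    """Address-space scheduler: owns the ready queue and thread policy."""
    def __init__(self, threads):
        self.ready = deque(threads)   # user-chosen scheduling order
        self.running = {}             # processor id -> thread

    # Upcall: the kernel granted us a (virtual) processor.
    def processor_added(self, proc):
        if self.ready:
            self.running[proc] = self.ready.popleft()

    # Upcall: the kernel took a processor away; requeue its thread.
    def processor_removed(self, proc):
        thread = self.running.pop(proc, None)
        if thread is not None:
            self.ready.appendleft(thread)

class Kernel:
    """Kernel mechanism: allocates processors, knows nothing of threads."""
    def __init__(self, sched):
        self.sched = sched
    def allocate(self, proc):
        self.sched.processor_added(proc)    # notify via upcall
    def preempt(self, proc):
        self.sched.processor_removed(proc)  # notify via upcall

sched = UserLevelScheduler(["t1", "t2", "t3"])
kernel = Kernel(sched)
kernel.allocate("cpu0")
kernel.allocate("cpu1")
kernel.preempt("cpu0")   # t1 goes back to the front of the ready queue
```

Note that the kernel never inspects the ready queue: the preempted thread's fate is entirely the user-level scheduler's decision, which is the flexibility the review describes.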
Flaws:
(1) The paper doesn’t clearly demonstrate why thread integration is so important, nor why kernel involvement is critical to thread management. (2) The paper doesn’t clearly describe the implementation of scheduler activations.
What it makes easier:
The paper is trying to make programs more effectively multithreaded and to make threads cheaper to use. This thread technique would improve throughput, responsiveness, and time to completion.
What it makes harder:
Preventing the eviction of a thread in a critical section is a good idea, but a malicious user might exploit this feature if it is unprotected: problems may arise if such a user marks his/her code as part of a critical section and never releases it.
Posted by: Tao Wu | October 14, 2008 11:49 AM
The Summary:
This paper first discusses the drawbacks of kernel and user-level threads and the lack of kernel support for user-level thread libraries. It then describes the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism.
Problems:
The thread package views each processor as a virtual processor and treats it as a physical processor under its control. In reality, though, these virtual processors are multiplexed across real physical processors by the underlying kernel. User-level thread performance is highly impacted by multiprogramming, I/O, and page faults; this is a consequence of inadequate kernel support.
Programming with kernel threads avoids the system integration problems exhibited by user-level threads, because the kernel directly schedules each application’s threads onto physical processors. But kernel threads are too heavyweight for use in many parallel programs. Programmers are thus faced with the dilemma of choosing between the two.
This paper combines the functionality of kernel threads with the performance and flexibility of user-level threads to provide a user-level thread package and a new kernel interface.
Contributions:
1. The kernel allocates processors to address spaces; the kernel has complete control over how many processors to give each address space’s virtual multiprocessor.
2. Each address space’s user-level thread system has complete control over which threads to run on its allocated processors.
3. The kernel notifies the user-level thread system whenever it changes the number of processors assigned to it, and whenever a user-level thread blocks or wakes up in the kernel.
4. The user-level thread system notifies the kernel when it needs more or fewer processors.
5. The communication between kernel processor allocator and user-level thread system is structured in terms of scheduler activations.
6. The kernel’s processor allocation policy “space-shares” processors while respecting priorities and guaranteeing that no processor idles if there is work to do.
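The two-way communication in points 3 and 4 can be summarized as a small event vocabulary. The event names below paraphrase the paper's upcalls and downcalls; the helper function is my own illustration of why most upcalls hand the user-level scheduler work to do.

```python
# Illustrative vocabulary of kernel<->user events (names paraphrase the
# paper; the dispatch logic is a sketch, not the paper's implementation).
from enum import Enum, auto

class Upcall(Enum):                 # kernel -> user-level thread system
    ADD_PROCESSOR = auto()          # a new processor was allocated
    PROCESSOR_PREEMPTED = auto()    # a processor was taken away
    ACTIVATION_BLOCKED = auto()     # e.g. page fault or blocking I/O
    ACTIVATION_UNBLOCKED = auto()   # a blocked thread can run again

class Downcall(Enum):               # user-level thread system -> kernel
    ADD_MORE_PROCESSORS = auto()
    PROCESSOR_IS_IDLE = auto()

def carries_thread_context(event):
    """Every upcall except ADD_PROCESSOR delivers the context of some
    thread (preempted, blocked, or unblocked) for the user-level
    scheduler to redistribute onto its remaining processors."""
    return event is not Upcall.ADD_PROCESSOR
```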
Flaws:
Applications inform the kernel of whether they need processors, but this could lead to unfair distribution if they report wrong information. The paper resolves this with multilevel feedback: the processor allocator favors address spaces that use fewer processors and penalizes those that use more. But this policy works against the flexibility of a user-level thread library.
The paper does not mention much about the design and structure of scheduler activations, which could have added more understanding.
The paper could have considered the pathological case of frequent upcalls: for an I/O-intensive application, many scheduler activations may be created, incurring substantial interrupt overhead.
Tradeoffs
FastThreads on scheduler activations shows a small increase in latency for null fork and signal-wait operations compared with pure user-level threads, in exchange for enhanced processor utilization and fairness. The optimizations impose small time and space overheads but provide effective parallelism.
Posted by: Rachita Dhawan | October 14, 2008 08:43 AM
Summary
In “Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism,” Anderson et al. discuss parallel computation by using processes, kernel-level threads and user-level threads, stating the problems of each and a solution which employs the best features of user-level threads while implementing functionality which mitigates the weaknesses.
Problem
For applications to take advantage of multi-processor systems effectively, they must be written with support for parallel processing. The facilities for doing this include using multiple processes, kernel-level threads, and user-level threads. Processes suffer from considerable overhead due to having completely independent supporting structures. Kernel-level threads mitigate most of this overhead, but still suffer from requiring scheduling to invoke the kernel. User-level threads mitigate the overhead of kernel-level threads, but suffer from scheduling problems in which the kernel scheduler can make suboptimal scheduling choices for the threads.
Contributions
• Creating a system in which the benefits of user-level threads are achieved while the negative effects related to scheduling are mitigated, via a novel approach that allows communication between the kernel scheduler and the user-space scheduler.
• Ensuring that the top-level scheduling that is done by the kernel does not adversely affect the user-level scheduling (when a processor is pre-empted and taken away, the user-level scheduler is activated and it is up to that scheduler to decide how to deal with the reduction of resources).
• Providing a method by which notifications of state from the user-level scheduler have minimal overhead (when a processor becomes idle, the user-level thread system notifies the kernel scheduler once, without subsequent notifications, and the idle processor is not deallocated unless it is needed elsewhere).
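The idle-notification protocol in the last bullet can be sketched as follows (the class and method names are my own, not the paper's): the user level reports an idle processor once, and the kernel leaves the processor in place, reclaiming it only when another address space needs it.

```python
# Illustrative sketch of low-overhead idle notification: a single
# report per idle processor, reclaimed lazily by the allocator.
class ProcessorAllocator:
    def __init__(self):
        self.idle = set()      # processors reported idle, not yet reclaimed

    def notify_idle(self, proc):
        """Single notification from the user level; no polling, no
        repeated messages while the processor stays idle."""
        self.idle.add(proc)

    def need_processor(self):
        """Reclaim a reported-idle processor for another address space,
        if one exists; otherwise report that none are available."""
        return self.idle.pop() if self.idle else None

alloc = ProcessorAllocator()
alloc.notify_idle("cpu3")
# The processor stays with its address space until demanded elsewhere:
reclaimed = alloc.need_processor()
```

This keeps the common case (no other address space wants the processor) entirely free of further kernel communication, which is the overhead reduction the bullet describes.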
Flaws
• Lack of discussion about the user-level scheduling techniques. Presumably, there exists a default scheduler which can be overridden to suit an application, but very little is said on this topic. Since user-level scheduling is an important aspect of this method, it would have been useful to read more about it.
• Performance evaluations are only done up to six processors. Since scalability is a very important aspect to assure that a method efficiently makes use of resources, it would have been useful to see the performance results for more processors.
Performance Techniques
The opportunity exploited here is that user-level scheduling can be done cheaply but can suffer from higher-level policies implemented by the kernel. To retain the speed of user-level scheduling while minimizing the negative effects of those policies, a careful set of methods for communicating state between the user and kernel levels is employed. The approach minimizes communication with the kernel to the amount needed so that performance is maximized at the user level while any higher-level scheduling activity that could cause negative effects is still communicated to the user level. This method of communicating information between systems to improve performance can be used in a variety of places; for example, as processor cores become increasingly heterogeneous in speed and cache size, program performance information could be communicated so that when programs are scheduled in a way that overfills the caches of a certain set of cores, they can be rebalanced to minimize cache misses.
Posted by: Mark Sieklucki | October 14, 2008 08:40 AM
Summary
The paper discusses the design and performance evaluation of a hybrid thread management system based on the concept of "scheduler activations." By communicating events and processor allocation requests between the kernel and user-level components, the system achieves the performance and flexibility of user-level thread management together with the functionality of kernel-level thread management.
Description of the problem being solved
The paper attempts to design and build a thread management system that provides the functionality and features of kernel-level thread management with the performance and flexibility of user-level thread management.
Contributions of the paper
1) The paper identifies the right level of interaction between the kernel and user-level parts of the thread management system: enough to achieve the functionality that kernel thread management systems provide, yet without the performance overhead of excessive crossings of the kernel boundary. The earlier related work, Psyche and Symunix, did not notify the user level of some important events, such as the rescheduling of a preempted virtual processor.
2) The design has a very good policy separation. The kernel has no knowledge of any application's concurrency model, scheduling policy, or degree of parallelism; all of these are left to the user-level thread management system. This gives the needed flexibility and also improves performance, because these policies are not generalized across all applications.
3) The performance enhancement that removes the common-case overhead of handling critical sections, by executing copies of critical sections only in the uncommon case of preemption, is a nice innovation borrowed from the Trellis/Owl garbage collector.
4) Overall, the concept of scheduler activations, as a combination of the right techniques needed to solve the problem stated above, is a significant contribution.
Flaws in the paper
1) A more rigorous evaluation could have been performed by running varied parallel applications (rather than just the N-body problem), and the results could have been compared with those of related work such as Psyche or Symunix.
Techniques used to achieve performance
1) Communication of events and other requests between Kernel and user-level parts of the Thread management system.
2) Leaving the choice of thread scheduling policies to the user level, thereby avoiding generalization, to achieve performance.
Tradeoff made
1) A tradeoff is made between communication across the kernel boundary and performance overhead: more communication can yield more flexibility and features, but it decreases performance.
Another part of OS where this technique could be applied
1) The optimization techniques are likely to find applications elsewhere. For example, the approach of reusing dead scheduler activations by caching them can be used anywhere the creation of discarded entities is costly.
2) The approach of splitting a centralized management scheme into parts that remain centralized and parts that can be customized for flexibility and performance can be applied in many places. For example, in any resource allocation or client-server system, the centralized manager on the server side can provide virtual resources while users on the client side customize how they use those resources.
Posted by: Leo Prasath Arulraj | October 14, 2008 07:26 AM