CS 736 – Fall 2006

Resource containers

Web server design

Basic problems

i. Lots of requests

ii. Requests may require disk i/o (for static data) or network i/o (for proxy cache)

iii. Data may not all arrive at once or be sent at once (if large)

iv. How do you take advantage of sleep time while waiting for i/o?

v. How do you store context of a request while you are waiting?

1. On a stack?

2. In an object?

vi. How do you isolate independent requests?

1. for reliability?

2. For performance?

Designs

i. Fork-a-process (standard unix)

1. listen in master process, fork off & call accept in worker

2. worker lives until connection goes away

3. problem: for short connections (http 1.0 closes after every request), lots of overhead to fork processes

4. Hard to fine-tune the scheduler with short-lived processes – they don’t reach steady state

ii. Pool of processes – becomes process per request not connection

1. Master process listens, hands off socket to an existing worker process

2. worker process does work

3. worker adds self to queue waiting for next request

4. problem: expensive context switching

iii. Single process / thread per connection

1. Thread created for each request, handles the request, dies

2. Problem: lots of context switching (but not as bad)

3. Problem: lots of thread creation (expensive if need to do i/o in kernel)

4. Problem: loss of protection for dynamic data

5. Problem: thread scheduling by kernel, hard to prioritize

iv. Single process / thread per request pool

1. Create pool of threads

2. Thread handles a request, goes back in pool

3. Most of the same problems, but less overhead to start handling a request (no thread creation)
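A minimal sketch of the thread-pool design above. The names `handle`, `worker`, and `serve` are hypothetical, and requests are plain strings standing in for parsed HTTP requests; a real server would hand sockets to the pool instead.

```python
import queue
import threading

def handle(request):
    """Stand-in for real request processing (hypothetical)."""
    return request.upper()

def worker(requests, results):
    """Pool thread: take a request, handle it, go back to waiting."""
    while True:
        request = requests.get()
        if request is None:          # sentinel: shut this worker down
            return
        results.put((request, handle(request)))

def serve(incoming, n_workers=4):
    """Run a fixed pool over the given requests and collect the responses."""
    requests, results = queue.Queue(), queue.Queue()
    pool = [threading.Thread(target=worker, args=(requests, results))
            for _ in range(n_workers)]
    for t in pool:
        t.start()
    for request in incoming:
        requests.put(request)
    for _ in pool:                   # one sentinel per worker
        requests.put(None)
    for t in pool:
        t.join()
    responses = {}
    while not results.empty():
        request, response = results.get()
        responses[request] = response
    return responses
```

Threads are created once, so the per-request cost is a queue operation plus a context switch. The kernel still decides which worker runs next, which is the prioritization problem noted above.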

v. Single process / single thread / Event driven

1. All asynchronous / non-blocking I/O

2. No scheduling in kernel

3. app maintains queues of requests to handle at different stages

a. receiving data

b. sending data

c. reading disk

4. At each stage, app can choose what to do, when

5. Can do lots of work on a stage to improve cache locality

6. Problem: hard to write, can’t use OS scheduling to account for work done in kernel on behalf of a request
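A minimal sketch of the event-driven design, using Python's standard `selectors` module. The one-line "echo" protocol and the names `Request`, `add_connection`, and `step` are assumptions made for illustration. Request context lives in an object (not on a stack), and each call to `step` handles all pending reads before all pending writes, which is the stage batching described above.

```python
import selectors
import socket

class Request:
    """Per-request context, kept in an object rather than on a thread stack."""
    def __init__(self, sock):
        self.sock = sock
        self.inbuf = b""
        self.outbuf = b""
        self.closed = False

def add_connection(sel, sock):
    """Register a connected socket for non-blocking reads."""
    sock.setblocking(False)
    req = Request(sock)
    sel.register(sock, selectors.EVENT_READ, req)
    return req

def on_readable(sel, req):
    """Receive stage: buffer data, queue a reply for each complete line."""
    data = req.sock.recv(4096)
    if not data:                      # peer closed the connection
        sel.unregister(req.sock)
        req.sock.close()
        req.closed = True
        return
    req.inbuf += data
    while b"\n" in req.inbuf:
        line, _, req.inbuf = req.inbuf.partition(b"\n")
        req.outbuf += b"echo: " + line + b"\n"
    if req.outbuf:                    # now also interested in writability
        sel.modify(req.sock, selectors.EVENT_READ | selectors.EVENT_WRITE, req)

def on_writable(sel, req):
    """Send stage: write what the socket accepts, keep the rest for later."""
    sent = req.sock.send(req.outbuf)
    req.outbuf = req.outbuf[sent:]
    if not req.outbuf:
        sel.modify(req.sock, selectors.EVENT_READ, req)

def step(sel, timeout=0.1):
    """One pass of the event loop: all receives first, then all sends."""
    events = sel.select(timeout)
    readers = [key.data for key, mask in events if mask & selectors.EVENT_READ]
    writers = [key.data for key, mask in events if mask & selectors.EVENT_WRITE]
    for req in readers:
        on_readable(sel, req)
    for req in writers:
        if not req.closed and req.outbuf:
            on_writable(sel, req)
```

The application decides the order of work inside `step`, so it could serve high-priority requests first. The kernel sees one thread, however, so kernel work done for any request is charged to that one thread.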

Accounting problems

Process = unit of accounting for standard Unix, Linux, Windows
Kernel work on behalf of a process not accounted to process

i. Accounting done to currently executing thread (or nobody)

1. paging – replacing a page for someone else to use

2. network packet receipt

3. block device transfer

This doesn’t fit web servers

i. Heterogeneous workload / different priorities per work item not visible at process level

ii. Lots of work in web server happens in kernel (70% for apache with static web pages)

iii. Work may encompass multiple threads in a process / multiple processes, or they may all be doing different things

Goal:

Flexible definition of an accounting unit

i. Can be < 1 thread, > 1 thread, > 1 process

ii. Include kernel work on behalf of an accounting unit

iii. Use for all kernel scheduling – any time something is queued or given resources, it should be subject to this

Resource containers

Like Opal, break apart an abstraction into finer grained, more flexible components

i. Standard technique in systems – like exokernel

ii. QUESTION: what are problems?

1. Making sure new abstraction is right

2. New one is usually finer-grained, need to minimize overhead

3. Protection: process is well understood, need new model

Concept:

i. resource container == unit to which accounting happens, guides scheduling decisions instead of a process

ii. Contains all system resources used by an activity

1. cpu time

2. kernel objects, buffers

iii. Have attributes

1. scheduling parameters

2. resource limits

3. network qos

iv. Separate from security principals (users)

v. QUESTION: how do you draw this architecturally? Is it a layer? What is the overhead cost at runtime? When do you pay this cost?

1. A: already collecting stats for accounting to a process

2. A: new overhead when consulting container for scheduling

3. A: new overhead when changing containers

4. A: new overhead when accounting things not previously accounted (e.g. network packet receive)
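A rough sketch of a resource container as a data structure: usage counters plus attributes. The field and method names (`priority`, `cpu_limit`, `charge_cpu`, `charge_packet`) are hypothetical and are not the interface from the paper.

```python
from dataclasses import dataclass

@dataclass
class ResourceContainer:
    """Unit of accounting: records usage and carries scheduling attributes."""
    name: str
    priority: int = 0                  # scheduling parameter (assumed form)
    cpu_limit: float = float("inf")    # resource limit, in CPU seconds
    cpu_used: float = 0.0              # user plus kernel time charged so far
    bytes_received: int = 0            # network receive work charged so far

    def charge_cpu(self, seconds):
        """Account CPU time, including kernel time spent for this activity."""
        self.cpu_used += seconds

    def charge_packet(self, nbytes):
        """Account a received packet (not charged to anyone in a stock kernel)."""
        self.bytes_received += nbytes

    def over_limit(self):
        """Consulted by the scheduler before giving the activity more CPU."""
        return self.cpu_used >= self.cpu_limit
```

The overheads listed above map onto this sketch: `charge_cpu` replaces per-process accounting that already exists, `charge_packet` is new accounting work, and `over_limit` is the extra lookup at scheduling time.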

Relation to threads / processes

i. Threads start in a default container, can dynamically bind to others

ii. Multiple threads / processes can be assigned to a container

iii. usage: thread working on multiple things changes container as it changes activities

1. e.g. thread pool per request, event driven model

iv. Can pass between processes (like an FD) for cooperating processes
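A small sketch of dynamic binding, assuming a per-thread "current container" that work is charged to. `Container`, `bound_to`, and `do_work` are hypothetical names; in the real system the binding is kernel state changed by a system call, not a Python context manager.

```python
import threading
from contextlib import contextmanager

class Container:
    """Accumulates abstract work units charged to one activity."""
    def __init__(self, name):
        self.name = name
        self.units = 0
        self._lock = threading.Lock()   # several threads may share a container

    def charge(self, units):
        with self._lock:
            self.units += units

DEFAULT = Container("default")          # threads start bound to this
_binding = threading.local()            # per-thread current binding

def current_container():
    return getattr(_binding, "container", DEFAULT)

@contextmanager
def bound_to(container):
    """Rebind the calling thread for the duration of one activity."""
    previous = current_container()
    _binding.container = container
    try:
        yield container
    finally:
        _binding.container = previous

def do_work(units):
    """Any work done is charged to whatever the thread is bound to now."""
    current_container().charge(units)
```

A pool thread or an event loop would wrap the handling of each request in `with bound_to(request_container):`, so one thread serving many requests charges each request separately.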

Scheduling

i. RC allows scheduling per activity not just per process / thread

ii. Base case: thread is scheduled according to the container it is bound to

iii. Threads bound to multiple containers use scheduler binding

1. == set of containers it has been using

2. combined to make decision (e.g. average, sum if not shared)

3. NOT DETAILED

Introspection

i. Application can inspect accounting information for its own use

Nesting

i. Like Opal resource groups, Nucleus

ii. If a proportion is given to a container, it == sum of the proportions of its child containers

iii. E.g., give 20% of CPU to cgi/bin – can then have a container per request in cgi/bin container
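A sketch of nested shares, assuming each child is given a fixed fraction of its parent's allocation (one possible policy; the notes do not fix how a parent divides its share). All names are hypothetical.

```python
class Container:
    """Node in a container hierarchy; `share` is a fraction of the parent."""
    def __init__(self, name, share=1.0, parent=None):
        self.name = name
        self.share = share
        self.parent = parent
        self.children = []

    def add_child(self, name, share):
        """Create a child, refusing to hand out more than the parent has."""
        committed = sum(c.share for c in self.children)
        if committed + share > 1.0 + 1e-9:
            raise ValueError("children would exceed the parent's allocation")
        child = Container(name, share, parent=self)
        self.children.append(child)
        return child

    def effective_share(self):
        """Fraction of the whole machine: product of shares up to the root."""
        if self.parent is None:
            return self.share
        return self.share * self.parent.effective_share()

# Example from the notes: 20% of the CPU for cgi-bin, split per request.
root = Container("machine")
cgi = root.add_child("cgi-bin", 0.20)
req1 = cgi.add_child("request-1", 0.5)
req2 = cgi.add_child("request-2", 0.5)
```

Here each request gets about 10% of the machine, and no number of cgi requests can push the cgi-bin subtree past its 20%.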

Network usage

i. Can bind a socket to an RC

ii. All packets on the socket are accounted to an RC

iii. Socket can be a packet filter (group of addresses)

1. Allows scheduling before a connection is established

2. Allows giving higher priority to some group before they connect

3. Allows classes of packets to be given no resources – black hole
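A sketch of address-based classification on packet receipt, using the standard `ipaddress` module. `FilterTable`, `BLACK_HOLE`, and the longest-prefix rule are assumptions for illustration; the addresses are reserved documentation ranges.

```python
import ipaddress

class Container:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.packets = 0               # receive work charged to this container

BLACK_HOLE = object()                  # marker: matching packets get no resources

class FilterTable:
    """Maps source-address prefixes to containers before any connection exists."""
    def __init__(self, default):
        self.default = default
        self.rules = []                # list of (network, container)

    def bind(self, cidr, container):
        self.rules.append((ipaddress.ip_network(cidr), container))

    def classify(self, src):
        """Pick the container of the most specific matching prefix."""
        addr = ipaddress.ip_address(src)
        matches = [(net, c) for net, c in self.rules if addr in net]
        if not matches:
            return self.default
        return max(matches, key=lambda m: m[0].prefixlen)[1]

    def receive(self, src):
        """Charge the packet to its container, or drop it. Returns True if kept."""
        container = self.classify(src)
        if container is BLACK_HOLE:
            return False
        container.packets += 1
        return True
```

Because classification uses only the source address, it can run on the first SYN, which is what allows prioritizing or dropping a client group before it connects.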

Protection:

i. Who can set container parameters?

1. Parent can set for child

2. Child inherits portion of parent

3. Child can lower self?

ii. What if a cgi-bin script is malicious?

QUESTION: is this general purpose?

i. Could you use it in a file server to allocate cache space, disk bandwidth, network bandwidth?

ii. Does it apply in general to server applications with multiple clients?

Uses

Assign percent to cgi scripts
limit runtime of cgi scripts
DoS attack

i. Goal: find a signature for attack packets, drop them early

ii. Goal: prevent a single code path from being over used – e.g. slow cgi/bin path

iii. limit code paths – e.g. all syn packets get % of time, then dropped

iv. IP-based filters – to black hole

v. RC limit: filters are only on address, not packet contents.
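A sketch of the "class gets a % of time, then dropped" idea, assuming time is divided into fixed windows and each packet has a known processing cost. `BudgetedContainer`, `admit`, and the cost figures are hypothetical.

```python
class BudgetedContainer:
    """Container limited to a fixed fraction of each accounting window."""
    def __init__(self, name, fraction):
        self.name = name
        self.fraction = fraction       # e.g. 0.10 means 10% of each window
        self.used = 0.0                # time consumed in the current window

    def admit(self, cost, window):
        """Charge `cost` if the budget allows it; otherwise signal a drop."""
        if self.used + cost > self.fraction * window:
            return False               # over budget: drop early, before real work
        self.used += cost
        return True

    def new_window(self):
        """Called at each window boundary to refill the budget."""
        self.used = 0.0

# All SYN packets share one container limited to 10% of a 100 ms window.
syn = BudgetedContainer("syn", 0.10)
admitted = [syn.admit(cost=3, window=100) for _ in range(10)]
```

With a cost of 3 ms per SYN, only the first three fit in the 10 ms budget; the rest are dropped until `new_window` is called, so a SYN flood cannot starve established connections.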

Evaluation

Cost: how expensive are primitives?
QUESTION: what is missing?

i. e.g. cost of having kernel use RC, frequency of RC invocation in kernel

QUESTION: how to evaluate?

i. With single-process event driven server

ii. Show that it can constrain resources based on various classifications (e.g. max fraction, proportional share)
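To make "proportional share" concrete, here is a sketch of stride scheduling over containers. This is a swapped-in, well-known technique chosen for illustration; the notes do not say which scheduler the evaluation used. The container names and ticket counts are made up.

```python
from fractions import Fraction

def stride_schedule(tickets, quanta):
    """Hand out `quanta` time slices in proportion to each container's tickets.

    Each container advances its `pass` value by 1/tickets whenever it runs;
    the container with the smallest pass runs next.
    """
    stride = {name: Fraction(1, t) for name, t in tickets.items()}
    passes = {name: Fraction(0) for name in tickets}
    counts = {name: 0 for name in tickets}
    for _ in range(quanta):
        name = min(passes, key=passes.get)   # ties go to the first container
        counts[name] += 1
        passes[name] += stride[name]
    return counts

# Static pages hold 3 tickets, cgi requests hold 1.
counts = stride_schedule({"static": 3, "cgi": 1}, quanta=400)
```

An experiment of this shape checks that the measured split tracks the configured ratio (about 3:1 here) regardless of how many requests each class has outstanding.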

Relevance today

MS has jobs – groups of processes that share scheduling / accounting / security privilege information