Resource containers

 

  1. Web server design
    1. Basic problems

                                              i.     Lots of requests

                                             ii.     Requests may require disk I/O (for static data) or network I/O (for proxy cache)

                                           iii.     Data may not all arrive at once or be sent at once (if large)

                                           iv.     How do you make use of the time spent sleeping while waiting for I/O?

                                            v.     How do you store context of a request while you are waiting?

1.   On a stack?

2.   In an object?

                                           vi.     How do you isolate independent requests?

1.   for reliability?

2.   For performance?

    1. Designs

                                              i.     Fork-a-process (standard unix)

1.   listen in master process, fork off & call accept in worker

2.   worker lives until connection goes away

3.   problem: for short connections (HTTP 1.0 closes after every request), lots of overhead to fork processes

4.   Hard to fine tune scheduler with short lived process – doesn't reach steady state

                                             ii.     Pool of processes – becomes process per request not connection

1.   Master process listens, hands off socket to an existing worker process

2.   worker process does work

3.   worker adds self to queue waiting for next request

4.   problem: expensive context switching

                                           iii.     Single process / thread per connection

1.   Thread created for each request, handles request, dies

2.   Problem: lots of context switching (but not as bad)

3.   Problem: lots of thread creation (expensive if need to do I/O in kernel)

4.   Problem: loss of protection for dynamic data

5.   Problem: thread scheduling by kernel, hard to prioritize

                                           iv.     Single process / thread per request pool

1.   Create pool of threads

2.   Thread handles a request, goes back in pool

3.   Most of the same problems, but less overhead per request (no thread creation)

                                            v.     Single process / single thread / Event driven

1.   All asynchronous / non-blocking I/O

2.   No scheduling in kernel

3.   app maintains queues of requests to handle at different stages

a.    receiving data

b.    sending data

c.    reading disk

4.   At each stage, app can choose what to do, when

5.   Can do lots of work on a stage to improve cache locality

6.   Problem: hard to write, can't use OS scheduling to account for work done in kernel on behalf of a request
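The event-driven design above can be sketched as follows. This is a minimal illustration, not a real web server: it covers only the receiving and sending stages, uses Python's standard `selectors` module for non-blocking I/O, and assumes a made-up one-line request format. `RequestContext`, `handle_readable`, `handle_writable` and `run_once` are hypothetical names chosen for the example.

```python
import selectors
import socket
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Per-request state kept in an object instead of on a thread stack."""
    sock: socket.socket
    inbuf: bytes = b""
    outbuf: bytes = b""


def handle_readable(sel, ctx):
    # Receiving stage: take whatever has arrived; data may come in pieces.
    data = ctx.sock.recv(4096)
    if not data:                      # peer closed the connection
        sel.unregister(ctx.sock)
        ctx.sock.close()
        return
    ctx.inbuf += data
    if ctx.inbuf.endswith(b"\n"):     # assumed request format: one line
        ctx.outbuf = b"echo: " + ctx.inbuf
        ctx.inbuf = b""
        # Move the request to the sending stage.
        sel.modify(ctx.sock, selectors.EVENT_WRITE, ctx)


def handle_writable(sel, ctx):
    # Sending stage: send what the socket accepts now, keep the rest.
    sent = ctx.sock.send(ctx.outbuf)
    ctx.outbuf = ctx.outbuf[sent:]
    if not ctx.outbuf:
        # Response fully sent; wait for the next request on this socket.
        sel.modify(ctx.sock, selectors.EVENT_READ, ctx)


def run_once(sel, timeout=1.0):
    """One iteration of the event loop; returns the number of events handled."""
    events = sel.select(timeout)
    for key, mask in events:
        if mask & selectors.EVENT_READ:
            handle_readable(sel, key.data)
        elif mask & selectors.EVENT_WRITE:
            handle_writable(sel, key.data)
    return len(events)
```

The application, not the kernel scheduler, decides which ready request to service next, which is where the "hard to write" cost comes from: every blocking point has to become an explicit state transition.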

  1. Accounting problems
    1. Process = unit of accounting for standard Unix, Linux, Windows
    2. Kernel work on behalf of a process not accounted to process

                                              i.     Accounting done to currently executing thread (or nobody)

1.   paging – replacing a page for someone else to use

2.   network packet receipt

3.   block device transfer

    1. This doesn't fit web servers

                                              i.     Heterogeneous workload / different priorities per work item not visible at process level

                                             ii.     Lots of work in web server happens in kernel (70% for apache with static web pages)

                                           iii.     Work may encompass multiple threads in a process / multiple processes, or they may all be doing different things

  1. Goal:
    1. Flexible definition of an accounting unit

                                              i.     Can be < 1 thread, > 1 thread, > 1 process

                                             ii.     Include kernel work on behalf of an accounting unit

                                           iii.     Use for all kernel scheduling – any time something is queued or given resources, it should be subject to this

  1. Resource containers
    1. Like Opal, break apart an abstraction into finer grained, more flexible components

                                              i.     Standard technique in systems – like exokernel

                                             ii.     QUESTION: what are problems?

1.   Making sure new abstraction is right

2.   New one is usually finer-grained, need to minimize overhead

3.   Protection: process is well understood, need new model

    1. Concept:

                                              i.     resource container == unit to which resource usage is charged; it, rather than the process, guides scheduling decisions

                                             ii.     Contains all system resources used by an activity

1.   cpu time

2.   kernel objects, buffers

                                           iii.     Have attributes

1.   scheduling parameters

2.   resource limits

3.   network QoS

                                           iv.     Separate from security principals (users)

                                            v.     QUESTION: how do you draw this architecturally? Is it a layer? What is the overhead cost at runtime? When do you pay this cost?

1.   A: already collecting stats for accounting to a process

2.   A: new overhead when consulting container for scheduling

3.   A: new overhead when changing containers

4.   A: new overhead when accounting things not previously accounted (e.g. network packet receive)
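The concept above can be sketched as a small data structure. This is an illustrative user-level model only, not the kernel interface from the paper: the class name `ResourceContainer` follows the notes, but every field and method name (`priority`, `cpu_limit_ms`, `charge_cpu`, `charge_packet`, `over_limit`) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceContainer:
    """Accounting unit for one activity (all field names are hypothetical)."""
    name: str
    # Attributes: scheduling parameters and resource limits.
    priority: int = 0
    cpu_limit_ms: Optional[float] = None
    # Resources consumed so far by the activity.
    cpu_ms: float = 0.0
    kernel_buffers: int = 0
    packets_received: int = 0

    def charge_cpu(self, ms: float) -> None:
        # Work the activity's own code performed.
        self.cpu_ms += ms

    def charge_packet(self, kernel_cpu_ms: float) -> None:
        # Kernel work done on behalf of the activity (e.g. packet receipt)
        # is charged to the same container, not to whoever was running.
        self.packets_received += 1
        self.cpu_ms += kernel_cpu_ms

    def over_limit(self) -> bool:
        return self.cpu_limit_ms is not None and self.cpu_ms > self.cpu_limit_ms
```

The overhead points in the answers above map onto this sketch: the counters already exist in some form for per-process accounting, while the new costs come from consulting the container when scheduling and from charging paths such as `charge_packet` that were not previously accounted.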

    1. Relation to threads / processes

                                              i.     Threads start in a default container, can dynamically bind to others

                                             ii.     Multiple threads / processes can be assigned to a container

                                           iii.     usage: thread working on multiple things changes container as it changes activities

1.   e.g. thread pool per request, event driven model

                                           iv.     Can pass between processes (like an FD) for cooperating processes
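The binding rules above can be illustrated with a small user-level simulation. It is a sketch under stated assumptions, not the real system-call interface: `Container`, `bind`, `current_container` and `do_work` are hypothetical names, and thread-local storage stands in for the kernel's per-thread binding.

```python
import threading


class Container:
    """Minimal stand-in for a resource container (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.cpu_units = 0


DEFAULT = Container("default")
_binding = threading.local()   # per-thread "current container" pointer


def current_container():
    # Threads start out bound to the default container.
    return getattr(_binding, "container", DEFAULT)


def bind(container):
    # Dynamically rebind the calling thread to another container.
    _binding.container = container


def do_work(units):
    # All work is charged to whichever container the thread is bound to.
    current_container().cpu_units += units
```

A thread in an event-driven server or a thread pool would call `bind` each time it switches to a different request, so that each request's container is charged only for its own work:

```python
req_a, req_b = Container("request-a"), Container("request-b")
bind(req_a); do_work(3)   # charged to request-a
bind(req_b); do_work(5)   # charged to request-b
```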

    1. Scheduling

                                              i.     RC allows scheduling per activity not just per process / thread

                                             ii.     Base case: thread is scheduled according to the container it is bound to

                                           iii.     Threads bound to multiple containers use scheduler binding

1.   == set of containers it has been using

2.   combined to make decision (e.g. average, sum if not shared)

3.   NOT DETAILED

    1. Introspection

                                              i.     Application can inspect accounting information for its own use

    1. Nesting

                                              i.     Like Opal resource groups, Nucleus

                                             ii.     A proportion given to a container == the total shared by its child containers

                                           iii.     E.g., give 20% of CPU to cgi-bin – can then have a container per request in cgi-bin container
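The arithmetic behind nesting can be shown with a short sketch. It assumes, for illustration only, that each container holds a fraction of its parent's allocation; the class and method names are hypothetical.

```python
class Container:
    """Node in a container hierarchy (hypothetical model)."""
    def __init__(self, name, fraction=1.0, parent=None):
        self.name = name
        self.fraction = fraction   # share of the parent's allocation
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def absolute_share(self):
        # Multiply fractions along the path up to the root.
        share = self.fraction
        node = self.parent
        while node is not None:
            share *= node.fraction
            node = node.parent
        return share


root = Container("root")                        # the whole CPU
cgi = Container("cgi-bin", 0.20, parent=root)   # 20% of the CPU
req1 = Container("request-1", 0.5, parent=cgi)  # half of cgi-bin's 20%
req2 = Container("request-2", 0.5, parent=cgi)
```

With these numbers each request container ends up with 10% of the CPU, and the children together never exceed the 20% given to the cgi-bin container.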

    1. Network usage

                                              i.     Can bind a socket to an RC

                                             ii.     All packets on the socket are accounted to an RC

                                           iii.     Socket can be bound with a packet filter (a group of addresses)

1.   Allows scheduling before a connection is established

2.   Allows giving higher priority to some group before they connect

3.   Allows classes of packets to be given no resources – black hole
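The address-filter idea above can be sketched with Python's standard `ipaddress` module. The filter table, the container names and the `BLACK_HOLE` sentinel are assumptions made for the example; the addresses come from the reserved documentation ranges.

```python
import ipaddress

BLACK_HOLE = None   # sentinel: packets in this class get no resources

# Hypothetical filter table: (address group, container name).
# First match wins, so more specific groups should come first.
FILTERS = [
    (ipaddress.ip_network("192.0.2.0/24"), "premium-clients"),
    (ipaddress.ip_network("198.51.100.0/24"), BLACK_HOLE),
]


def classify(src_ip, filters=FILTERS, default="default"):
    """Pick the container to charge for a packet, using only its source
    address, so the decision can be made before a connection exists."""
    addr = ipaddress.ip_address(src_ip)
    for network, container in filters:
        if addr in network:
            return container
    return default
```

Because classification needs only the source address, it can run when the first packet of a connection arrives, which is what allows prioritizing or dropping a group of clients before they connect.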

    1. Protection:

                                              i.     Who can set container parameters?

1.   Parent can set for child

2.   Child inherits portion of parent

3.   Child can lower its own allocation?

                                             ii.     What if a cgi-bin script is malicious?

    1. QUESTION: is this general purpose?

                                              i.     Could you use it in a file server to allocate cache space, disk bandwidth, network bandwidth?

                                             ii.     Does it apply in general to server applications with multiple clients?

  1. Uses
    1. Assign percent to cgi scripts
    2. limit runtime of cgi scripts
    3. DoS attack

                                              i.     Goal: find a signature for attack packets, drop them early

                                             ii.     Goal: prevent a single code path from being overused – e.g. slow cgi-bin path

                                           iii.     Limit code paths – e.g. all SYN packets get a fixed % of time; beyond that they are dropped

                                           iv.     IP-based filters – to black hole

                                            v.     RC limit: filters are only on address, not packet contents.
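The "fixed % of time, then dropped" policy for a code path can be sketched as a per-interval budget. This is an assumed mechanism for illustration, not the paper's implementation; `BudgetedContainer`, `try_charge` and `new_interval` are hypothetical names.

```python
class BudgetedContainer:
    """Container with a fixed processing budget per accounting interval."""
    def __init__(self, budget_per_interval):
        self.budget = budget_per_interval
        self.used = 0.0

    def try_charge(self, cost):
        # Returns False when the work would exceed the budget, in which
        # case the caller drops the packet instead of processing it.
        if self.used + cost > self.budget:
            return False
        self.used += cost
        return True

    def new_interval(self):
        # Called at the start of each accounting interval.
        self.used = 0.0


syn_container = BudgetedContainer(budget_per_interval=10)
# Each SYN costs 4 units here; the third one in an interval is dropped.
accepted = [syn_container.try_charge(4) for _ in range(4)]
```

A SYN flood then consumes at most the budget assigned to the SYN-processing container, leaving the rest of the machine for established connections.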

  1. Evaluation
    1. Cost: how expensive are primitives?
    2. QUESTION: what is missing?

                                              i.     e.g. cost of having kernel use RC, frequency of RC invocation in kernel

    1. QUESTION: how to evaluate?

                                              i.     With single-process event driven server

                                             ii.     Show that it can constrain resources based on various classifications (e.g. max fraction, proportional share)

  1. Relevance today
    1. Microsoft Windows has jobs (job objects) – groups of processes that share scheduling / accounting / security privilege information