Topic: Complexity

 

 

  1. Office hours: FIND SOME POSSIBILITIES
    1. Wednesday at 1:30
    2. Friday at 11 am, 1:30 pm, 2:30 pm
    3. Thursday 10:30 or 11 am
  2. OS seminar – Mondays at
  3. Reading groups – has everybody formed them?  People who are not in groups should send email to me so I can get you together.
  4. How did reading go? Posting on blog? Any problems?
  5. Context for these papers
    1. Context: computers up to this point programmed at low level:

                                               i.     Assembly language (PL/1 coming along)

                                             ii.     Not much agreement about abstractions

                                            iii.     Not much rigor/correctness given to design

    1. Context: complexity starting to get serious:

                                               i.     OS/360 was being designed – huge effort, thousands of programmers, late, buggy, not rigorously designed (e.g. corner cases involving interrupts very sloppy)

    1. OS had not really settled down

                                               i.     What services should it offer?

                                             ii.     What applications is it for?

                                            iii.     How should it be constructed internally?

1.     As a bunch of libraries?

2.     As a bunch of layers?

3.     As a hierarchy?

4.     As modules/subsystems?

    1. Problems being solved:

                                               i.     What is right way to organize OS to provide

1.     Protection

2.     Flexibility

3.     Simplicity / correctness

4.     Handle I/O efficiently (abstracted from processes)

                                             ii.     How do you battle the complexity of:

1.     Multiprogramming – e.g. different users, different tasks, different programs, different priorities

2.     Interrupts; re-entrant code

3.     Control: who controls things and how

4.     Flexibility: not much is known about how to do things, want to have flexibility to change things in the future

                                           iii.     What are the right abstractions to provide?

1.   e.g. processes, threads, messages, files, names

6.  Unix

    1. Why did we read this paper?

                                              i.     You all know Unix a bit, get it out of the way

                                             ii.     Provides a lens to look at other papers

                                           iii.     Important context of what OS looks like today

    1. Comments from reviews:

                                              i.     Feedback:

1.   Often too long – don't need to write so much!

                                             ii.     Problem:

1.   Unclear what problem they were solving.

                                           iii.     Contributions

1.   Be concrete – not just that the artifact was lasting. What about ideas?

                                           iv.     Flaws

1.   No evaluation, not much motivation

a.    Often true for industry projects

2.   Why no hard links to directories? Why no file locking?

a.    Hard to get right, and can be worked around – emblematic of Unix approach

b.   Circular structures can't be garbage collected with reference counts

3.   Why not worry about quotas?

a.    Hard to get right

b.   Not needed in their environment

4.   They don't address the requirements / computing environment

a.    When written, everybody knew about it. People today don't write about what a PC is or how much it costs (very much).

5.   Using C made the OS bigger.

a.    QUESTION: comments?

    1. Context:

                                              i.     Authors worked on Multics at Bell Labs – reacted to gross complexity & inefficiency of Multics

                                             ii.     Had very small computer to work with, wanted to use for their own purposes

1.   Different from creating something for others; you know what you need and what you can sacrifice

2.   Has to be usable; often drives out other goals such as abstraction

                                           iii.     Commercial OS at the time not extensible – you just got what you got and lived with it.

    1. Theme of the paper: what should the kernel interface be? What are the key abstractions a kernel should provide to a programmer / program?

                                              i.     Everything (almost) is a file

1.   Device access

2.   Interprocess communication

3.   Directories

4.   QUESTION: Why important?

a.    Don't need lots of APIs

b.   Can have tools that operate on different things.

                                                                                                    i.     e.g. cat to a device

5.   QUESTION: how do you provide shared access to a device, e.g. a printer?

a.    Grant exclusive access to a daemon, let it do sharing.
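
A minimal sketch of why "everything is a file" matters, assuming Python on a Unix-like system. The helper names copy_bytes and demo are made up for illustration. One copy loop, written only against read and write on file descriptors, works unchanged whether the source is a regular file or a pipe and whether the destination is a device (the null device here) or another pipe.

```python
import os
import tempfile

def copy_bytes(src_fd, dst_fd, chunk=4096):
    """Copy from one descriptor to another until EOF; return bytes copied."""
    total = 0
    while True:
        data = os.read(src_fd, chunk)
        if not data:                 # an empty read means end of file
            break
        total += len(data)
        while data:                  # write may accept fewer bytes than offered
            written = os.write(dst_fd, data)
            data = data[written:]
    return total

def demo():
    # Regular file -> device: the same loop a tool like cat would use.
    with tempfile.TemporaryFile() as f:
        f.write(b"hello from a file\n")
        f.flush()
        f.seek(0)
        null_fd = os.open(os.devnull, os.O_WRONLY)
        try:
            to_device = copy_bytes(f.fileno(), null_fd)
        finally:
            os.close(null_fd)

    # Pipe -> pipe: no new API is needed for interprocess channels.
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    os.write(src_w, b"hello from a pipe\n")
    os.close(src_w)                  # closing the write end lets the reader see EOF
    from_pipe = copy_bytes(src_r, dst_w)
    os.close(src_r)
    os.close(dst_w)
    received = os.read(dst_r, 100)
    os.close(dst_r)
    return to_device, from_pipe, received
```

The copy loop never asks what kind of object is behind a descriptor, which is the property that lets small tools be reused on devices, files, and pipes.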

                                             ii.     Data is bytes

1.   not records

2.   Generally null-terminated strings

3.   QUESTION: Why important?

a.    A record format is hard to program to

b.   Text format commonly recognized

                                           iii.     Uniform name space

1.   no separate naming convention for directories, file names, different disks

a.    e.g. d:\foo (DOS, Windows)

b.   e.g. $pinot:sys$disk[swift.one]foo.doc;13 (VMS)

2.   Mountable file systems into name space

a.    but not quite transparent – wasn't worth the complexity for linking, mv'ing across mount points.

3.   Separate name from contents

a.    Name refers to an i-node number, not to a file directly

b.   Expose implementation through links

4.   QUESTION: why important?

a.    Simplify name parsing in programs

b.   Easy model to navigate from any one place to another (allows relative paths up and down)
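
A small sketch of "name refers to an i-node number, not to a file directly", assuming Python on a Unix-like file system that supports hard links. The file names and the function link_demo are illustrative only. Two directory entries point at one inode; removing one name leaves the contents reachable through the other.

```python
import os
import tempfile

def link_demo():
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "report.txt")
        second = os.path.join(d, "alias.txt")
        with open(first, "w") as f:
            f.write("v1\n")

        os.link(first, second)            # second name for the same inode
        st_first, st_second = os.stat(first), os.stat(second)
        same_inode = st_first.st_ino == st_second.st_ino
        links_before = st_first.st_nlink  # reference count kept in the inode

        os.unlink(first)                  # removes a name, not the file
        with open(second) as f:
            survived = f.read()
        links_after = os.stat(second).st_nlink

        return {
            "same_inode": same_inode,
            "links_before": links_before,
            "links_after": links_after,
            "survived": survived,
        }
```

The file is only reclaimed when the link count in the inode drops to zero, which is also why cycles created by directory hard links would be a problem for this scheme.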

 

                                           iv.     Images

1.   Address space + kernel data structure + file descriptor table

2.   System calls make it easy to spawn

                                             v.     Limited communication / synchronization mechanisms

1.   Fork / wait

2.   Pipes with child

                                           vi.     System shell exposing underlying kernel features

1.   Fork / wait parallelism

2.   Pipelines

3.   Coroutine programming using messages / forks

                                          vii.     User IDs + root

1.   Simple two level model

2.   SetUID to amplify rights

3.   QUESTION: why important?

a.    Hard to get semi-privileged things right

b.   Setuid makes it easy to have privileged subsystems as programs, e.g. login, passwd

c.    Compare to Windows: no setuid – need trusted launcher or running process for trusted subsystem

d.   No need for a separate mechanism for users to create their own trusted subsystems separate from the system (not possible on Windows easily)

                                        viii.     Summary:

1.   Avoid problems that are hard to get right or require a lot of mechanism; e.g. hard links to directories, file usage quotas, moving files (or linking file) across mount points

2.   Don't hide underlying mechanisms if they are useful

a.    fork (easy to do based on context switching)

b.   hard links (easy to do based on directory structure)

e.    OS implementation

                                              i.     OS structure proposed by Unix

1.   Two levels: kernel and user

2.   Simple kernel for extensibility

3.   Services implemented as setuid programs that run on demand

                                             ii.     File system

1.   Layer of indirection between name and file – the inode

2.   Metadata (but not name) stored in inode on file, not in directory

a.    NOTE: is a layer of indirection between directory and file

b.   Allows linking: one set of metadata

c.    Slow to do ls -l

d.   Makes charging hard – who pays, the directory owner or the file creator?

3.   No file locking/ synchronization

a.    QUESTION: What is the assumption here? Not much sharing

b.   What can you do for safe updates?

                                                                                                    i.     Make a copy & then rename

                                                                                                   ii.     Have application-specific lock files (e.g. Emacs)

c.    EXAMPLE OF Unix approach – keep kernel simple, make applications handle things
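
The two application-level workarounds above (copy then rename, and lock files) can be sketched as follows, assuming Python on a modern Unix-like system. The function names atomic_update, try_lock and demo are hypothetical, and O_EXCL is a later addition rather than something the original system offered.

```python
import os
import tempfile

def atomic_update(path, new_contents):
    """Replace a file so readers see the old or the new version, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary copy lives in the same directory because a rename
    # cannot cross mount points.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())    # push data to disk before renaming
        os.replace(tmp_path, path)    # atomic rename over the old name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

def try_lock(lock_path):
    """Application-specific lock file: creation fails if the file exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "config.txt")
        with open(target, "w") as f:
            f.write("old\n")
        lock = target + ".lock"
        got_first = try_lock(lock)
        got_second = try_lock(lock)   # a second writer is refused
        atomic_update(target, "new\n")
        os.unlink(lock)               # release the lock
        with open(target) as f:
            return f.read(), got_first, got_second
```

Both techniques are conventions between cooperating programs; the kernel enforces nothing beyond the atomicity of rename and exclusive create.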

4.   I/O APIs make all I/O look synchronous, unbuffered

a.    Relies on caching, write-behind in kernel for performance

b.   No different APIs for sequential & random access – just seek (SUGGESTED by Multics)

c.    All synchronous

                                                                                                    i.     QUESTION: is/was this a problem?

1.   O.k. for small # of streams, but problem for networking on a server

5.   Directories are also files, but can only be written by root

a.    Directory entries contain name and inode number

b.   Inode contains protection information, file statistics, reference count

                                                                                                    i.     QUESTION: what is result? Same access independent of path to file

c.    Can only link to files on same disk

                                                                                                    i.     QUESTION: what problems? Transparency; disk boundaries arbitrary but visible to user (e.g. mv command)

d.   Can check directory for consistency by looking at dir entries, inodes, blocks

e.    QUESTION: what about performance?

                                           iii.     Communication

1.   QUESTION: How did unix do IPC? Why/ why not?

2.   In general, no arbitrary communication between processes – no shared memory or messages or semaphores

3.   Can use environment variables between processes

4.   Can use shared files (but no locking!)

5.   Can use signals to interrupt other processes

6.   Related processes can use pipes, just like files

a.    Combined with text data format, allows small programs to be combined into larger programs

b.   No special communication api except pipe

c.    Previously invented at Dartmouth, but not used

d.   Specialized form of a co-routine

                                                                                                    i.     E.g. subroutine that does some work then yields and lets another run to do part of the work

e.    ONLY MECHANISM FOR SYNCHRONIZATION (waiting for others) other than wait() for exit()

7.   Redirection

a.    Less focus on interactive vs. batch – one program can do both
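
A minimal sketch of a pipe between related processes, assuming Python on a Unix-like system (the function name parent_reads_child is illustrative). The pipe is created before fork, so both processes inherit its descriptors; the reader blocks until data arrives, which is the implicit synchronization noted above.

```python
import os

def parent_reads_child(message):
    """Fork a child that upper-cases message and sends it back over a pipe."""
    read_fd, write_fd = os.pipe()      # created before fork, so both sides inherit it
    pid = os.fork()
    if pid == 0:
        # Child: acts as a small filter writing to its end of the pipe.
        try:
            os.close(read_fd)
            os.write(write_fd, message.upper())
        finally:
            os._exit(0)                # never return into the parent's code

    # Parent: close the unused write end so EOF is seen when the child exits.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096) # blocks until the child has written
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                 # wait() for the child's exit
    return b"".join(chunks)
```

For example, parent_reads_child(b"abc") returns b"ABC". Only processes that share an ancestor holding the pipe can communicate this way, which is the restriction to related processes.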

                                           iv.     Process control: Fork / exec

1.   QUESTION: why fork/exec?

a.    History: had to swap out old process to run new process

b.   Fork == leave copy in memory

c.    Originally 27 lines of assembly

2.   QUESTION: What is benefit?

a.    Compare to CreateProcess (10 parameters)

                                                                                                    i.     Application name

                                                                                                   ii.     Command line

                                                                                                 iii.     Process ACL

                                                                                                 iv.     Thread ACL for first thread

                                                                                                   v.     Inherit Handle flag

                                                                                                 vi.     Creation Flags (11 flags regarding how processes are grouped)

                                                                                                vii.     Environment pointer – environment variables

                                                                                              viii.     Current directory string

                                                                                                 ix.     Startup info – 18 parameters

1.   Window size

2.   Stdin, stdout, stderr handles

3.   Desktop to create on

4.   9 flag values

                                                                                                  x.     Process information

1.   Output value containing handles to new thread, process, process id, thread id

b.   Fork allows you to control new process by running code before running exec

                                                                                                    i.     You only need to control yourself (e.g. close/open files, set environment) on Unix; on Windows you need to control others

c.    General dichotomy:

                                                                                                    i.     Provide a hook to inject code to do whatever you want (Unix)

                                                                                                   ii.     Provide configuration options anticipating all possible needs (Windows)

d.   General approach to doing things:

                                                                                                    i.     Unix: provide some code, e.g. shell script, or forked code, to set things up

                                                                                                   ii.     Windows: statically declare properties as parameters or name-value pairs (e.g. windows)

e.    Differences?

                                                                                                    i.     Windows allows more reasoning / control over what happens

                                                                                                   ii.     Unix allows more flexibility, compact representation – don't need to create flags to decide everything
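
A sketch of the "hook to inject code" side of the dichotomy, assuming Python on a Unix-like system; run_redirected and demo are made-up names. Between fork and exec the child runs ordinary code on itself (redirect stdout, adjust the environment), so process creation needs no parameters for these settings.

```python
import os
import sys
import tempfile

def run_redirected(argv, stdout_path, extra_env=None):
    """Run argv with stdout sent to a file; return the exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: configure *itself*, then replace its image with the program.
        try:
            fd = os.open(stdout_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)             # descriptor 1 is standard output
            os.close(fd)
            if extra_env:
                os.environ.update(extra_env)
            os.execvp(argv[0], argv)   # only returns on failure
        finally:
            os._exit(127)              # reached only if setup or exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

def demo():
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "out.txt")
        program = [sys.executable, "-c", "import os; print(os.environ['GREETING'])"]
        code = run_redirected(program, out, {"GREETING": "hi"})
        with open(out) as f:
            return code, f.read()
```

Any new kind of setup is handled by adding code in the child rather than by adding another parameter or flag to the process-creation call.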

                                             v.     Protection: ACLs

1.   Owner checked first, then group, then everyone

a.    Allows denying a group – stops after first match

b.   QUESTION: How much flexibility does this add?

                                                                                                    i.     A lot. Can make a group to give special access

                                                                                                   ii.     Not much – can only give special access to one group

                                                                                                 iii.     How limiting is this? How often do you want different access for user, two groups, and everyone else?
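
The first-match rule above can be sketched as a pure function over the permission bits, assuming the conventional owner/group/other layout of three 3-bit fields. The function name allowed and the numeric IDs are illustrative, and the superuser bypass is deliberately left out.

```python
READ, WRITE, EXECUTE = 4, 2, 1

def allowed(mode, file_uid, file_gid, uid, gids, want):
    """Return True if a user may perform `want` (a mask of READ/WRITE/EXECUTE).

    Exactly one class is consulted: owner first, then group, then everyone.
    The check stops at the first class that matches the user.
    """
    if uid == file_uid:
        bits = (mode >> 6) & 7      # owner bits
    elif file_gid in gids:
        bits = (mode >> 3) & 7      # group bits
    else:
        bits = mode & 7             # everyone else
    return (bits & want) == want

# Mode 604: owner read/write, group nothing, everyone else read.
# A member of the file's group is denied even though strangers may read.
group_member_can_read = allowed(0o604, 1000, 50, uid=2000, gids={50}, want=READ)
stranger_can_read = allowed(0o604, 1000, 50, uid=3000, gids={60}, want=READ)
```

Here group_member_can_read is False and stranger_can_read is True, which is the "denying a group" behavior that stopping at the first match makes possible.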

2.   Owner has special rights to change ACLs

3.   Can't give ownership away

4.   Where is ACL/user/group stored – on directory or on inode? Should be in inode

5.   Single superuser: ROOT IS EXCLUDED

a.    QUESTION: Is this a problem? What about sharing administration tasks?

b.   A: use setuid programs & execute permission

6.   Setuid

a.    Like entry capabilities, templates

b.   Issues:

                                                                                                    i.     Who should be able to use it?

                                                                                                   ii.     How should it be controlled?

                                                                                                 iii.     How can it be misused? What if you put it on the wrong program?

1.   E.g. get administrator to run a file which puts SETUID on your file

                                                                                                 iv.     Need to ensure program can't be subverted – e.g. crashed at wrong time, verifies all inputs

7.   QUESTION: Is this enough? What can it do, and what can't it do?

 

                                           vi.     Shell

1.   Previous systems: one process/user

a.    Would replace shell to run command.

b.   Exit command would relaunch shell

2.   Runs commands, searches default path

a.    Problem if path includes local directory

3.   Simple extension / unification with underlying kernel primitives

a.    E.g. & → no wait called, ; → wait called

4.   I/O redirection allows batch operation or interaction of user programs w/o their involvement

5.   Previous systems: one process per terminal; shell exits to run program and then restarts

6.   Big ideas: commands as binary operators, taking an input and producing an output.

a.    Allows executing them in sequence with pipes

b.   Allows redirecting them for intermediate storage, batching

c.    COMMENT: at time, one-input and one-output seemed too confining
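
A toy sketch of how the shell's `;` and `&` map onto the kernel primitives, assuming Python on a Unix-like system. The names launch, wait_for, run_line and demo are hypothetical, and a real shell also handles parsing, redirection, and signals. The only difference between the two terminators is whether wait is called before the next command starts.

```python
import os
import sys

def launch(argv):
    """fork + exec; return the child's pid without waiting."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)              # reached only if exec failed
    return pid

def wait_for(pid):
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

def run_line(commands):
    """commands is a list of (argv, terminator) with terminator ';' or '&'."""
    foreground_statuses = []
    background_pids = []
    for argv, terminator in commands:
        pid = launch(argv)
        if terminator == ";":
            foreground_statuses.append(wait_for(pid))   # ';' : wait called
        else:
            background_pids.append(pid)                 # '&' : no wait called
    return foreground_statuses, background_pids

def demo():
    def exit_with(code):
        return [sys.executable, "-c", "raise SystemExit(%d)" % code]
    statuses, background = run_line(
        [(exit_with(0), ";"), (exit_with(3), ";"), (exit_with(0), "&")]
    )
    reaped = [wait_for(pid) for pid in background]      # clean up afterwards
    return statuses, len(background), reaped
```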

 

    1. Comments / questions

 

Paper 2:

The Structure of the "THE"-Multiprogramming System.

 

 

  1. Stuff from reviews
    1. Problem solved

                                               i.     Many people quoted paper – smoothly processing a stream of programs. What was the real problem?

    1. Contributions

                                               i.     People read a lot into what was said

                                             ii.     E.g. preemption in scheduling. This was never stated – just context switching

                                            iii.     People missed a big idea - layering

    1. Flaws

                                               i.     Didn't solve deadlock. Is this a flaw or a limitation?

                                             ii.     Paper is not just a technical description, but an experience report – random bits of information. Is this information useful to anyone? If so, how should it be dispensed?

 

  1. QUESTION: What was this paper about?
    1. QUESTION: What problems were they trying to solve?

                                               i.     A: How do you build an OS?

                                             ii.     A: What are the right abstractions / organization for an OS?

  1. QUESTION: What is their proposed structure?
    1. Layers to virtualize different aspects of the hardware
    2. QUESTION: why do you need to virtualize hardware?

                                               i.     hard to handle interrupts: save, restore information, make sure you don't access things in an interrupt you shouldn't

                                             ii.     hard to manage memory manually (but people did it!)

                                            iii.     Each layer gets to run on a virtual machine that removes one element of hardware and replaces it with a software abstraction

  1. System structure:
    1. Layers: (DRAW ON BOARD – ask class to give me the layers and descriptions)

                                               i.     QUESTION: why layers?

1.     Easy to reason about – you can only communicate with layers above/below

2.     Logical: can provide an abstract machine to higher levels

3.     Problem: what goes in the layer, what order?

4.     NOTE: can also consider these as abstract modules, organized for now into layers

                                             ii.     0 = processes and context switching

1.     Processor:

a.     System is a set of sequential processes at undefined speed ratios, use semaphores for synchronization

                                                                                                     i.     Delaying a process can't affect its correctness

                                                                                                    ii.     QUESTION: is this a realistic model? Is it limited? What does it rule out? (e.g. unprotected reading/writing)

b.     Q: What does this mean? How does it help?

c.     A: Data doesn't disappear if you don't pick it up: buffering/blocking input and output

                                                                                                     i.     We take this for granted

                                                                                                   ii.     Previously, had to time code so that it picked up data before next piece arrived; very sensitive to changes in timing of HW.

d.     Impact: no implicit synchronization via timing; all synchronization is explicit. You always wait for things to happen

2.     Process real-time clock interrupts

3.     Hide multiple processors (should they exist!)

4.     Handle processor allocation / context switching

5.     Provides: virtual machine for sequential processes

6.     Supported 5 user processes, 10 i/o processes

a.     System processes structured as cyclic processes (producer/consumer):

                                                                                                     i.     wait for input

                                                                                                   ii.     compute

                                                                                                  iii.     produce output

b.     To communicate:

                                                                                                     i.     Provide input

                                                                                                   ii.     wait for output
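
The cyclic-process pattern above can be sketched with semaphores, assuming Python threads stand in for sequential processes. BoundedBuffer, run and the squaring "computation" are invented for illustration and are not taken from the paper. Each side only ever waits on a semaphore, so relative speeds cannot affect correctness.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Classic semaphore-based buffer: P = acquire, V = release."""
    def __init__(self, capacity):
        self._items = deque()
        self._mutex = threading.Semaphore(1)         # mutual exclusion
        self._free = threading.Semaphore(capacity)   # empty slots
        self._filled = threading.Semaphore(0)        # items available

    def put(self, item):
        self._free.acquire()       # P(free): wait for room
        self._mutex.acquire()
        self._items.append(item)
        self._mutex.release()
        self._filled.release()     # V(filled): signal the consumer

    def get(self):
        self._filled.acquire()     # P(filled): wait for data
        self._mutex.acquire()
        item = self._items.popleft()
        self._mutex.release()
        self._free.release()       # V(free): signal the producer
        return item

def run(values):
    inbox, outbox = BoundedBuffer(2), BoundedBuffer(2)

    def cyclic_process():
        while True:
            x = inbox.get()        # wait for input
            if x is None:          # sentinel: shut down
                outbox.put(None)
                return
            outbox.put(x * x)      # compute, then produce output

    worker = threading.Thread(target=cyclic_process)
    worker.start()
    results = []
    for v in values:
        inbox.put(v)                   # provide input
        results.append(outbox.get())   # wait for output
    inbox.put(None)
    outbox.get()
    worker.join()
    return results
```

For example, run([1, 2, 3]) returns [1, 4, 9] regardless of how the threads are scheduled.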

                                            iii.     1 = virtual memory – "segment controller"

1.     Memory:

a.     Pages = unit of moving memory

b.     Segments = unit of information

c.     Separate, large segment address space

d.     Segment variable in the core identifies whether segment is in core or on drum

                                                                                                     i.     Addresses independent of memory or disk address; gives flexibility

                                                                                                   ii.     No need for consecutive allocation

e.     Principle here: virtualizing a resource provides flexibility, hides details from upper levels

                                                                                                     i.     Is key technique – adding a layer of indirection for flexibility, scheduling, policy

2.     synchronized access to drum/disk

3.     provides virtual address space, automatic swapping

4.     provides virtual machine with large virtual address space
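
A toy model of the indirection the segment controller provides, assuming one page per segment and oldest-first eviction; both are simplifications chosen here, not details from the paper. The class name SegmentController and its methods are hypothetical. Upper layers name segments only; whether a segment is in core or on the drum is tracked and changed underneath them.

```python
from collections import OrderedDict

class SegmentController:
    def __init__(self, core_frames):
        self.core_frames = core_frames
        self.in_core = OrderedDict()   # segment id -> contents, oldest first
        self.on_drum = {}              # segment id -> contents
        self.transfers = 0             # drum-to-core loads performed

    def create(self, seg_id, contents):
        self.on_drum[seg_id] = contents

    def location(self, seg_id):
        """The 'segment variable': is the segment in core or on the drum?"""
        return "core" if seg_id in self.in_core else "drum"

    def read(self, seg_id):
        """Callers use only the segment id; placement is handled here."""
        if seg_id not in self.in_core:
            if len(self.in_core) >= self.core_frames:
                victim, data = self.in_core.popitem(last=False)
                self.on_drum[victim] = data          # swap the oldest out
            self.in_core[seg_id] = self.on_drum.pop(seg_id)
            self.transfers += 1
        return self.in_core[seg_id]

controller = SegmentController(core_frames=2)
for name in ("a", "b", "c"):
    controller.create(name, name.upper())
contents = [controller.read(name) for name in ("a", "b", "c", "a")]
```

After this sequence contents is ["A", "B", "C", "A"] and four transfers have occurred; the caller's code is the same whether or not a swap happened.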

                                            iv.     2 = virtual console

1.     Handles connection of console keyboard to a process

2.     Requires naming of process to be communicated with – on a "conversation" basis

3.     Does message routing based on name of conversation

4.     Provides virtual private console (/dev/tty) to next level

5.     Can be swapped out, because above segment controller layer

                                              v.     3 = virtual devices

1.     I/O devices are also sequential devices with synchronization

a.     Hides timing details as much as possible

2.     I/O devices abstracted as buffered input and unbuffered output streams.

3.     I/O devices presented as two sequential processes (buffered input, so can read asynchronously) and unbuffered output

4.     Above message interpreter so can send error messages to operator (e.g. load tape)

                                            vi.     4 = user programs

                                           vii.     5 = operator?

                                         viii.     QUESTION:

1.     Where would networking go? What if you want to swap segments over a network?

2.     Are these the only possible layers? What could you invert? E.g. move virtual devices under segment controller

3.     Could you invert these layers? E.g. put virtual devices under virtual console?

4.     Segment controller can't use virtual devices.

  1. QUESTION: what is the interface to programs?
    1. Procedures that provide access to low-level resources
    2. Coroutines: do something, wait for someone to reply or signal someone to wake up
  2. QUESTION: what are some key ideas/contributions?
    1. Key idea: organize system into layers

                                               i.     Each layer hides lower layer from upper layer

                                              ii.     Reduces potential interactions of n layers of m components from (nm)^2 to n*(m^2)
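
A worked instance of the interaction count, using the notes' own simplification that components interact only within their layer. The function name and the sample sizes (5 layers of 10 components) are chosen here for illustration.

```python
def potential_interactions(n_layers, m_components):
    """Return (unstructured, layered) counts of potential interactions."""
    unstructured = (n_layers * m_components) ** 2   # any component with any other
    layered = n_layers * m_components ** 2          # only within a layer
    return unstructured, layered

unstructured, layered = potential_interactions(5, 10)
reduction_factor = unstructured // layered
```

With 5 layers of 10 components the counts are 2500 versus 500, a reduction by a factor of n = 5; the saving grows linearly with the number of layers.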

    1. Key idea: provable correctness / abstract reasoning

                                               i.     Using semaphores for mutual exclusion

1.     Allows reasoning about concurrency

2.     Compared to other structures, allows great flexibility in synchronization

a.     E.g. synchronous programming with timers for polling

    1. Key idea: Component testing

                                               i.     Test all possible combinations of inputs to a layer to verify correctness
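
A small sketch of what exhaustive testing of one layer can look like, assuming a tiny layer whose whole input space can be enumerated. The non-blocking semaphore model and the length bound are simplifications invented for this example. Every sequence of P and V operations up to a fixed length is tried, and an invariant is checked after each step.

```python
from itertools import product

class SemaphoreModel:
    """Non-blocking model of a semaphore: P reports failure instead of waiting."""
    def __init__(self, initial):
        self.value = initial

    def P(self):
        if self.value == 0:
            return False       # a real P would block here
        self.value -= 1
        return True

    def V(self):
        self.value += 1

def exhaustive_check(initial, max_len):
    """Try every P/V sequence up to max_len; return how many were checked."""
    checked = 0
    for length in range(max_len + 1):
        for ops in product("PV", repeat=length):
            sem = SemaphoreModel(initial)
            passed = signalled = 0
            for op in ops:
                if op == "P":
                    passed += sem.P()      # True counts as 1
                else:
                    sem.V()
                    signalled += 1
                # Invariants: never negative, and value is fully accounted for.
                assert sem.value >= 0
                assert sem.value == initial + signalled - passed
            checked += 1
    return checked
```

The number of sequences doubles with each extra step (15 sequences for lengths 0 through 3), which suggests why this style of testing depends on layers having small, well-defined interfaces.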

    1. Key idea: virtual memory

                                               i.     Addresses truly virtual, not physical for disk or memory. Allows flexibility

    1. Key idea: design for debugging

                                               i.     Multiprogrammed, reentrant code is hard to reason about.

                                             ii.     Even today's program verifiers can't really handle it

    1. Key idea: sequential processes

                                               i.     Timing doesn't matter


  1. Design Experience
    1. QUESTION: what did you think of the design experience? Is it reasonable today? Do you believe the claims?
    2. Lots of time spent in design stage
    3. Verification stage: done by forcing system into all possible states per layer, starting at layer 0 and working up
    4. Could forget details of lower layer at this level because of sequential assumption (e.g. no re-entrancy/concurrency in a process)
    5. QUESTION: goal was provable system? Is this reasonable?

                                               i.     Is done today – sw firm in UK codes in a language that provides verification

                                             ii.     Several formally verified OS exist today

                                            iii.     Key: requires design for verification, not design then verification

  1. Notes:
    1. Code written to allow deductive reasoning/proof
  2. Evaluation? None
    1. What would it mean if he showed raw numbers? What could you compare them against? Nobody else was using the same computer, there was no other OS.
  3. Problems:
    1. QUESTION: who can give me some problems?
    2. Flexibility:

                                               i.     hard to move functionality between layers

                                             ii.     Hard to add new functionality: what layer does a network go in? What if you want to do swapping over it?

    1. Protection:

                                               i.     No mention of it, how to track it between layers.

                                             ii.     Semaphore construct requires "harmonious" cooperation; not realistic

    1. Performance:

                                               i.     Hard to optimize across layers if data is hidden.

    1. "Resulting system is guaranteed to be flawless" – perhaps optimistic?
  1. Relevance today
    1. QUESTION: is it relevant? Is it used? Where?
    2. Dependency layers still exist. Windows XP has something like 30 layers if you look at dependence chains of device drivers or DLLs
    3. Networks still use layers, but compressing layers or combining layers is common