Topic:
Complexity
i. Assembly language (PL/1
coming along)
ii. Not much agreement about
abstractions
iii. Not much rigor/correctness
given to design
i. OS/360 was being designed
– huge effort, thousands of programmers, late, buggy, not rigorously
designed (e.g. corner cases involving interrupts very sloppy)
i. What services should it
offer?
ii. What applications is it
for?
iii. How should it be
constructed internally?
1. As a bunch of libraries?
2. As a bunch of layers?
3. As a hierarchy?
4. As modules/subsystems?
i. What is right way to
organize OS to provide
1. Protection
2. Flexibility
3. Simplicity / correctness
4. Handle I/O efficiently (abstracted from processes)
ii. How do you battle the
complexity of:
1. Multiprogramming: e.g. different users, different tasks, different programs, different priorities
2. Interrupts; re-entrant code
3. Control: who controls things and how
4. Flexibility: not much is known about how to do things, so want the flexibility to change things in the future
iii. What are the right abstractions to provide?
1. e.g. processes, threads, messages, files,
names
i. You all know Unix a bit, get it out of the
way
ii. Provides a lens to look at other papers
iii. Important context of what OS looks like
today
i. Feedback:
1. Often too long: don't need to write so much!
ii. Problem:
1. Unclear what problem they were solving.
iii. Contributions
1. Be concrete – not just that the
artifact was lasting. What about ideas?
iv. Flaws
1. No evaluation, Not much motivation
a. Often true for industry projects
2. Why no hard links to directories? Why no
file locking?
a. Hard to get right, and can be worked around: emblematic of the Unix approach
b. Circular structures can't be garbage collected with reference counts
3. Why not worry about quotas?
a. Hard to get right
b. Not needed in their environment
4. They don't address the requirements / computing environment
a. When written, everybody knew about it. People today don't write about what a PC is or how much it costs (very much).
5. Using C made the OS bigger.
a. QUESTION: comments?
i. Authors worked on Multics at Bell Labs: reacted to the gross complexity & inefficiency of Multics
ii. Had very small computer to work with,
wanted to use for their own purposes
1. Different from creating something for
others; you know what you need and what you can sacrifice
2. Has to be usable; often drives out other
goals such as abstraction
iii. Commercial OS at the time not extensible
– you just got what you got and lived with it.
i. Everything (almost) is a file
1. Device access
2. Interprocess communication
3. Directories
4. QUESTION: Why important?
a. Don't need lots of APIs
b. Can have tools that operate on different
things.
i. e.g. cat to a device
5. QUESTION: how do you provide shared access
to a device, e.g. a printer?
a. Grant exclusive access to a daemon, let it
do sharing.
ii. Data is bytes
1. not records
2. Generally null-terminated strings
3. QUESTION: Why important?
a. A record format is hard to program to
b. Text format commonly recognized
iii. Uniform name space
1. No separate naming convention for directories, file names, different disks
a. d:\foo (DOS, Windows)
b. e.g. $pinot:sys$disk[swift.one]foo.doc;13
(VMS)
2. Mountable file systems into name space
a. But not quite transparent: wasn't worth the complexity for linking or mv'ing across mount points.
3. Separate name from contents
a. Name refers to an i-node number, not to a
file directly
b. Expose implementation through links
4. QUESTION: why important?
a. Simplify name parsing in programs
b. Easy model to navigate from any one place
to another (allows relative paths up and down)
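EXAMPLE (sketch): the "separate name from contents" point above can be observed from user space. A minimal Python sketch, assuming a POSIX file system that supports hard links; the file names are hypothetical and chosen only for illustration. It shows two directory entries naming one inode, and the inode's reference count.

```python
import os
import tempfile


def link_demo():
    """Create a file, give it a second name, and report what the inode says."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first_name")    # hypothetical names
        second = os.path.join(d, "second_name")
        with open(first, "w") as f:
            f.write("hello\n")

        os.link(first, second)                   # second directory entry, same inode
        a, b = os.stat(first), os.stat(second)
        same_inode = a.st_ino == b.st_ino
        link_count = a.st_nlink                  # reference count kept in the inode

        os.unlink(first)                         # dropping one name keeps the file alive
        with open(second) as f:
            survived = f.read()
        return same_inode, link_count, survived


if __name__ == "__main__":
    print(link_demo())
```

Neither name is "the" file: both are just (name, inode number) pairs, which is why removing one leaves the contents reachable through the other.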
iv. Images
1. Address space + kernel data structure +
file descriptor table
2. System calls make it easy to spawn
v. Limited communication / synchronization
mechanisms
1. Fork / wait
2. Pipes with child
vi. System shell exposing underlying kernel features
1. Fork / wait parallelism
2. Pipelines
3. Coroutine programming using messages /
forks
vii. User IDs + root
1. Simple two level model
2. SetUID to amplify rights
3. QUESTION: why important?
a. Hard to get semi-privileged things right
b. Setuid makes it easy to have privileged
subsystems as programs, e.g. login, passwd
c. Compare to Windows: no setuid – need
trusted launcher or running process for trusted subsystem
d. No need for a separate mechanism for users
to create their own trusted subsystems separate from the system (not possible
on Windows easily)
viii. Summary:
1. Avoid problems that are hard to get right
or require a lot of mechanism; e.g. hard links to directories, file usage
quotas, moving files (or linking file) across mount points
2. Don't hide underlying mechanisms if they are useful
a. fork (easy to do based on context
switching)
b. hard links (easy to do based on directory
structure)
i. OS structure proposed by Unix
1. Two levels: kernel and user
2. Simple kernel for extensibility
3. Services implemented as setuid programs
that run on demand
ii. File system
1. Layer of indirection between name and file
– the inode
2. Metadata (but not name) stored in the file's inode, not in the directory
a. NOTE: is a layer of indirection between
directory and file
b. Allows linking: one set of metadata
c. Slow to do ls -l
d. Makes charging hard – who pays, the
directory owner or the file creator?
3. No file locking/ synchronization
a. QUESTION: What is the assumption here? Not
much sharing
b. What can you do for safe updates?
i. Make a copy & then rename
ii. Have application-specific lock files (e.g.
Emacs)
c. EXAMPLE OF Unix approach – keep
kernel simple, make applications handle things
4. I/O APIs make all I/O look synchronous,
unbuffered
a. Relies on caching, write-behind in kernel
for performance
b. No different APIs for sequential &
random access – just seek (SUGGESTED by Multics)
c. All synchronous
i. QUESTION: is/was this a problem?
1. O.k. for small # of streams, but problem
for networking on a server
5. Directories are also files, but can only be
written by root
a. Directory entries contain name and inode
number
b. Inode contains protection information, file
statistics, reference count
i. QUESTION: what is result? Same access
independent of path to file
c. Can only link to files on same disk
i. QUESTION: what problems? Transparency; disk
boundaries arbitrary but visible to user (e.g. mv command)
d. Can check directory for consistency by
looking at dir entries, inodes, blocks
e. QUESTION: what about performance?
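EXAMPLE (sketch): the "make a copy & then rename" idiom for safe updates noted above, written as a small Python helper. This is one possible rendering, not something taken from the paper; `safe_update` is a hypothetical name. It assumes a POSIX system, where renaming within one file system replaces the target atomically.

```python
import os
import tempfile


def safe_update(path, new_text):
    """Write new contents to a temporary file, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename never
    # crosses a mount point (links and renames only work within one file system).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())      # push the data to disk before renaming
        os.replace(tmp, path)         # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp)                # clean up the partial copy on failure
        raise


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    target = os.path.join(demo_dir, "settings.txt")   # hypothetical file
    with open(target, "w") as f:
        f.write("old\n")
    safe_update(target, "new\n")
    print(open(target).read())
```

The kernel provides no locking here; the application gets a consistent update only because the rename is a single directory operation.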
iii. Communication
1. QUESTION: How did Unix do IPC? Why/why not?
2. In general, no arbitrary communication
between processes – no shared memory or messages or semaphores
3. Can use environment variables between
processes
4. Can use shared files (but no locking!)
5. Can use signals to interrupt other
processes
6. Related processes can use pipes, just like
files
a. Combined with text data format, allows
small programs to be combined into larger programs
b. No special communication api except pipe
c. Previously invented at Dartmouth, but not used
d. Specialized form of a co-routine
i. E.g. subroutine that does some work then
yields and lets another run to do part of the work
e. ONLY MECHANISM FOR SYNCHRONIZATION (waiting
for others) other than wait() for exit()
7. Redirection
a. Less focus on interactive vs. batch –
one program can do both
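EXAMPLE (sketch): pipes between related processes, as described above. A minimal Python sketch using `os.pipe` and `os.fork` (Unix only). The function name `pipe_to_child` and the uppercase "filter" are illustrative assumptions; the sketch assumes small inputs, so partial writes are not handled.

```python
import os


def read_all(fd):
    """Read from fd until end-of-file."""
    data = b""
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            return data
        data += chunk


def pipe_to_child(text):
    """Parent sends text down one pipe; a forked child uppercases it and replies."""
    to_child_r, to_child_w = os.pipe()
    from_child_r, from_child_w = os.pipe()

    pid = os.fork()
    if pid == 0:                              # child: acts as a tiny filter
        try:
            os.close(to_child_w)
            os.close(from_child_r)
            os.write(from_child_w, read_all(to_child_r).upper())
        finally:
            os._exit(0)

    os.close(to_child_r)                      # parent keeps only its own ends
    os.close(from_child_w)
    os.write(to_child_w, text.encode())
    os.close(to_child_w)                      # EOF tells the child input is done
    result = read_all(from_child_r)
    os.close(from_child_r)
    os.waitpid(pid, 0)                        # wait() for the child's exit()
    return result.decode()


if __name__ == "__main__":
    print(pipe_to_child("hello pipes"))
```

The pipe ends are ordinary file descriptors, read and written with the same calls as files; the only synchronization is blocking on the pipe and the final wait.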
iv. Process control: Fork / exec
1. QUESTION: why fork/exec?
a. History: had to swap out old process to run
new process
b. Fork == leave copy in memory
c. Originally 27 lines of assembly
2. QUESTION: What is benefit?
a. Compare to CreateProcess (10 parameters)
i. Application name
ii. Command line
iii. Process ACL
iv. Thread ACL for first thread
v. Inherit Handle flag
vi. Creation Flags (11 flags regarding how
processes are grouped)
vii. Environment pointer – environment
variables
viii. Current directory string
ix. Startup info – 18 parameters
1. Window size
2. Stdin, stdout, stderr handles
3. Desktop to create on
4. 9 flag values
x. Process information
1. Output value containing handles to new
thread, process, process id, thread id
b. Fork allows you to control new process by
running code before running exec
i. You only need to control yourself (e.g. close/open files, set environment) on Unix; on Windows you need to control others
c. General dichotomy:
i. Provide a hook to inject code to do
whatever you want (Unix)
ii. Provide configuration options anticipating
all possible needs (Windows)
d. General approach to doing things:
i. Unix: provide some code, e.g. shell script,
or forked code, to set things up
ii. Windows: statically declare properties as
parameters or name-value pairs (e.g. windows)
e. Differences?
i. Windows allows more reasoning / control
over what happens
ii. Unix allows more flexibility, compact representation: don't need to create flags to decide everything
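EXAMPLE (sketch): "control the new process by running code before exec". A minimal Python sketch (Unix only) in which the child redirects its own stdout and then execs the program. `run_redirected` is a hypothetical helper name; exit status 127 for a failed exec follows shell convention and is an assumption here.

```python
import os
import tempfile


def run_redirected(argv, out_path):
    """Run argv with stdout sent to out_path; return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: set up its own environment, then replace itself with argv.
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)                    # stdout now refers to the file
            os.close(fd)
            os.execvp(argv[0], argv)          # does not return on success
        finally:
            os._exit(127)                     # reached only if setup or exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "out.txt")
    print(run_redirected(["echo", "hi"], out))
    print(open(out).read(), end="")
```

The program being run needs no redirection parameter and the fork call needs no flags: the setup is ordinary code executed by the child between fork and exec.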
v. Protection: ACLs
1. Owner checked first, then group, then
everyone
a. Allows denying a group – stops after
first match
b. QUESTION: How much flexibility does this
add?
i. A lot. Can make a group to give special
access
ii. Not much – can only give special
access to one group
iii. How limiting is this? How often do you want
different access for user, two groups, and everyone else?
2. Owner has special rights to change ACLs
3. Can't give ownership away
4. Where is ACL/user/group stored – on
directory or on inode? Should be in inode
5. Single superuser: ROOT IS EXCLUDED
a. QUESTION: Is this a problem? What about sharing administration tasks?
b. A: use setuid programs & execute
permission
6. Setuid
a. Like entry capabilities, templates
b. Issues:
i. Who should be able to use it?
ii. How should it be controlled?
iii. How can it be misused? What if you put it
on the wrong program?
1. E.g. get administrator to run a file which
puts SETUID on your file
iv. Need to ensure the program can't be subverted, e.g. crashed at the wrong time; it must verify all inputs
7. QUESTION: Is this enough? What can and can't it do?
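EXAMPLE (sketch): the "owner checked first, then group, then everyone; stops after first match" rule. A minimal Python sketch assuming the conventional 9-bit owner/group/other mode layout and a simplified root bypass; `allowed` and the numeric IDs are hypothetical.

```python
R, W, X = 4, 2, 1   # permission bits within one rwx triple


def allowed(mode, file_uid, file_gid, uid, gids, want):
    """Return True if a process with (uid, gids) may do `want` on the file."""
    if uid == 0:
        return True                   # simplification: root skips the check
    if uid == file_uid:
        bits = (mode >> 6) & 7        # owner triple; nothing else is consulted
    elif file_gid in gids:
        bits = (mode >> 3) & 7        # group triple; "everyone" is NOT consulted
    else:
        bits = mode & 7               # everyone else
    return (bits & want) == want


if __name__ == "__main__":
    # Mode 0o604: owner rw-, group ---, everyone r--
    print(allowed(0o604, 10, 20, uid=11, gids={20}, want=R))   # group member
    print(allowed(0o604, 10, 20, uid=12, gids={30}, want=R))   # outsider
```

Because the check stops at the first matching class, mode 0o604 denies read access to members of the file's group while still granting it to everyone else.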
vi. Shell
1. Previous systems: one process/user
a. Would replace shell to run command.
b. Exit command would relaunch shell
2. Runs commands, searches default path
a. Problem if path includes local directory
3. Simple extension / unification with
underlying kernel primitives
a. E.g. "&" means no wait is called; ";" means wait is called
4. I/O redirection allows batch operation or
interaction of user programs w/o their involvement
5. Previous systems: one process per terminal;
shell exits to run program and then restarts
6. Big ideas: commands as binary operators,
taking an input and producing an output.
a. Allows executing them in sequence with
pipes
b. Allows redirecting them for intermediate
storage, batching
c. COMMENT: at the time, one-input and one-output seemed too confining
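EXAMPLE (sketch): how the shell's ";" and "&" map onto fork and wait, per the note above. A minimal Python sketch (Unix only); `spawn`, `wait_for`, `run_sequential`, and `run_background` are hypothetical names, and a real shell does much more (path search, redirection, job control).

```python
import os


def spawn(argv):
    """Fork a child that execs argv; return the child's pid to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)             # reached only if exec failed
    return pid


def wait_for(pid):
    """Block until the child exits and return its exit status."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def run_sequential(argv):
    """Like `cmd ;`: fork, exec, and wait before continuing."""
    return wait_for(spawn(argv))


def run_background(argv):
    """Like `cmd &`: fork and exec, but do not call wait."""
    return spawn(argv)


if __name__ == "__main__":
    print(run_sequential(["sh", "-c", "exit 3"]))
    pid = run_background(["sh", "-c", "exit 0"])
    print(wait_for(pid))              # reap the background child later
```

The two forms differ by a single call, which is the sense in which the shell syntax is a thin layer over the kernel primitives.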
Paper 2:
i. Many people quoted the paper: smoothly processing a stream of programs. What was the real problem?
i. People read a lot into
what was said
ii. E.g. preemption in
scheduling. This was never stated – just context switching
iii. People missed a big idea -
layering
i. Didn't solve deadlock. Is this a flaw or a limitation?
ii. Paper is not just a
technical description, but an experience report – random bits of
information. Is this information useful to anyone? If so, how should it be
dispensed?
i. A: How do you build an OS?
ii. A: What are the right
abstractions / organization for an OS?
i. Hard to handle interrupts: save, restore information, make sure you don't access things in an interrupt you shouldn't
ii. hard to manage memory
manually (but people did it!)
iii. Each layer gets to run on
a virtual machine that removes one element of hardware and replaces it with a
software abstraction
i. QUESTION: why layers?
1. Easy to reason about: you can only communicate with layers above/below
2. Logical: can provide an abstract machine to higher levels
3. Problem: what goes in each layer, and in what order?
4. NOTE: can also consider these as abstract modules, organized for now into layers
ii. 0 = processes and context
switching
1. Processor:
a. System is a set of sequential processes running at undefined speed ratios, using semaphores for synchronization
i. Delaying a process can't affect its correctness
ii. QUESTION: is this a realistic model? Is it limited? What does it rule out? (e.g. unprotected reading/writing)
b. Q: What does this mean? How does it help?
c. A: Data doesn't disappear if you don't pick it up: buffering/blocking input and output
i. We take this for granted
ii. Previously, had to time code so that it picked up data before the next piece arrived; very sensitive to changes in timing of HW.
d. Impact: no implicit synchronization via timing; all synchronization is explicit. You always wait for things to happen
2. Process real-time clock interrupts
3. Hide multiple processors (should they exist!)
4. Handle processor allocation / context switching
5. Provides: virtual machine for sequential processes
6. Supported 5 user processes, 10 I/O processes
a. System processes structured as cyclic processes (producer/consumer):
i. wait for input
ii. compute
iii. produce output
b. To communicate:
i. Provide input
ii. wait for output
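EXAMPLE (sketch): a cyclic process (wait for input, compute, produce output) synchronized only by semaphores, as described above. A minimal Python sketch using `threading.Semaphore`; threads stand in for the system's sequential processes, and `run_cyclic`, the squaring step, and the `None` stop marker are assumptions for illustration.

```python
import threading
from collections import deque


def run_cyclic(items):
    """Feed items to a cyclic consumer and collect what it produces."""
    buffer = deque()
    filled = threading.Semaphore(0)   # counts items waiting in the buffer
    mutex = threading.Semaphore(1)    # mutual exclusion on the buffer
    results = []

    def consumer():
        while True:
            filled.acquire()          # wait for input (P operation)
            with mutex:
                item = buffer.popleft()
            if item is None:          # stop marker, an assumption of this sketch
                return
            results.append(item * item)   # compute, then produce output

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in list(items) + [None]:
        with mutex:
            buffer.append(item)       # provide input
        filled.release()              # signal the consumer (V operation)
    worker.join()                     # wait for output
    return results


if __name__ == "__main__":
    print(run_cyclic([1, 2, 3]))
```

Nothing depends on relative speeds: if the consumer is delayed, items simply accumulate in the buffer, which is the "delaying a process can't affect its correctness" property.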
iii. 1 = virtual memory: "segment controller"
1. Memory:
a. Pages = unit of moving memory
b. Segments = unit of information
c. Separate, large segment address space
d. Segment variable in core identifies whether the segment is in core or on drum
i. Addresses independent of memory or disk address; gives flexibility
ii. No need for consecutive allocation
e. Principle here: virtualizing a resource provides flexibility, hides details from upper levels
i. Is key technique: adding a layer of indirection for flexibility, scheduling, policy
2. Synchronized access to drum/disk
3. Provides virtual address space, automatic swapping
4. Provides virtual machine with large virtual address space
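EXAMPLE (sketch): the indirection idea behind the segment controller. Upper layers name data by (segment, page), and a table decides whether it currently lives in core or on drum. A toy Python sketch only: the class name, the FIFO eviction policy, and the tiny core size are assumptions, not details from the paper.

```python
class SegmentController:
    """Toy model: (segment, page) addresses backed by core or drum."""

    def __init__(self, core_frames):
        self.core_frames = core_frames    # pages that fit in core (assumed small)
        self.core = {}                    # (segment, page) -> data, oldest first
        self.drum = {}                    # pages currently swapped out
        self.faults = 0

    def _bring_in(self, key):
        """Make sure the page named by key is in core, evicting if needed."""
        if key in self.core:
            return
        self.faults += 1
        if len(self.core) >= self.core_frames:
            victim = next(iter(self.core))            # FIFO victim (an assumption)
            self.drum[victim] = self.core.pop(victim)
        self.core[key] = self.drum.pop(key, 0)        # untouched pages read as 0

    def read(self, segment, page):
        self._bring_in((segment, page))
        return self.core[(segment, page)]

    def write(self, segment, page, value):
        self._bring_in((segment, page))
        self.core[(segment, page)] = value


if __name__ == "__main__":
    sc = SegmentController(core_frames=2)
    sc.write(0, 0, "a")
    sc.write(0, 1, "b")
    sc.write(1, 0, "c")                   # forces a page out to drum
    print(sc.read(0, 0), sc.faults)
```

Callers never see a core or drum address, so the controller is free to move pages, which is the flexibility the notes attribute to virtualizing the resource.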
iv. 2 = virtual console
1. Handles connection of console keyboard to a process
2. Requires naming of the process to be communicated with, on a "conversation" basis
3. Does message routing based on name of conversation
4. Provides virtual private console (/dev/tty) to next level
5. Can be swapped out, because above segment controller layer
v. 3 = virtual devices
1. I/O devices are also sequential devices with synchronization
a. Hides timing details as much as possible
2. I/O devices abstracted as buffered input and unbuffered output streams.
3. I/O devices presented as two sequential processes: buffered input (so can read asynchronously) and unbuffered output
4. Above message interpreter so can send error messages to operator (e.g. load tape)
vi. 4 = user programs
vii. 5 = operator?
viii. QUESTION:
1. Where would networking go? What if you want to swap segments over a network?
2. Are these the only layers? What could you invert? E.g. move virtual devices under segment controller
3. Could you invert these layers? E.g. put virtual devices under virtual console?
4. Segment controller can't use virtual devices.
i. Each layer hides lower
layer from upper layer
ii. Reduces potential interactions of n layers of m components from (nm)^2 to n*m^2
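WORKED EXAMPLE (sketch): plugging numbers into the interaction counts above. The values n = 5 layers and m = 10 components per layer are hypothetical, chosen only to make the arithmetic concrete; `\text` assumes the amsmath package.

```latex
\[
(nm)^2 = (5 \cdot 10)^2 = 2500
\qquad\text{versus}\qquad
n \cdot m^2 = 5 \cdot 10^2 = 500,
\qquad
\frac{(nm)^2}{n \cdot m^2} = n = 5 .
\]
```

Under these counts the saving factor is exactly n, so the benefit of layering grows with the number of layers.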
i. Using semaphores for
mutual exclusion
1. Allows reasoning about concurrency
2. Compared to other structures, allows great flexibility in synchronization
a. E.g. synchronous programming with timers for polling
i. Test all possible
combinations of inputs to a layer to verify correctness
i. Addresses truly virtual,
not physical for disk or memory. Allows flexibility
i. Multi programmed,
reentrant code hard to reason about.
ii. Program verifiers today can't really handle it, even
i. Timing doesn't matter
1.
i. Is done today – sw
firm in UK codes in a language that provides verification
ii. Several formally verified
OS exist today
iii. Key: requires design for
verification, not design then verification
i. hard to move functionality
between layers
ii. Hard to add new
functionality: what layer does a network go in? What if you want to do swapping
over it?
i. No mention of it, how to
track it between layers.
ii. Semaphore construct requires "harmonious" cooperation; not realistic
i. Hard to optimize across
layers if data is hidden.