AFS and NFS
i. Lack of security?
1. How real is this?
ii. Use of RPC?
1. Read a block at a time (8kb in a UDP packet)
2. Goal: easy to port
iii. Real contribution was vnode/vfs?
iv. Locking? Could two clients create locks at the same time?
1. Have separate lock server, not use lock files
i. DBMS workloads don't work? Later said not a problem – not a design goal
ii. Servers must keep state – could degrade perf? But state allows avoidance of work …
iii. Complexity of callbacks – what happens on failure?
iv. Confusion: client cache is on disk, not in memory
v. What happens if apps run on server?
1. A: Explicitly state that file servers are dedicated machines, don't run apps
i. Large number of clients
ii. Client performance not as important
iii. Central store for shared data, not diskless workstations
i. Some model you can program against
i. Need to handle client & server failures
i. Want global name space, not per-machine name space
1. compare to NFS, CIFS
2. Gain: transparency if file moved
i. QUESTION: Why? Is it necessary? Easier to port / heterogeneous
ii. Can use RPC-level security solutions
i. QUESTION: What happens on network failures?
i. QUESTION: Why? Easy recovery from failure
ii. No information retained across RPC invocations
iii. Easy crash recovery – just restart server, client resends RPC request until server comes back
iv. Server doesn't need to detect client failure
v. Problem: what if the client retries a non-idempotent operation? (sketch below)
1. E.g. remove?
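A toy sketch (not NFS code) of the retry problem: the first REMOVE succeeds but its reply is lost, and because the server keeps no record of past requests, the client's retry gets a spurious error.

```python
# Sketch: a stateless server retains nothing between RPCs, so a retried
# REMOVE whose first reply was lost looks like a brand-new request.
class StatelessServer:
    def __init__(self):
        self.files = {"foo.txt"}        # hypothetical exported file

    def remove(self, name):
        if name in self.files:
            self.files.remove(name)
            return "OK"
        return "ENOENT"                 # no memory that we already removed it

server = StatelessServer()
first = server.remove("foo.txt")        # succeeds, but pretend the reply is lost
retry = server.remove("foo.txt")        # client retries the same request
print(first, retry)                     # OK ENOENT -> application sees a false error
```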
i. Uses UDP with large (8kb) messages
i. Clients work with File Handles – like AFS FID, based on inode number
1. Has inode number + generation number + FS id == handle
2. Generation used to allow inodes to be reused
3. FS ID identifies the exported file system (handle fields sketched below)
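A rough sketch of the handle contents listed above (field names are illustrative; to the client the real handle is an opaque blob), plus the generation check a server can use to reject stale handles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    fs_id: int        # which exported file system on the server
    inode: int        # which file within that file system
    generation: int   # bumped when the inode slot is reused

def server_lookup(inode_table, fh):
    """inode_table: inode number -> {'generation': int, ...} (illustrative)."""
    entry = inode_table.get(fh.inode)
    if entry is None or entry["generation"] != fh.generation:
        return "ESTALE"   # inode freed/reused since the handle was handed out
    return entry
```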
ii. Where do fh's come from?
1. Lookup
2. Create
3. Mkdir
4. MOUNT protocol – totally separate, uses unix path names to request access to local path on server
a. Early binding: at mount time
b. Host names translated to IP addresses here
5. Mount types (retry behavior sketched after this list)
a. Hard: client retries forever
i. QUESTION: When use? For client apps that don't do error checking
b. Soft: client gives up, returns error
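A sketch of the two retry policies; the timeout, retry limit, and the rpc callable are placeholders, not real mount options.

```python
import time

def nfs_call(rpc, hard=True, max_tries=3, timeout=1.0):
    """Sketch: hard mounts retry an unanswered RPC forever; soft mounts give
    up after a few tries and hand the application an I/O error."""
    tries = 0
    while True:
        try:
            return rpc(timeout=timeout)
        except TimeoutError:
            tries += 1
            if not hard and tries >= max_tries:
                raise OSError("EIO: soft mount gave up")  # app must check for this
            time.sleep(timeout)                           # hard: keep trying
```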
i. Servers can export any directory (like Windows sharing)
1. Only exports a single file system – doesn't cross mount points
ii. Clients mount anywhere in name space
iii. Each client can mount files in a different place
iv. QUESTION: What are benefits / drawbacks?
v. QUESTION: How handle cycles?
1. A: NFS servers won't serve files across mount points
2. Clients must mount next file system below in the FS hierarchy
i. Clients cache data for 30 seconds
ii. Clients can use cached data for 30 seconds without checking with server
iii. Servers must write data to disk before returning (no delayed writing)
1. QUESTION: What are performance implications?
iv. Attribute cache for file attributes – trusted without rechecking for 3 seconds (timers sketched below)
1. Used to see if attributes (e.g. mtime) have changed
2. Discarded after 60 seconds
v. Version 3 adds session-consistency: block on close, revalidate data on open (implemented but not promised)
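A sketch of the client-side freshness check those timers imply; the 3/30/60 second constants mirror the numbers in these notes, and the three-way result is just for illustration.

```python
import time

ATTR_MIN = 3     # trust cached attributes blindly for this long
ATTR_MAX = 60    # throw the attribute cache entry away after this long
DATA_TTL = 30    # cached data used without checking (data cache, not shown)

def attr_cache_action(entry, now=None):
    """entry = {'fetched_at': t, 'attrs': ...}.  Decide what the client does."""
    age = (time.time() if now is None else now) - entry["fetched_at"]
    if age < ATTR_MIN:
        return "use"          # too young to bother the server
    if age < ATTR_MAX:
        return "revalidate"   # GETATTR; if mtime changed, flush cached data
    return "discard"          # expired
```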
i. Client sends user ID, group IDs to server
ii. Requires global name space of IDs – need a service like Yellow Pages (NIS) to implement this
iii. Server must trust client OS
iv. QUESTION: What about root on client OS? → map to nobody on server (same in Windows)
i. Servers cache blocks
ii. Clients cache blocks, metadata
1. Attribute cache has metadata, checked on open, discarded after 60 seconds
2. Data cache flushed after 30 seconds
iii. Clients consult with server
iv. Issues:
1. Unix files are not removed until the last handle is closed
a. On stateless server, causes file to be deleted while still in use
b. Solution in NFS: rename file, remove on local close (sketch below)
2. Permissions may change on open files
a. Unix allows access if have open handles
b. NFS may deny access
i. Solution: Save client credentials at open time, use to check access later
ii. NOTE: server doesn't do enforcement here
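A sketch of the rename-on-remove workaround; the temp-name scheme and open-count bookkeeping are made up, and `server` stands in for the RENAME/REMOVE RPCs.

```python
import uuid

def client_remove(server, path, open_count):
    """If a still-open file is removed, rename it on the server instead and do
    the real remove when the last local reference is closed."""
    if open_count.get(path, 0) > 0:
        tmp = ".nfs" + uuid.uuid4().hex[:8]   # hypothetical hidden temp name
        server.rename(path, tmp)
        return ("deferred", tmp)              # remember: remove tmp on last close
    server.remove(path)
    return ("removed", None)
```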
i. Open: client checks with remote server to fetch or revalidate cached inode (if older than 30 seconds)
ii. Reads handled locally, writes written back after 30 seconds
iii. Nothing happens on close
iv. Data flushed after 30 seconds – may not be seen by other clients for another 30 seconds
v. Write: delayed on client for 30 seconds, then written synchronously to server (flush sketched below)
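A sketch of that delayed write-back: dirty blocks sit in the client cache and are pushed once they are 30 seconds old; `server_write` stands in for the WRITE RPC (which the server completes synchronously to disk).

```python
import time

FLUSH_AGE = 30   # seconds a dirty block may linger on the client

def flush_old_writes(dirty_blocks, server_write, now=None):
    """dirty_blocks: {block_id: (data, dirtied_at)}.  Push every block dirty
    for >= FLUSH_AGE to the server and drop it from the dirty set."""
    now = time.time() if now is None else now
    for block_id, (data, dirtied_at) in list(dirty_blocks.items()):
        if now - dirtied_at >= FLUSH_AGE:
            server_write(block_id, data)
            del dirty_blocks[block_id]
```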
i. Stateless server for easy crash recovery (keep system simple!)
ii. Relax consistency (no guarantees) to get better performance
iii. Pure client server; no distributed servers
i. Synchronous writes for durability hurt performance
ii. Returns both dir + inode attributes with READDIRPLUS
iii. NFSv4 is stateful:
1. Open / close calls needed for exclusive-access compatibility with Windows
2. Stronger cache consistency – calls now return a change_info structure that indicates whether the directory changed during the operation, so the client can tell whether it has to flush the directory
3. Supports leases / callbacks (delegations) when a single client accesses a file
iv. Removes separate mount protocol
v. Removes separate lock protocol
vi. Uses TCP/IP (can traverse firewalls)
vii. Supports compound RPCs – combine multiple operations into one request (sketch after this list)
viii. Server can export a tree with multiple file systems
ix. Can lookup full pathnames, not just components
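A sketch of the compound idea: several operations ride in one request and run in order on the server, stopping at the first failure. Operation names and the dispatch-by-method-name scheme are illustrative.

```python
def run_compound(server, ops):
    """ops: list of (name, args) pairs, e.g. [("PUTFH", (fh,)), ("GETATTR", ())].
    Returns the per-op results gathered up to and including the first failure."""
    results = []
    for name, args in ops:
        handler = getattr(server, name.lower(), None)
        if handler is None:
            results.append((name, "ERR_NOTSUPP"))
            break
        try:
            results.append((name, handler(*args)))
        except Exception as exc:
            results.append((name, "ERR: " + str(exc)))
            break
    return results
```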
i. Low scalability: performance got a lot worse (on clients) when # of clients goes up
ii. QUESTION: what was bottleneck?
1. Server disk? Seek time? Disk BW?
2. Server CPU?
3. Network?
4. Client CPU/Disk?
i. Used by many others
ii. QUESTION: What does it represent?
1. A: nothing.
2. Has a mix of workloads, can see how they respond
iii. Pieces:
1. Make dir – create directory tree: stresses metadata
2. Copy – copy in files – stresses file writes / creates
3. Scan Dir (like ls -R) – stresses metadata reads
4. ReadAll – find . | wc – stresses whole file reads
5. Make – may be CPU bound, does lots of reads + fewer writes
iv. QUESTION: What is missing?
1. All pieces do whole-file reads / writes
2. Missing productivity applications, scientific applications
v. QUESTION: they use a different platform for prototype and final version. Is this relevant?
1. A: the prototype evaluation is to show where bottlenecks are
2. A: evaluation of final one shows what bottlenecks remain, compare against other systems
i. Local-file Latency?
ii. Local-file Throughput?
iii. Server throughput?
iv. Server latency?
i. QUESTION: Why not partial files?
ii. Usage study shows most files accessed in entirety
iii. Simplifies protocol / consistency
iv. Read / write handled completely locally
v. QUESTION: What workload is this optimized for?
i. QUESTION: Why? Increases latency (local disk access)
ii. Reduces load on server by having a larger client cache
i. Get latest value on open (open/close flow sketched after this list)
ii. Changes visible on close
1. Write-through to the server (minimizes server inconsistency, but increases load compared to write-back when evicted)
iii. Read/write purely local – get local unix semantics
1. programs not location-transparent
iv. Metadata operations are global and synchronous
v. QUESTION: different from Unix. Is it a problem? When?
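A minimal sketch of those open-close semantics; `server.fetch` / `server.store` stand in for the whole-file Fetch/Store RPCs and the cache layout is invented.

```python
def afs_open(server, cache, fid):
    entry = cache.get(fid)
    if entry is None or not entry["callback"]:
        data = server.fetch(fid)            # whole file; server sets a callback
        entry = {"data": bytearray(data), "callback": True, "dirty": False}
        cache[fid] = entry
    return entry

def afs_write(entry, offset, buf):
    entry["data"][offset:offset + len(buf)] = buf   # purely local, Unix semantics
    entry["dirty"] = True

def afs_close(server, fid, entry):
    if entry["dirty"]:
        server.store(fid, bytes(entry["data"]))     # now visible to later opens elsewhere
        entry["dirty"] = False
```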
i. Names are same on all clients
ii. Can move volumes between servers, nothing changes
i. Server notifies client if file changes
ii. QUESTION: Why?
1. Reduces load on server- no client polling
iii. How bad is it for the server?
1. QUESTION: how much state does the server manage?
a. Callback per file cached
2. QUESTION: What can the server do to reduce this?
a. Limit # of callbacks
iv. What happens on failure?
1. After client failure, clients re-establish all callbacks
2. After server failure, …
v. Batching requests for performance
1. Can release lots of callbacks at once (server callback state sketched below)
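A sketch of the per-file callback state an AFS-like server might keep: one registration per (file, client), all broken in a batch when the file is updated. Structure and names are illustrative.

```python
from collections import defaultdict

class CallbackTable:
    def __init__(self):
        self.by_fid = defaultdict(set)      # fid -> set of clients holding callbacks

    def register(self, fid, client):
        self.by_fid[fid].add(client)

    def break_callbacks(self, fid, notify):
        """On an update to fid, tell every registered client to drop its copy;
        notifying all of them here is the batching point above."""
        for client in self.by_fid.pop(fid, set()):
            notify(client, fid)
```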
i. Id-based names
1. Servers never do path lookups
a. PRINCIPLE: make clients do the work – use the CPU cycles
ii. 2-level name space
1. Volumes + Files + uniquifier
2. WHY?
a. Efficiency: search n+m instead of n*m locations
3. Why Uniquifier?
a. Can reuse table slots – makes lots of things easy if you don't have to check for duplicates
b. HOW IMPLEMENT? Machine ID + counter?
iii. location independent names
1. Can move volumes between servers
iv. Clients cache volume → server mapping (Fid + hint lookup sketched below)
1. Volume location is a hint
a. Piece of information that can improve performance if correct, but has no semantically negative consequences if incorrect
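A sketch of the two-level Fid plus the hint-based lookup: try the cached volume-to-server mapping first, and fall back to the location database (refreshing the hint) if it is missing or stale. Method names are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fid:
    volume: int        # which volume
    vnode: int         # which file slot within the volume
    uniquifier: int    # distinguishes reuse of the same slot

def locate_volume(volume, hint_cache, location_db):
    """hint_cache: possibly stale volume -> server map (soft state).
    A wrong hint costs one extra lookup, nothing worse."""
    server = hint_cache.get(volume)
    if server is not None and server.has_volume(volume):
        return server                      # hint was right: fast path
    server = location_db[volume]           # authoritative volume location data
    hint_cache[volume] = server            # refresh the hint for next time
    return server
```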
i. Thread per request, not per client (dispatch sketched below)
ii. Allows overlapped network I/O
iii. QUESTION: How do you set the number? What determines the number you need?
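A rough sketch of thread-per-request dispatch with a bounded worker pool; `handle` and the pool size are placeholders, and the pool size is the knob the question above asks about.

```python
from concurrent.futures import ThreadPoolExecutor

def serve(requests, handle, num_threads=8):
    """Hand each incoming request to a worker thread (rather than dedicating a
    thread or process per client) so requests blocked on disk or network I/O
    overlap with others."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(handle, requests))
```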
i. Can open by inode number
ii. QUESTION: is it required that you have an ID for opening files?
1. e.g. Windows makes this hard – doesn't have inode numbers
i. Walk path recursively (sketched after this list)
1. If directory in cache with callback, go on
2. If in cache w/o callback, check with server
3. if not in cache, fetch + get callback
ii. Open file
1. If in cache with callback, use
2. W/o callback – verify with server, re-establish callback
3. not in cache – fetch w/ callback
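A sketch of that walk and open logic, assuming an invented client cache layout (`{'dir': {name: child_fid}, 'callback': bool}`) and made-up server calls.

```python
def walk_path(server, cache, root_fid, components):
    fid = root_fid
    for name in components:
        entry = cache.get(fid)
        if entry is None:                              # miss: fetch + get callback
            entry = {"dir": server.fetch_dir(fid), "callback": True}
            cache[fid] = entry
        elif not entry["callback"]:                    # cached, no callback: revalidate
            if server.changed_since_cached(fid):
                entry["dir"] = server.fetch_dir(fid)
            entry["callback"] = True
        fid = entry["dir"][name]                       # descend one component
    return fid                                         # opening the file itself works the same way
```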
i. Helped a lot
ii. High level points:
1. Shift work to clients
2. Call-backs instead of polling
3. Threads instead of processes
i. NFS uses kernel threads
ii. NFS stat calls hit in cache
iii. A: much bigger client cache
iv. Performance win may not last!
i. Lookup each component directory of path name
ii. Check cache first – if have in cache, and have callback, use
1. Else ask server to update callback or fetch from server
iii. Read/write: do locally
iv. Close: copy changes up to server
v. QUESTION: what about temp files?
1. Don't put them on AFS – use local disks
i. Unit of partitioning
1. Use recursive copy-on-write to move data
ii. Unit of replication
iii. Unit of backup
1. QUESTION: How do you get a consistent backup?
2. Use copy-on-write to CLONE (clone sketched below)
iv. Unit of applying quotas
v. Logically separate from FS name space and underlying disk partitions
1. Table of mount points indicates name space of volumes
vi. Volume mapping (what server has a volume) is SOFT STATE
1. Can try to use it, but if stale, will learn real value
2. PURELY OPTIMIZATION, LOW COST
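A sketch of a block-granularity copy-on-write clone (the volume layout is invented for illustration): the clone shares every block with the original, and a write replaces only the block being changed in the writer's own map, so the clone stays a consistent snapshot for backup or for copying to another server.

```python
class Volume:
    """Volume as a map from block number to immutable block contents."""
    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})     # copies the map, shares the block data

    def clone(self):
        return Volume(self.blocks)           # cheap: no block data is copied

    def write(self, block_no, data):
        self.blocks[block_no] = bytes(data)  # rebind just this block; any clone
                                             # still references the old contents
```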
i. Bizarre consistency semantics
ii. Higher server load – must interact with client on reads / writes
iii. Less caching on client
iv. Faster error recovery – can just reboot server
v. More network packets
vi. Lower latency – don't have to wait to download file on open. Better for large random-access files
i. E.g. delayed write after 30 seconds
ii. E.g. delayed check for consistency after 30 seconds
i. NFS gives up, tries to emulate on client
ii. AFS weakens slightly with open-close instead of read-write consistency
i. QUESTION: Can you emulate per-machine with single global? Yes, use symlinks (sketch below)
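A tiny sketch of the symlink trick; both paths are made up.

```python
import os

def make_local_alias(global_path="/afs/example.edu/tools",   # hypothetical global name
                     local_path="/usr/local/tools"):         # per-machine conventional name
    if not os.path.islink(local_path):
        os.symlink(global_path, local_path)
```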
i. How does AFS / NFS handle it?
1. AFS resolves name via DNS?