AFS and NFS

 

  1. Questions from reviews:
    1. NFS:

                                              i.     Lack of security?

1.   How real is this?

                                             ii.     Use of RPC?

1.   Read a block at a time (8kb in a UDP packet)

2.   Goal: easy to port

                                           iii.     Real contribution was vnode/vfs?

                                           iv.     Locking? Could two clients create locks at the same time?

1.   Have separate lock server, not use lock files

    1. AFS

                                              i.     DBMS workloads don't work well? Later said not a problem – not a design goal

                                             ii.     Servers must keep state – could degrade perf? But state allows avoidance of work …

                                           iii.     Complexity of callbacks – what happens on failure?

                                           iv.     Confusion: client cache is on disk, not in memory

                                             v.     What happens if apps run on server?

1.   A: Explicitly state that file servers are dedicated machines, don't run apps


  1. Unix semantics
    1. Any change to a file or file system visible to next operation (e.g. read returns data just written)
    2. Last-close semantics: an open file remains available to any process that has it open, regardless of changes to the file or the process after the open. The name comes from the best-known consequence: a deleted file is not actually removed until the last process that has it open closes it.
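A quick local demonstration of the last-close semantic is the unlink-while-open trick; this is a minimal sketch (the path is an arbitrary example):

/* Demonstrates Unix last-close semantics: a file unlinked while open
 * remains readable/writable through the existing descriptor, and its
 * storage is reclaimed only when the last descriptor is closed. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *path = "/tmp/last_close_demo";   /* arbitrary example path */
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, "still here\n", 11) != 11) { perror("write"); return 1; }

    /* Remove the directory entry; the inode lives on because fd is open. */
    if (unlink(path) != 0) { perror("unlink"); return 1; }

    char buf[32];
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("read after unlink: %s", buf);   /* prints "still here" */
    }

    close(fd);   /* last close: only now are the file's blocks freed */
    return 0;
}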
  2. Goals
    1. Network file system
    2. Scale – how big?

                                              i.     Large number of clients

                                             ii.     Client performance not as important

                                           iii.     Central store for shared data, not diskless workstations

    1. Consistency

                                              i.     Some model you can program against

    1. Reliability

                                              i.     Need to handle client & server failures

    1. Naming

                                              i.     Want global name space, not per-machine name space

1.   compare to NFS, CIFS

2.   Gain: transparency if file moved

  1. NFS
    1. Implemented on RPC / XDR for data format conversion

                                              i.     QUESTION: Why? Is it necessary? Easier to port / heterogeneous

                                             ii.     Can use RPC-level security solutions

    1. Machine & OS independent
    2. Crash recovery
    3. Transparent to clients

                                              i.     QUESTION: What happens on network failures?


    1. Designed for disk-less clients
    2. Stateless protocol

                                              i.     QUESTION: Why? Easy recovery from failure

                                             ii.     No information retained across RPC invocations

                                           iii.     Easy crash recovery – just restart server, client resends RPC request until server comes back

                                           iv.     Server doesn't need to detect client failure

                                             v.     Problem: what if client retries a non-idempotent operation

1.   E.g. remove?
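A tiny local simulation of why at-least-once retry is safe for idempotent operations but not for remove; server_write and server_remove are invented stand-ins for the real RPCs:

/* Simulates NFS-style at-least-once retry: the client resends a request
 * until it gets a reply.  A duplicated WRITE(offset, data) is harmless;
 * a duplicated REMOVE fails the second time (spurious error to the app). */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static char server_file[64] = "old old old old ";
static bool server_file_exists = true;

/* Idempotent: writing the same bytes at the same offset twice gives the
 * same result as writing them once. */
static void server_write(int offset, const char *data) {
    memcpy(server_file + offset, data, strlen(data));
}

/* Not idempotent: the second attempt sees no file and reports an error. */
static int server_remove(void) {
    if (!server_file_exists) return -1;   /* ENOENT */
    server_file_exists = false;
    return 0;
}

int main(void) {
    /* Pretend the first reply was lost, so the client retried. */
    server_write(0, "new");
    server_write(0, "new");               /* duplicate: same final contents */
    printf("file after duplicated write: %s\n", server_file);

    int r1 = server_remove();
    int r2 = server_remove();             /* duplicate: retry looks like failure */
    printf("remove replies: %d then %d\n", r1, r2);
    return 0;
}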

    1. Datagram protocol

                                              i.     Uses UDP with large (8kb) messages

    1. Servers don't do name operations

                                              i.     Clients work with File Handles – like AFS FID, based on inode number

1.   Has inode number + generation number + FS id == handle (see the struct sketch after this list)

2.   Generation used to allow inodes to be reused

3.   FS ID identifies which exported file system on the server

                                             ii.     Where do fh's come from?

1.   Lookup

2.   Create

3.   Mkdir

4.   MOUNT protocol – totally separate, uses unix path names to request access to local path on server

a.    Early binding: at mount time

b.   Host names translated to IP addresses here

5.   Mount types

a.    Hard: client retries forever

                                                                                                    i.     QUESTION: When use? For client apps that don't do error checking

b.   Soft: client gives up, returns error
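The ingredients of a file handle listed above can be pictured as a small struct; this is only an illustration, not the actual NFS wire layout (real handles are opaque to the client):

/* Sketch of the information packed into an NFS file handle.  Field widths
 * are illustrative, not the protocol's layout. */
#include <stdint.h>
#include <stdio.h>

struct nfs_fhandle {
    uint32_t fsid;        /* which exported file system on the server */
    uint32_t inode;       /* inode number within that file system */
    uint32_t generation;  /* bumped when the inode is reused, so a stale
                             handle to the old file can be detected */
};

int main(void) {
    struct nfs_fhandle fh = { .fsid = 7, .inode = 12345, .generation = 3 };
    printf("fh = (fsid=%u, inode=%u, gen=%u)\n", fh.fsid, fh.inode, fh.generation);
    return 0;
}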

    1. Naming

                                              i.     Servers can export any directory (like Windows sharing)

1.   Only exports a single file system – doesn't cross mount points

                                             ii.     Clients mount anywhere in name space

                                           iii.     Each client can mount files in a different place

                                           iv.     QUESTION: What are benefits / drawbacks?

                                             v.     QUESTION: How handle cycles?

1.   A: NFS servers won't serve files across mount points

2.   Clients must mount next file system below in the FS hierarchy


    1. NFS file semantics

                                              i.     Clients cache data for 30 seconds

                                             ii.     Clients can use cached data for 30 seconds without checking with server

                                           iii.     Servers must write data to disk before returning (no delayed writing)

1.   QUESTION: What are performance implications?

                                           iv.     Attribute cache for file attributes – kept for 3 seconds

1.   Used to see if attributes have changed

2.   Discarded after 60 seconds

                                             v.     Version 3 adds session-consistency: block on close, revalidate data on open (implemented but not promised)
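The caching rules above boil down to timestamp checks on the client; a simplified sketch using the 30/60-second numbers quoted above (struct and helper names are invented, and getattr_rpc stands in for a real GETATTR):

/* Sketch of NFS client revalidation: cached data is trusted for up to 30
 * seconds; after that the client compares the server's mtime, refreshing
 * cached attributes once they are older than 60 seconds. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define DATA_TTL 30    /* seconds cached data may be used without checking */
#define ATTR_TTL 60    /* seconds before cached attributes are refreshed */

struct cached_file {
    time_t data_fetched;    /* when cached blocks were fetched */
    time_t attrs_fetched;   /* when attributes were last fetched */
    time_t server_mtime;    /* last modification time seen from server */
};

/* Stand-in for a GETATTR RPC; pretend the file changed on the server at t=1050. */
static time_t getattr_rpc(void) { return 1050; }

static bool cache_usable(struct cached_file *f, time_t now) {
    if (now - f->data_fetched <= DATA_TTL)
        return true;                                /* inside the 30s window */
    if (now - f->attrs_fetched > ATTR_TTL) {        /* attributes too old */
        f->server_mtime = getattr_rpc();
        f->attrs_fetched = now;
    }
    return f->server_mtime <= f->data_fetched;      /* unchanged since fetch */
}

int main(void) {
    struct cached_file f = { .data_fetched = 1000, .attrs_fetched = 1000,
                             .server_mtime = 900 };
    printf("t=1020: %s\n", cache_usable(&f, 1020) ? "use cache" : "re-fetch");
    printf("t=1100: %s\n", cache_usable(&f, 1100) ? "use cache" : "re-fetch");
    return 0;
}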

    1. Security

                                              i.     Client sends user ID, group IDs to server

                                             ii.     Requires global name space of IDs – need a service like Yellow Pages (NIS) to implement this

                                           iii.     Server must trust client OS

                                           iv.     QUESTION: What about root on client OS? → map to nobody on server (same in Windows)

    1. Caching

                                              i.     Servers cache blocks

                                             ii.     Clients cache blocks, metadata

1.   Attribute cache has metadata, checked on open, discarded after 60 seconds

2.   Data cache flushed after 30 seconds

                                           iii.     Clients consult with server

                                           iv.     Issues:

1.   Unix files are not removed until the last handle is closed

a.    On stateless server, causes file to be deleted while still in use

b.   Solution in NFS: rename file, remove on local close

2.   Permissions may change on open files

a.    Unix allows access if have open handles

b.   NFS may deny access

                                                                                                    i.     Solution: Save client credentials at open time, use to check access later

                                                                                                   ii.     NOTE: server doesn't do enforcement here

    1. QUESTION: What happens when you open / read / write a file?

                                              i.     Open: client checks with remote server to fetch or revalidate cached inode (if older than 30 seconds)

                                             ii.     Reads handled locally, writes written back after 30 seconds

                                           iii.     Nothing happens on close

                                           iv.     Data flushed after 30 seconds – may not be seen by other clients for another 30 seconds

                                             v.     Write: delayed on client for 30 seconds, then written synchronously to server
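1.   Worked worst case using the numbers above: a write can sit in the writer's cache for up to 30 seconds before being flushed, and a reader that revalidated just before the flush can keep using its old copy for another 30 seconds, so a change can take roughly 30 + 30 = 60 seconds to become visible elsewhere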

    1. Design essence

                                              i.     Stateless server for easy crash recovery (keep system simple!)

                                             ii.     Relax consistency (no guarantees) to get better performance

                                           iii.     Pure client server; no distributed servers

    1. Notes: NFS v4

                                              i.     synchronous writes for durability hurt performance

                                             ii.     return both dir + inode attributes with READDIRPLUS

                                           iii.     is stateful:

1.   open / close calls needed for exclusive access compatibility with windows

2.   Stronger cache consistency – calls now return a change_info structure that indicates whether the directory changed during the operation – client can tell whether it has to flush the cached directory

3.   Supports leases / callbacks (delegations) when a single client is accessing a file

                                           iv.     Removes separate mount protocol

                                             v.     Removes separate lock protocol

                                           vi.     Uses TCP/IP (can traverse firewalls)

                                          vii.     Supports compound RPCs – combine multiple operations together

                                        viii.     Server can export a tree with multiple file systems

                                           ix.     Can lookup full pathnames, not just components

  1. AFS version 1:
    1. Process per client
    2. Name lookups on server
    3. Cache validation with callback on access
    4. Result:

                                              i.     Low scalability: performance got a lot worse (on clients) when # of clients goes up

                                             ii.     QUESTION: what was bottleneck?

1.   Server disk? Seek time? Disk bandwidth?

2.   Server CPU?

3.   Network?

4.   Client CPU/Disk?

    1. Evaluation performance: Andrew Benchmark

                                              i.     Used by many others

                                             ii.     QUESTION: What does it represent?

1.   A: nothing.

2.   Has a mix of workloads, can see how they respond

                                           iii.     Pieces:

1.   Make dir – create directory tree: stresses metadata

2.   Copy – copy in files – stresses file writes / creates

3.   Scan Dir (like ls -R) – stresses metadata reads

4.   ReadAll – find . | wc – stresses whole file reads

5.   Make – may be CPU bound, does lots of reads + fewer writes

                                           iv.     QUESTION: What is missing?

1.   All pieces do whole-file reads / writes

2.   Missing productivity applications, scientific applications

                                             v.     QUESTION: they use a different platform for the prototype and the final version – is this relevant?

1.   A: the prototype evaluation is to show where bottlenecks are

2.   A: evaluation of final one shows what bottlenecks remain, compare against other systems

  1. AFS v2
    1. CONTEXT: designed for systems with local disks
    2. QUESTION: What is the goal?

                                              i.     Local-file Latency?

                                             ii.     Local-file Throughput?

                                           iii.     Server throughput?

                                           iv.     Server latency?

    1. Transparent to clients – match Unix naming / Whole file caching

                                              i.     QUESTION: Why not partial files?

                                             ii.     Usage study shows most files accessed in entirety

                                           iii.     Simplifies protocol / consistency

                                           iv.     Read / write handled completely locally

                                             v.     QUESTION: What workload is this optimized for?

    1. Local disk cache

                                              i.     QUESTION: Why? Increases latency (local disk access)

                                             ii.     Reduces load on server by having a larger client cache

    1. Relaxed but well-defined consistency semantics

                                              i.     Get latest value on open

                                             ii.     Changes visible on close

1.   Write-through to the server (minimizes server inconsistency, but increases load compared to write-back when evicted)

                                           iii.     Read/write purely local – get local unix semantics

1.   programs not location-transparent

                                           iv.     Metadata operations are global and synchronous

                                             v.     QUESTION: different from Unix. Is it a problem? When?

    1. Global name space: /afs

                                              i.     Names are same on all clients

                                             ii.     Can move volumes between servers, nothing changes

  1. Performance improvements
    1. Callbacks

                                              i.     Server notifies client if file changes

                                             ii.     QUESTION: Why?

1.   Reduces load on server – no client polling

                                           iii.     How bad is it for the server?

1.   QUESTION: how much state does the server manage?

a.    Call back per file cached

2.   QUESTION: What can the server do to reduce this?

a.    Limit # of callbacks

                                           iv.     What happens on failure?

1.   After client failure, clients re-establish all callbacks

2.   After server failure, callback state is lost – clients must treat cached entries as suspect and revalidate them

                                             v.     Batching requests for performance

1.   Can release lots of callbacks at once
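One way to picture the server's callback state is a table keyed by (file, client) that is "broken" when another client writes; a rough sketch with invented names, not the real Vice data structures:

/* Sketch of AFS-style callbacks on the server: when a client caches a
 * file it is registered here; when some client stores a new version, the
 * server notifies every other registered client and drops their entries. */
#include <stdio.h>

#define MAX_CB 128

struct callback { int file_id; int client_id; int valid; };
static struct callback table[MAX_CB];

static void register_callback(int file, int client) {
    for (int i = 0; i < MAX_CB; i++)
        if (!table[i].valid) {
            table[i] = (struct callback){ file, client, 1 };
            return;
        }
    /* Table full: a real server could reclaim (break) old callbacks here. */
}

/* Called when `writer` stores a new version of `file`. */
static void break_callbacks(int file, int writer) {
    for (int i = 0; i < MAX_CB; i++)
        if (table[i].valid && table[i].file_id == file && table[i].client_id != writer) {
            printf("notify client %d: file %d changed\n", table[i].client_id, file);
            table[i].valid = 0;   /* client must re-fetch before next open */
        }
}

int main(void) {
    register_callback(42, 1);   /* clients 1 and 2 both cache file 42 */
    register_callback(42, 2);
    break_callbacks(42, 1);     /* client 1 writes; only client 2 is notified */
    return 0;
}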

    1. Name resolution

                                              i.     Id-based names

1.   Servers never do path lookups

a.    PRINCIPLE: make clients do the work – use the CPU cycles

                                             ii.     2-level name space

1.   Volumes + Files + uniquifier

2.   WHY?

a.    Efficiency: search n+m instead of n*m locations

3.   Why Uniquifier?

a.    Can reuse table slots – makes lots of things easy if you don't have to check for duplicates

b.   HOW IMPLEMENT? Machine ID + counter?

                                           iii.     location independent names

1.   Can move volumes between servers

                                           iv.     Clients cache volume → server mapping

1.   Volume location is a hint

a.    Piece of information that can improve performance if correct, but has no semantically negative consequences if incorrect
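A sketch of the two-level names and the volume-location hint described above; field and function names are invented, and the point is that a stale hint only costs an extra lookup, never correctness:

/* Sketch of AFS two-level naming: a fid names a file by volume, file slot
 * (vnode) within the volume, and a uniquifier so slots can be reused.
 * The volume -> server map is cached as a hint. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct fid {
    uint32_t volume;      /* which volume */
    uint32_t vnode;       /* which file slot within the volume */
    uint32_t uniquifier;  /* distinguishes reuses of the same slot */
};

#define NVOL 16
static char vol_server_hint[NVOL][32];   /* cached volume -> server name */

/* Stand-in for consulting the volume location database for the truth. */
static const char *vldb_lookup(uint32_t vol) {
    return (vol % 2) ? "server-b" : "server-a";   /* invented mapping */
}

static const char *server_for(uint32_t vol) {
    if (vol_server_hint[vol][0] == '\0') {
        /* miss (or a failed RPC would invalidate the hint): ask the VLDB */
        strcpy(vol_server_hint[vol], vldb_lookup(vol));
    }
    return vol_server_hint[vol];
}

int main(void) {
    struct fid f = { .volume = 3, .vnode = 77, .uniquifier = 5 };
    printf("fid(%u.%u.%u) lives on %s\n", f.volume, f.vnode, f.uniquifier,
           server_for(f.volume));
    return 0;
}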

 

    1. Threaded single-process server

                                              i.     Thread per request, not per client

                                             ii.     Allows overlapped network I/O

                                           iii.     QUESTION: How do you set the number? What determines the number you need?
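A minimal picture of "thread per request" is a fixed pool of worker threads sharing one request queue; this is a pthreads sketch with invented names (the real server's threading package differs), and the pool size bounds concurrent work rather than the number of clients:

/* Fixed pool of worker threads pulling requests from a shared queue. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NREQS    8

static int queue[NREQS];
static int head = 0, tail = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

static void enqueue(int req) {
    pthread_mutex_lock(&lock);
    queue[tail++] = req;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail)
            pthread_cond_wait(&nonempty, &lock);
        int req = queue[head++];
        pthread_mutex_unlock(&lock);
        if (req < 0) return NULL;                   /* shutdown marker */
        printf("worker %ld handling request %d\n", id, req);
    }
}

int main(void) {
    pthread_t tids[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int r = 0; r < NREQS - NWORKERS; r++)
        enqueue(r);                                 /* "requests" arrive */
    for (int i = 0; i < NWORKERS; i++)
        enqueue(-1);                                /* tell workers to exit */
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}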

    1. New open-by-ID file system call

                                              i.     Can open by inode number

                                             ii.     QUESTION: is it required that you have an ID for opening files?

1.   e.g. Windows makes this hard – doesn't have inode numbers

    1. Summary: opening a file

                                              i.     Walk path recursively

1.   If directory in cache with callback, go on

2.   If in cache w/o callback, check

3.   if not in cache, fetch + get callback

                                             ii.     Open file

1.   If in cache with callback, use

2.   W/o callback – verify with the server and re-establish the callback

3.   not in cache – fetch w/ callback
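The open-time steps above written as a small loop over path components; a simplified sketch with invented helper names and no error handling:

/* Sketch of opening /afs/a/b/c on an AFS client: each component comes
 * from the local cache if a callback promise is still held, is revalidated
 * if cached without a callback, and is fetched (with a callback) otherwise. */
#include <stdbool.h>
#include <stdio.h>

struct cache_entry { bool cached; bool has_callback; };

/* Toy cache: index 0..n-1 for successive path components. */
static struct cache_entry cache[8] = {
    { true, true }, { true, false }, { false, false },
};

static void fetch_from_server(int i) {          /* fetch + get callback */
    cache[i].cached = true;
    cache[i].has_callback = true;
    printf("  fetch component %d from server, establish callback\n", i);
}

static void revalidate(int i) {                 /* cheap check with server */
    cache[i].has_callback = true;
    printf("  revalidate component %d with server\n", i);
}

static void open_path(const char *path, int ncomponents) {
    printf("open %s\n", path);
    for (int i = 0; i < ncomponents; i++) {
        if (cache[i].cached && cache[i].has_callback)
            printf("  component %d served from cache (callback held)\n", i);
        else if (cache[i].cached)
            revalidate(i);
        else
            fetch_from_server(i);
    }
    /* Reads and writes now go to the local copy; close ships changes back. */
}

int main(void) {
    open_path("/afs/a/b/c", 3);
    return 0;
}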

    1. Scalability results:

                                              i.     Helped a lot

                                             ii.     High level points:

1.   Shift work to clients

2.   Call-backs instead of polling

3.   Threads instead of processes

    1. QUESTION: What workload is this optimized for?
    2. QUESTION: what changes would you make for large file / random access workloads?
    3. QUESTION: What else would have to change?
    4. QUESTION: why was the performance better than NFS?

                                              i.     NFS uses kernel threads

                                             ii.     NFS stat calls hit in cache

                                           iii.     A: much bigger client cache

                                           iv.     Performance win may not last!

    1. QUESTION: What happens when you open / read / write a file?

                                              i.     Lookup each component directory of path name

                                             ii.     Check cache first – if have in cache, and have callback, use

1.   Else ask server to update callback or fetch from server

                                           iii.     Read/write: do locally

                                           iv.     Close: copy changes up to server

                                             v.     QUESTION: what about temp files?

1.   Don't put them on AFS – use local disks

  1. Manageability Improvements
    1. Volumes

                                              i.     Unit of partitioning

1.   Use recursive copy-on-write to move data

                                             ii.     Unit of replication

                                           iii.     Unit of backup

1.   QUESTION: How do you get a consistent backup?

2.   Use copy-on-write to CLONE

                                           iv.     Unit of applying quotas

                                             v.     Logically separate from FS name space and underlying disk partitions

1.   Table of mount points indicates name space of volumes

                                           vi.     Volume mapping (what server has a volume) is SOFT STATE

1.   Can try to use it, but if stale, will learn real value

2.   PURELY OPTIMIZATION, LOW COST

  1. AFS techniques
    1. dedicated file servers (not peer-to-peer): specialized hardware, software
    2. Push work to client: they have cycles to burn
    3. Cache whenever possible: files, directories, location information
    4. Exploit usage properties: e.g. relaxed consistency. Raise granularity of management (e.g. volumes, directories); e.g. make /tmp local; use read-only replication for /bin
    5. Minimize system-wide knowledge and change: clients don't need full knowledge of all servers; file location at server level rarely changes
  2. Limits:
    1. Can't support disk-less clients well
    2. Can't handle large files well – need to copy in entirety
    3. Latency to first byte for uncached files is high
  1. Comparison to NFS

                                              i.     Bizarre consistency semantics

                                             ii.     Higher server load – must interact with client on reads / writes

                                           iii.     Less caching on client

                                           iv.     Faster error recovery – can just reboot server

                                             v.     More network packets

                                           vi.     Lower latency – don't have to wait to download file on open. Better for large random access files

  1. Approach to consistency / durability
    1. Move requirement of when files are consistent / durable from system to application

                                              i.     E.g. delayed write after 30 seconds

                                             ii.     E.g. delayed check for consistency after 30 seconds

    1. Following Unix semantics

                                              i.     NFS gives up, tries to emulate on client

                                             ii.     AFS weakens slightly with open-close instead of read-write consistency


    1. Naming: single global name space vs. per machine spaces

                                              i.     QUESTION: Can you emulate per-machine with single global? Yes, use symlinks

    1. Mounting

                                              i.     How does AFS / NFS handle it?

1.   AFS resolves name via DNS?
