AFS and NFS
i. Lack of security?
1. How real is this?
ii. Use of RPC?
1. Read a block at a time (8kb in a UDP packet)
2. Goal: easy to port
iii. Real contribution was vnode/vfs?
iv. Locking? Could two clients create locks at the same time?
1. Have separate lock server, not use lock files
i. DBMS workloads don't work? Later said not a problem – not a design goal
ii. Servers must keep state – could degrade perf? But state allows avoidance of work …
iii. Complexity of callbacks – what happens on failure?
iv. Confusion: client cache is on disk, not in memory
v. What happens if apps run on server?
1. A: Explicitly state that file servers are dedicated machines, don't run apps
i. Large number of clients
ii. Client performance not as important
iii. Central store for shared data, not diskless workstations
i. Some model you can program against
i. Need to handle client & server failures
i. Want global name space, not per-machine name space
1. compare to NFS, CIFS
2. Gain: transparency if file moved
i. QUESTION: Why? Is it necessary? Easier to port / heterogeneous
ii. Can use RPC-level security solutions
i. QUESTION: What happens on network failures?
i. QUESTION: Why? Easy recovery from failure
ii. No information retained across RPC invocations
iii. Easy crash recovery – just restart server, client resends RPC request until server comes back
iv. Server doesn't need to detect client failure
v. Problem: what if the client retries a non-idempotent operation? (sketch below)
1. E.g. remove?
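A toy sketch (not NFS code) of the retry problem: the first REMOVE succeeds but its reply is lost, and because the server keeps no record of past requests, the client's retry gets a spurious error.

```python
# Sketch: a stateless server retains nothing between RPCs, so a retried
# REMOVE whose first reply was lost looks like a brand-new request.
class StatelessServer:
    def __init__(self):
        self.files = {"foo.txt"}        # hypothetical exported file

    def remove(self, name):
        if name in self.files:
            self.files.remove(name)
            return "OK"
        return "ENOENT"                 # no memory that we already removed it

server = StatelessServer()
first = server.remove("foo.txt")        # succeeds, but pretend the reply is lost
retry = server.remove("foo.txt")        # client retries the same request
print(first, retry)                     # OK ENOENT -> application sees a false error
```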
i. Uses UDP with large (8kb) messages
i. Clients work with File Handles – like AFS FID, based on inode number
1. Has inode number + generation number + FS id == handle
2. Generation used to allow inodes to be reused
3. FS ID identifies the exported file system (handle fields sketched below)
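A rough sketch of the handle contents listed above (field names are illustrative; to the client the real handle is an opaque blob), plus the generation check a server can use to reject stale handles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    fs_id: int        # which exported file system on the server
    inode: int        # which file within that file system
    generation: int   # bumped when the inode slot is reused

def server_lookup(inode_table, fh):
    """inode_table: inode number -> {'generation': int, ...} (illustrative)."""
    entry = inode_table.get(fh.inode)
    if entry is None or entry["generation"] != fh.generation:
        return "ESTALE"   # inode freed/reused since the handle was handed out
    return entry
```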
ii. Where do fh's come from?
1. Lookup
2. Create
3. Mkdir
4. MOUNT protocol – totally separate, uses unix path names to request access to local path on server
a. Early binding: at mount time
b. Host names translated to IP addresses here
5. Mount types (retry behavior sketched after this list)
a. Hard: client retries forever
i. QUESTION: When use? For client apps that don't do error checking
b. Soft: client gives up, returns error
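A sketch of the two retry policies; the timeout, retry limit, and the rpc callable are placeholders, not real mount options.

```python
import time

def nfs_call(rpc, hard=True, max_tries=3, timeout=1.0):
    """Sketch: hard mounts retry an unanswered RPC forever; soft mounts give
    up after a few tries and hand the application an I/O error."""
    tries = 0
    while True:
        try:
            return rpc(timeout=timeout)
        except TimeoutError:
            tries += 1
            if not hard and tries >= max_tries:
                raise OSError("EIO: soft mount gave up")  # app must check for this
            time.sleep(timeout)                           # hard: keep trying
```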
i. Servers can export any directory (like Windows sharing)
1. Only exports a single file system – doesn't cross mount points
ii. Clients mount anywhere in name space
iii. Each client can mount files in a different place
iv. QUESTION: What are benefits / drawbacks?
v. QUESTION: How handle cycles?
1. A: NFS servers won't serve files across mount points
2. Clients must mount next file system below in the FS hierarchy
i. Clients cache data for 30 seconds
ii. Clients can use cached data for 30 seconds without checking with server
iii. Servers must write data to disk before returning (no delayed writing)
1. QUESTION: What are performance implications?
iv. Attribute cache for file attributes – trusted without rechecking for 3 seconds (timers sketched below)
1. Used to see if attributes (e.g. mtime) have changed
2. Discarded after 60 seconds
v. Version 3 adds session-consistency: block on close, revalidate data on open (implemented but not promised)
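A sketch of the client-side freshness check those timers imply; the 3/30/60 second constants mirror the numbers in these notes, and the three-way result is just for illustration.

```python
import time

ATTR_MIN = 3     # trust cached attributes blindly for this long
ATTR_MAX = 60    # throw the attribute cache entry away after this long
DATA_TTL = 30    # cached data used without checking (data cache, not shown)

def attr_cache_action(entry, now=None):
    """entry = {'fetched_at': t, 'attrs': ...}.  Decide what the client does."""
    age = (time.time() if now is None else now) - entry["fetched_at"]
    if age < ATTR_MIN:
        return "use"          # too young to bother the server
    if age < ATTR_MAX:
        return "revalidate"   # GETATTR; if mtime changed, flush cached data
    return "discard"          # expired
```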
i. Client sends user ID, group IDs to server
ii. Requires global name space of IDs – need a service like Yellow Pages (NIS) to implement this
iii. Server must trust client OS
iv. QUESTION: What about root on client OS? → map to nobody on server (same in Windows)
i. Servers cache blocks
ii. Clients cache blocks, metadata
1. Attribute cache has metadata, checked on open, discarded after 60 seconds
2. Data cache flushed after 30 seconds
iii. Clients consult with server
iv. Issues:
1. Unix files are not removed until the last handle is closed
a. On stateless server, causes file to be deleted while still in use
b. Solution in NFS: rename file, remove on local close (sketch below)
2. Permissions may change on open files
a. Unix allows access if have open handles
b. NFS may deny access
i. Solution: Save client credentials at open time, use to check access later
ii. NOTE: server doesn't do enforcement here
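A sketch of the rename-on-remove workaround; the temp-name scheme and open-count bookkeeping are made up, and `server` stands in for the RENAME/REMOVE RPCs.

```python
import uuid

def client_remove(server, path, open_count):
    """If a still-open file is removed, rename it on the server instead and do
    the real remove when the last local reference is closed."""
    if open_count.get(path, 0) > 0:
        tmp = ".nfs" + uuid.uuid4().hex[:8]   # hypothetical hidden temp name
        server.rename(path, tmp)
        return ("deferred", tmp)              # remember: remove tmp on last close
    server.remove(path)
    return ("removed", None)
```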
i. Open: client checks with remote server to fetch or revalidate cached inode (if older than 30 seconds)
ii. Reads handled locally, writes written back after 30 seconds
iii. Nothing happens on close
iv. Data flushed after 30 seconds – may not be seen by other clients for another 30 seconds
v. Write: delayed on client for 30 seconds, then written synchronously to server (flush sketched below)
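A sketch of that delayed write-back: dirty blocks sit in the client cache and are pushed once they are 30 seconds old; `server_write` stands in for the WRITE RPC (which the server completes synchronously to disk).

```python
import time

FLUSH_AGE = 30   # seconds a dirty block may linger on the client

def flush_old_writes(dirty_blocks, server_write, now=None):
    """dirty_blocks: {block_id: (data, dirtied_at)}.  Push every block dirty
    for >= FLUSH_AGE to the server and drop it from the dirty set."""
    now = time.time() if now is None else now
    for block_id, (data, dirtied_at) in list(dirty_blocks.items()):
        if now - dirtied_at >= FLUSH_AGE:
            server_write(block_id, data)
            del dirty_blocks[block_id]
```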
i. Stateless server for easy crash recovery (keep system simple!)
ii. Relax consistency (no guarantees) to get better performance
iii. Pure client server; no distributed servers
i. Synchronous writes for durability hurt performance
ii. Returns both dir + inode attributes with READDIRPLUS
iii. NFSv4 is stateful:
1. Open / close calls needed for exclusive-access compatibility with Windows
2. Stronger cache consistency – calls now return a change_info structure that indicates whether the directory changed during the operation, so the client can tell whether it has to flush the directory
3. Supports leases / callbacks (delegations) when a single client accesses a file
iv. Removes separate mount protocol
v. Removes separate lock protocol
vi. Uses TCP/IP (can traverse firewalls)
vii. Supports compound RPCs – combine multiple operations into one request (sketch after this list)
viii. Server can export a tree with multiple file systems
ix. Can lookup full pathnames, not just components
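A sketch of the compound idea: several operations ride in one request and run in order on the server, stopping at the first failure. Operation names and the dispatch-by-method-name scheme are illustrative.

```python
def run_compound(server, ops):
    """ops: list of (name, args) pairs, e.g. [("PUTFH", (fh,)), ("GETATTR", ())].
    Returns the per-op results gathered up to and including the first failure."""
    results = []
    for name, args in ops:
        handler = getattr(server, name.lower(), None)
        if handler is None:
            results.append((name, "ERR_NOTSUPP"))
            break
        try:
            results.append((name, handler(*args)))
        except Exception as exc:
            results.append((name, "ERR: " + str(exc)))
            break
    return results
```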
i. Low scalability: performance got a lot worse (on clients) when # of clients goes up
ii. QUESTION: what was bottleneck?
1. Server disk? Seek time? Disk BW?
2. Server CPU?
3. Network?
4. Client CPU/Disk?
i. Used by many others
ii. QUESTION: What does it represent?
1. A: nothing.
2. Has a mix of workloads, can see how they respond
iii. Pieces:
1. Make dir – create directory tree: stresses metadata
2. Copy – copy in files – stresses file writes / creates
3. Scan Dir (like ls -R) – stresses metadata reads
4. ReadAll – find . | wc – stresses whole file reads
5. Make – may be CPU bound, does lots of reads + fewer writes
iv. QUESTION: What is missing?
1. All pieces do whole-file reads / writes
2. Missing productivity applications, scientific applications
v. QUESTION: they use a different platform for prototype and final version. Is this relevant?
1. A: the prototype evaluation is to show where bottlenecks are
2. A: evaluation of final one shows what bottlenecks remain, compare against other systems
i. Local-file Latency?
ii. Local-file Throughput?
iii. Server throughput?
iv. Server latency?
i. QUESTION: Why not partial files?
ii. Usage study shows most files accessed in entirety
iii. Simplifies protocol / consistency
iv. Read / write handled completely locally
v. QUESTION: What workload is this optimized for?
i. QUESTION: Why? Increases latency (local disk access)
ii. Reduces load on server by having a larger client cache
i. Get latest value on open (open/close flow sketched after this list)
ii. Changes visible on close
1. Write-through to the server (minimizes server inconsistency, but increases load compared to write-back when evicted)
iii. Read/write purely local – get local unix semantics
1. programs not location-transparent
iv. Metadata operations are global and synchronous
v. QUESTION: different from Unix. Is it a problem? When?
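A minimal sketch of those open-close semantics; `server.fetch` / `server.store` stand in for the whole-file Fetch/Store RPCs and the cache layout is invented.

```python
def afs_open(server, cache, fid):
    entry = cache.get(fid)
    if entry is None or not entry["callback"]:
        data = server.fetch(fid)            # whole file; server sets a callback
        entry = {"data": bytearray(data), "callback": True, "dirty": False}
        cache[fid] = entry
    return entry

def afs_write(entry, offset, buf):
    entry["data"][offset:offset + len(buf)] = buf   # purely local, Unix semantics
    entry["dirty"] = True

def afs_close(server, fid, entry):
    if entry["dirty"]:
        server.store(fid, bytes(entry["data"]))     # now visible to later opens elsewhere
        entry["dirty"] = False
```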
i. Names are same on all clients
ii. Can move volumes between servers, nothing changes
i. Server notifies client if file changes
ii. QUESTION: Why?
1. Reduces load on server- no client polling
iii. How bad is it for the server?
1. QUESTION: how much state does the server manage?
a. Callback per file cached
2. QUESTION: What can the server do to reduce this?
a. Limit # of callbacks
iv. What happens on failure?
1. After client failure, clients re-establish all callbacks
2. After server failure, …
v. Batching requests for performance
1. Can release lots of callbacks at once (server callback state sketched below)
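A sketch of the per-file callback state an AFS-like server might keep: one registration per (file, client), all broken in a batch when the file is updated. Structure and names are illustrative.

```python
from collections import defaultdict

class CallbackTable:
    def __init__(self):
        self.by_fid = defaultdict(set)      # fid -> set of clients holding callbacks

    def register(self, fid, client):
        self.by_fid[fid].add(client)

    def break_callbacks(self, fid, notify):
        """On an update to fid, tell every registered client to drop its copy;
        notifying all of them here is the batching point above."""
        for client in self.by_fid.pop(fid, set()):
            notify(client, fid)
```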
i. Id-based names
1. Servers never do path lookups
a. PRINCIPLE: make clients do the work – use the CPU cycles
ii. 2-level name space
1. Volumes + Files + uniquifier
2. WHY?
a. Efficiency: search n+m instead of n*m locations
3. Why Uniquifier?
a. Can reuse table slots – makes lots of things easy if you don't have to check for duplicates
b. HOW IMPLEMENT? Machine ID + counter?
iii. location independent names
1. Can move volumes between servers
iv. Clients cache volume → server mapping (Fid + hint lookup sketched below)
1. Volume location is a hint
a. Piece of information that can improve performance if correct, but has no semantically negative consequences if incorrect
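A sketch of the two-level Fid plus the hint-based lookup: try the cached volume-to-server mapping first, and fall back to the location database (refreshing the hint) if it is missing or stale. Method names are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fid:
    volume: int        # which volume
    vnode: int         # which file slot within the volume
    uniquifier: int    # distinguishes reuse of the same slot

def locate_volume(volume, hint_cache, location_db):
    """hint_cache: possibly stale volume -> server map (soft state).
    A wrong hint costs one extra lookup, nothing worse."""
    server = hint_cache.get(volume)
    if server is not None and server.has_volume(volume):
        return server                      # hint was right: fast path
    server = location_db[volume]           # authoritative volume location data
    hint_cache[volume] = server            # refresh the hint for next time
    return server
```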
i. Thread per request, not per client (dispatch sketched below)
ii. Allows overlapped network I/O
iii. QUESTION: How do you set the number? What determines the number you need?
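A rough sketch of thread-per-request dispatch with a bounded worker pool; `handle` and the pool size are placeholders, and the pool size is the knob the question above asks about.

```python
from concurrent.futures import ThreadPoolExecutor

def serve(requests, handle, num_threads=8):
    """Hand each incoming request to a worker thread (rather than dedicating a
    thread or process per client) so requests blocked on disk or network I/O
    overlap with others."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(handle, requests))
```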
i. Can open by inode number
ii. QUESTION: is it required that you have an ID for opening files?
1. e.g. Windows makes this hard – doesn't have inode numbers
i. Walk path recursively (sketched after this list)
1. If directory in cache with callback, go on
2. If in cache w/o callback, check with server
3. if not in cache, fetch + get callback
ii. Open file
1. If in cache with callback, use
2. W/o callback – verify with server, re-establish callback
3. not in cache – fetch w/ callback
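A sketch of that walk and open logic, assuming an invented client cache layout (`{'dir': {name: child_fid}, 'callback': bool}`) and made-up server calls.

```python
def walk_path(server, cache, root_fid, components):
    fid = root_fid
    for name in components:
        entry = cache.get(fid)
        if entry is None:                              # miss: fetch + get callback
            entry = {"dir": server.fetch_dir(fid), "callback": True}
            cache[fid] = entry
        elif not entry["callback"]:                    # cached, no callback: revalidate
            if server.changed_since_cached(fid):
                entry["dir"] = server.fetch_dir(fid)
            entry["callback"] = True
        fid = entry["dir"][name]                       # descend one component
    return fid                                         # opening the file itself works the same way
```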
i. Helped a lot
ii. High level points:
1. Shift work to clients
2. Call-backs instead of polling
3. Threads instead of processes
i. NFS uses kernel threads
ii. NFS stat calls hit in cache
iii. A: much bigger client cache
iv. Performance win may not last!
i. Lookup each component directory of path name
ii. Check cache first – if have in cache, and have callback, use
1. Else ask server to update callback or fetch from server
iii. Read/write: do locally
iv. Close: copy changes up to server
v. QUESTION: what about temp files?
1. Don't put them on AFS – use local disks
i. Unit of partitioning
1. Use recursive copy-on-write to move data
ii. Unit of replication
iii. Unit of backup
1. QUESTION: How do you get a consistent backup?
2. Use copy-on-write to CLONE (clone sketched below)
iv. Unit of applying quotas
v. Logically separate from FS name space and underlying disk partitions
1. Table of mount points indicates name space of volumes
vi. Volume mapping (what server has a volume) is SOFT STATE
1. Can try to use it, but if stale, will learn real value
2. PURELY OPTIMIZATION, LOW COST
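A sketch of a block-granularity copy-on-write clone (the volume layout is invented for illustration): the clone shares every block with the original, and a write replaces only the block being changed in the writer's own map, so the clone stays a consistent snapshot for backup or for copying to another server.

```python
class Volume:
    """Volume as a map from block number to immutable block contents."""
    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})     # copies the map, shares the block data

    def clone(self):
        return Volume(self.blocks)           # cheap: no block data is copied

    def write(self, block_no, data):
        self.blocks[block_no] = bytes(data)  # rebind just this block; any clone
                                             # still references the old contents
```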
i. Bizarre consistency semantics
ii. Higher server load – must interact with client on reads / writes
iii. Less caching on client
iv. Faster error recovery – can just reboot server
v. More network packets
vi. Lower latency – don't have to wait to download file on open. Better for large random-access files
i. E.g. delayed write after 30 seconds
ii. E.g. delayed check for consistency after 30 seconds
i. NFS gives up, tries to emulate on client
ii. AFS weakens slightly with open-close instead of read-write consistency
i. QUESTION: Can you emulate per-machine with single global? Yes, use symlinks (sketch below)
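A tiny sketch of the symlink trick; both paths are made up.

```python
import os

def make_local_alias(global_path="/afs/example.edu/tools",   # hypothetical global name
                     local_path="/usr/local/tools"):         # per-machine conventional name
    if not os.path.islink(local_path):
        os.symlink(global_path, local_path)
```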
i. How does AFS / NFS handle it?
1. AFS resolves name via DNS?