CS 736 – Spring 2006

FFS

Questions

Debate: Nuri vs. Dave
Debate next time: who takes on Evan

Comments from reviews:

Change interface to FS: locks, symlinks, rename

i. Good? Bad?

Still not that good compared to internal disk bw.

i. Why? Internal BW does not account for seeks, or for the fact that the OS / interrupt handling is too slow to read whole tracks at a time

CPU is bottleneck for disk?

i. CPU in use is slow – 1 MIPS.

Numbers not justified – e.g. why a 4 KB block, not 8 KB or 16 KB?
“throughput was the problem. The solution was to improve throughput” not acceptable for performance question. How did they improve throughput?

Reminder on project
Disk background

Geometry

i. Platters

ii. Tracks

iii. Sectors

Classes

i. SCSI: enterprise disks, higher speed (up to 15,000 rpm), smaller diameter (2.5 in), smaller capacity. Seek time ~ 4.3 ms, more platters

ii. IDE/SATA: personal disks: lower speed (5,400 - 7,200 rpm), larger diameter (3.5 in). May not even use all surfaces available (e.g. 3 out of 4). Seek time ~9ms, fewer platters

How used:

i. Density: 30 GB/sq. in.

ii. ZBR: zoned bit recording. Tracks are grouped into zones; outer zones hold more sectors per track, so more bits are read per revolution there

iii. Track ordering: serpentine. Tracks go in on top of a platter, out on the bottom, in on the next platter, then out. Cylinder idea not so good any more: platter-to-platter switching time may be higher than track-to-track seek.

iv. Old model: 3 dimension address of a block: surface, cylinder, sector

1. Why no cylinders?

2. A: Disk geometries change a lot: tracks per inch, sectors per track (varies across tracks). Complicated to expose all this to the OS. Move to a linear block address space

3. Flaws: disk can map around flaws – move a block somewhere else. Hard to do if raw geometry exposed.

New interfaces: SCSI

i. provide a queue of commands, disk can do some ordering: e.g. balance both seek and rotation cost to optimize next block

ii. can get average seek distance down to 1/10 of maximum (vs. 1/3 for ATA, non-queued)
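
A minimal sketch of why a command queue helps, assuming a disk that simply picks the nearest queued request next. The function names and track numbers are hypothetical, and a real drive would also weigh rotational position, which this model ignores.

```python
def fifo_seek_distance(head, requests):
    """Total head travel when requests are served in arrival order."""
    total = 0
    for track in requests:
        total += abs(track - head)
        head = track
    return total


def queued_seek_distance(head, requests):
    """Total head travel when the nearest queued request is served next.

    Greedy nearest-first is only an illustration of reordering; it models
    seek distance alone, not rotation.
    """
    pending = list(requests)
    total = 0
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total


requests = [10, 90, 20, 80, 30]             # hypothetical track numbers
print(fifo_seek_distance(50, requests))     # 300
print(queued_seek_distance(50, requests))   # 120
```

With the whole queue visible, the head sweeps through nearby tracks instead of bouncing across the disk.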

Key observation on disks: mechanical factors limit performance

i. seek time (#1)

ii. rotational latency (#2)
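
To put rough numbers on these two costs, here is a small sketch that combines the seek times and spin rates quoted in the disk classes above. It assumes average rotational latency is half a revolution; the function names are illustrative only.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2


def avg_access_ms(seek_ms, rpm):
    """Average positioning cost: seek plus rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Seek times and spin rates taken from the SCSI and IDE/SATA figures in these notes.
print(avg_access_ms(4.3, 15_000))
print(avg_access_ms(9.0, 7_200))
```

Either way the positioning cost is several milliseconds per random access, which is why layout matters so much.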

Review of Original Unix FS

Design goal: simple, small, space efficient
512 byte blocks – match sector size, lead to good utilization on small disks for small files
Organization: inodes at one end of disk, followed by data blocks
Free blocks arranged on a free list

i. Initially ordered, but then random as file system ages

ii. Digression on FS aging: why it is important to test old file systems

Sequential blocks in file often not sequential on disk
FS transfer rate degraded from 175 KB/s to 30 KB/s after fragmentation
Issues:

i. Need to support small files efficiently

1. At the time, most files were very small

2. A 4 KB block size alone wastes 45% of disk space

ii. Need to preserve locality

1. Allocate blocks so that related data sits close together: minimize seek time, improve throughput
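
The space cost of large blocks for small files can be sketched as follows. The file sizes are made up for illustration (they are not the measurements behind the 45% figure), and the helper names are hypothetical.

```python
def allocated_bytes(file_sizes, block_size):
    """Bytes consumed on disk when every file occupies whole blocks."""
    total = 0
    for size in file_sizes:
        blocks = (size + block_size - 1) // block_size   # round up
        total += blocks * block_size
    return total


def wasted_fraction(file_sizes, block_size):
    """Fraction of allocated space that holds no file data."""
    allocated = allocated_bytes(file_sizes, block_size)
    if allocated == 0:
        return 0.0
    return (allocated - sum(file_sizes)) / allocated


sizes = [100, 700, 1500, 3000, 5000]   # hypothetical small files, in bytes
for block_size in (512, 4096):
    print(block_size, allocated_bytes(sizes, block_size),
          wasted_fraction(sizes, block_size))
```

When most files are smaller than a block, the unused tail of each last block dominates the space used.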

OPPORTUNITY: disk is so slow, can use some extra CPU to make better decisions, some extra space to store better indexes

FFS solutions

QUESTION: What were the solutions:
Throughput:

i. 4096 byte blocks

Fragmentation

i. 512 byte fragments within a block

Locality

i. Cylinder groups

Efficiency

i. Summaries

ii. 2 level allocators (inter- and intra- cylinder group )

Flexibility for different hardware

i. Optimize intra-cylinder allocation for disk / processor capabilities:

High level approaches:

i. Change data structures

ii. 2 level decisions (e.g. fragments / blocks, cylinder groups / sectors)

FFS ideas

Understand workload

i. QUESTION: Where does locality come from?

ii. A: directories and within files

iii. QUESTION: What about other workloads?

1. DB?

2. Google Index?

Cylinder groups

i. QUESTION: What are they for? To give spatial locality on disk to blocks that have temporal locality of access

ii. Group of cylinders near each other, cheap to seek between tracks

iii. Each cylinder group has some bookkeeping information

1. Superblock (a copy) = description of FS (e.g. block size)

2. Space for inodes

3. Free block bitmap

4. QUESTION: What is this technique:

a. A: Change data structure to store more information

5. Summary information on data block usage

a. # of available blocks at each rotational position (8 groups in 2 ms increments)

6. Index into block bitmap for each rotational position

iv. Static # of inodes allocated for a cylinder group
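
A rough sketch of the per-position summary described above. It assumes a 3600 rpm drive (which is what makes 8 positions come out at about 2 ms each) and a simple list-of-flags bitmap; the layout and names are illustrative, not the on-disk format.

```python
ROTATIONAL_POSITIONS = 8


def position_resolution_ms(rpm, positions=ROTATIONAL_POSITIONS):
    """Time covered by one rotational position bucket."""
    return 60_000.0 / rpm / positions


def free_blocks_per_position(free_bitmap, blocks_per_track,
                             positions=ROTATIONAL_POSITIONS):
    """Count free blocks in each rotational position.

    free_bitmap[i] is truthy when block i is free; blocks are assumed to be
    numbered track by track, so i % blocks_per_track is the slot on its track.
    """
    counts = [0] * positions
    for block, is_free in enumerate(free_bitmap):
        if is_free:
            slot = block % blocks_per_track
            counts[slot * positions // blocks_per_track] += 1
    return counts


# Two hypothetical tracks of 8 blocks each.
bitmap = [1, 0, 0, 1, 0, 0, 0, 1,
          1, 1, 0, 0, 0, 0, 0, 1]
print(position_resolution_ms(3600))
print(free_blocks_per_position(bitmap, blocks_per_track=8))   # [2, 1, 0, 1, 0, 0, 0, 2]
```

Keeping these counts lets the allocator reject a cylinder group for a given rotational position without scanning its bitmap.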

Fragments

i. Each block is broken up into fragments (512+ bytes)

ii. Free block bitmap records free fragments

iii. Fragments used for:

1. Small files

2. Tails of large files

iv. Expanding a file with a fragment may require copying data

v. QUESTION: How to minimize copying due to fragments as users write data?

1. A: new system call to learn size of blocks, so can write data in complete blocks

vi. QUESTION: Is this a problem? When? There are usually optimal data access values in any system, e.g. VM pages. How much can you hide this?

vii. Benefit: Provides efficiency of small blocks plus transfer rate of large blocks
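
The block/fragment split can be sketched as below, assuming 4096-byte blocks and 512-byte fragments as in these notes. `layout` and `preferred_io_size` are hypothetical helper names; the second one reads `st_blksize` from `stat`, which is how POSIX systems expose a preferred I/O size today, and falls back to 4096 where that field is missing.

```python
import os


def layout(file_size, block_size=4096, frag_size=512):
    """Return (full_blocks, tail_fragments) for a file of file_size bytes."""
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = (tail + frag_size - 1) // frag_size     # round up
    if tail_frags == block_size // frag_size:
        # A tail that needs every fragment is just another full block.
        full_blocks += 1
        tail_frags = 0
    return full_blocks, tail_frags


def preferred_io_size(path):
    """Preferred I/O size reported by the file system, if available."""
    return getattr(os.stat(path), "st_blksize", 4096)


print(layout(300))     # (0, 1)
print(layout(5000))    # (1, 2)
print(layout(4095))    # (1, 0)
```

An application that writes in multiples of the preferred size fills whole blocks, so the tail never has to be copied from fragments into a larger run.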

Parameterization:

i. Optimize layout for disk parameters

ii. Parameters used:

1. Processor speed

2. HW support for large transfers

3. Blocks per track

4. Disk spin rate

5. Time between transfers

iii. Goal: find rotationally “optimal” blocks

1. Idea: want to read next block with minimum cost

a. Ideally, head is right before block when you want to read it

2. Depends on:

a. Transfer rate of processor

b. Time to set up next transfer

c. Speed of disk

d. Number of blocks you can read in a row

iv. Pre-allocate indexes to find a “near by” block quickly

1. Store vector of indexes into block map

2. Cylinder group stores # of free blocks at each position

3. Allocator uses vector to find blocks, then looks for ones in the right cylinder group

v. Where does this info come from?

1. Administrators – allows installing FS on one system then moving to another

2. Recent tools determine layout (from CMU)

vi. QUESTION: Why so many parameters? What do they really care about?

1. A: given a block, what is the best next block to read/write

2. What is the rotational delay between subsequent blocks that can be read?

vii. QUESTION: parameterization ties fs layout to disk, processor. Is this a problem?

1. A: what happens if move to other disk? To other CPU? Who cares?
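
One way to read the parameters above: they all feed a single calculation, namely how many block slots pass under the head while the system sets up the next transfer. The sketch below assumes hypothetical values (3600 rpm, 8 blocks per track, 4 ms between transfers) and illustrative function names.

```python
import math


def block_time_ms(rpm, blocks_per_track):
    """Time for one block to pass under the head."""
    return 60_000.0 / rpm / blocks_per_track


def rotational_skip(rpm, blocks_per_track, setup_ms):
    """Blocks that rotate past while the next transfer is being set up."""
    return math.ceil(setup_ms / block_time_ms(rpm, blocks_per_track))


def next_optimal_slot(current_slot, rpm, blocks_per_track, setup_ms):
    """Slot on the same track that arrives just as the system is ready."""
    skip = rotational_skip(rpm, blocks_per_track, setup_ms)
    return (current_slot + 1 + skip) % blocks_per_track


# Hypothetical drive and CPU: 3600 rpm, 8 blocks per track, 4 ms between transfers.
print(rotational_skip(3600, 8, 4.0))          # 2
print(next_optimal_slot(0, 3600, 8, 4.0))     # 3
```

A faster processor or hardware that chains transfers shrinks the setup time toward zero, at which point the optimal next block is simply the adjacent one.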

Layout policies

i. QUESTION: What is the ideal policy?

1. Everything near everything else

ii. QUESTION: how do you balance wanting locality and avoiding hot spots?

iii. Policy levels:

1. Global policies:

a. Use system-wide summary information to place new inodes and data blocks

i. Where do directories and files go

b. Calculate rotationally optimized block layouts

c. Decide when to force a long seek to a new cylinder group because there are insufficient blocks left in the current one

d. Request ideal block

2. Local policies:

a. Assign individual blocks within a file

b. if ideal not available, finds next best block with more accurate information.

iv. Goals:

1. Minimize seek latency for related accesses

2. Minimize overhead of large transfers

v. Globally

1. Cluster related information

2. Can’t cluster too much. QUESTION: Why?

a. Leads to hot spots; fill up cylinder groups and lead to sub-optimal allocations

b. Must spread load of unrelated files

3. Where does locality come from?

a. Files within a directory (for ls -l)

i. Place all inodes in the same cylinder group

ii. For a new directory, choose a group with many free inodes and the fewest directories (worst fit)

iii. Within a cylinder group, inodes allocated “next fit” – effectively random placement, but all inodes in the group can be read in 8-16 transfers

b. Blocks within a file

i. Try to put all blocks in same cylinder group as directory inode at rotationally optimal places

ii. To spread load, redirect allocation to a different cylinder group as the file grows: first at 48 KB and then every 1 MB

iii. Global policy requests specific blocks

iv. If block not available:

1. Next closest block on same cylinder

2. Same cylinder group

3. Quadratic hash

4. Check all cylinder groups

v. Reasoning: want a close block. If not, want to find one quickly. If not even that, then the disk is nearly full and an exhaustive search is needed
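
The four-level fallback above can be sketched as follows. The disk model (`disk[g][c][s]` is True when a block is free), the function names, and the exact rehash probe sequence (strides of 1, 2, 4, ...) are assumptions made for illustration.

```python
def first_free_in_cylinder(cylinder, start_slot):
    """Nearest free slot at or after start_slot, wrapping around the cylinder."""
    n = len(cylinder)
    for step in range(n):
        slot = (start_slot + step) % n
        if cylinder[slot]:
            return slot
    return None


def first_free_in_group(group):
    """First free (cylinder, slot) anywhere in a cylinder group."""
    for cyl_no, cylinder in enumerate(group):
        slot = first_free_in_cylinder(cylinder, 0)
        if slot is not None:
            return cyl_no, slot
    return None


def take(disk, g, c, s, level):
    """Mark the block as used and report which fallback level found it."""
    disk[g][c][s] = False
    return g, c, s, level


def allocate(disk, group_no, cyl_no, slot):
    """Try the requested block, then progressively wider searches."""
    # 1. Requested block, or the next closest one on the same cylinder.
    found = first_free_in_cylinder(disk[group_no][cyl_no], slot)
    if found is not None:
        return take(disk, group_no, cyl_no, found, "same cylinder")
    # 2. Any block in the same cylinder group.
    hit = first_free_in_group(disk[group_no])
    if hit is not None:
        return take(disk, group_no, hit[0], hit[1], "same cylinder group")
    # 3. Rehash the group number to probe a few other groups quickly.
    n, probe, stride = len(disk), group_no, 1
    while stride < n:
        probe = (probe + stride) % n
        hit = first_free_in_group(disk[probe])
        if hit is not None:
            return take(disk, probe, hit[0], hit[1], "rehash")
        stride *= 2
    # 4. Disk is nearly full: check every cylinder group.
    for g in range(n):
        hit = first_free_in_group(disk[g])
        if hit is not None:
            return take(disk, g, hit[0], hit[1], "exhaustive search")
    return None
```

Each level costs more than the one before it, so the expensive searches only run when the cheap ones have already failed.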

QUESTION: What would you do if you wanted to speed up write performance when disk is full?

i. A: Maintain a list of free blocks sorted by cylinder group

Reliability

i. NOTE: reliability a big problem for disks. What can fail?

1. A surface

2. A track

3. A sector

ii. Information replicated/distributed across disk

1. Superblocks on each cylinder group

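2. Superblock copies spiral down through the disk pack: each cylinder group stores its copy at an offset about one track further in than the previous group

a. Any single track, cylinder or platter can be lost without losing every copy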
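
A minimal sketch of the spiral placement, assuming each cylinder group shifts its bookkeeping area by one more track than the previous group. The geometry (16 cylinders per group, 4 surfaces) and the function name are hypothetical.

```python
def bookkeeping_location(group_no, cylinders_per_group, surfaces):
    """Return (cylinder, surface) holding this group's superblock copy.

    Group i starts its bookkeeping i tracks into the group, so successive
    copies land on different surfaces and, eventually, different cylinders.
    """
    tracks_per_group = cylinders_per_group * surfaces
    track_offset = group_no % tracks_per_group
    first_cylinder = group_no * cylinders_per_group
    cylinder = first_cylinder + track_offset // surfaces
    surface = track_offset % surfaces
    return cylinder, surface


for group in range(6):
    print(group, bookkeeping_location(group, cylinders_per_group=16, surfaces=4))
```

Because the copies are spread over all surfaces, losing one platter surface still leaves copies on the others.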

Enhancements:

i. File locking:

1. Locks only on open files (in-core structures)

2. QUESTION: Why?

3. Only advisory; only apps that ask for locks will see them

4. QUESTION: Why?

a. Admin/system must be able to break locks.

b. QUESTION: Why not have a break-lock API?

ii. Symbolic links

1. Indirection via FS names

2. QUESTION: Benefits?

a. Links across volumes

3. QUESTION: drawbacks?

a. May break

iii. Atomic rename

1. QUESTION: Why need?

2. QUESTION: what does it do?

a. Delete old file and rename new to old, as one atomic step
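
A common use of atomic rename is replacing a file so that readers see either the old contents or the new, never a partial write. The sketch below uses Python's `os.replace`; the `atomic_write` helper is a hypothetical name, and atomicity is assumed to hold only when the temporary file and the target are on the same file system.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data (bytes) using write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)   # old file is dropped, new name appears
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without rename in the file system, the same update would need a delete followed by a link, leaving a window in which the name does not exist.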

Summary:

Global policies with small amount of information: free space per cylinder group, # of blocks available at different positions
Local policies to place exact block
Redundancy across different dimensions (track, sector, cylinder)
Sub-allocation for efficiency
Move disk head scheduling from disk controller to FS by changing allocation policies

Evaluation

Compare to Unix
Explain anomalies: e.g. why is read/write different for Unix

i. Answer: queuing of write traffic instead of synchronous

CPU utilization higher

i. More time spent finding optimal blocks

ii. Disk I/O saturates CPU for copying data to user programs

Queuing effect less for FFS: QUESTION: Why?