FFS

  1. Questions
    1. Debate: Nuri vs. Dave
    2. Debate next time: who takes on Evan
  2. Comments from reviews:
    1. Change interface to FS: locks, symlinks, rename

                                              i.     Good? Bad?

    1. Still not that good compared to internal disk bw.

                                              i.     Why? Internal BW does not account for seeks, or for the fact that the OS/interrupt handling is too slow to read whole tracks at a time

    1. CPU is bottleneck for disk?

                                              i.     CPU in use is slow – 1 MIPS.

    1. Numbers not justified – e.g. why 4 KB blocks, not 8 KB or 16 KB?
    2. "Throughput was the problem. The solution was to improve throughput" is not an acceptable answer to the performance question. How did they improve throughput?
  1. Reminder on project
  2. Disk background
    1. Geometry

                                              i.     Platters

                                             ii.     Tracks

                                           iii.     Sectors

    1. Classes

                                               i.     SCSI: enterprise disks, higher speed (up to 15,000 rpm), smaller diameter (2.5 in), smaller capacity. Seek time ~ 4.3 ms, more platters

                                             ii.     IDE/SATA: personal disks: lower speed (5,400 - 7,200 rpm), larger diameter (3.5 in). May not even use all surfaces available (e.g. 3 out of 4). Seek time ~9 ms, fewer platters

    1. How used:

                                               i.     Density: 30 GB/sq. in.

                                             ii.     ZBR: zoned bit recording. Tracks are grouped into zones; outer zones hold more sectors per track, so more bits are read per revolution

                                            iii.     Track ordering: serpentine. Tracks go in on top platter, out on bottom, in on next platter, then out. Cylinder idea not so good any more: platter-to-platter switching time may be higher than track-to-track seek.

                                            iv.     Old model: 3-dimensional address of a block: surface, cylinder, sector

1.     Why no cylinders?

2.     A:  Disk geometries change a lot: tracks per inch, sectors per track (varies across tracks). Complicated to expose all this to the OS. Move to linear block addresses

3.     Flaws: disk can map around flaws – move a block somewhere else. Hard to do if raw geometry exposed.

    1. New interfaces: SCSI

                                               i.     provide a queue of commands, disk can do some ordering: e.g. balance both seek and rotation cost to optimize next block

                                             ii.     can get average seek distance down to 1/10 of the maximum (vs. 1/3 for ATA, non-queued)

    1. Key observation on disks: mechanical factors limit performance

                                               i.     seek time (#1)

                                             ii.     rotational latency (#2)
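A rough worked example of the two mechanical costs above, using the seek times and spin rates quoted under "Classes". This is only a back-of-the-envelope sketch: the function names are made up for illustration, and average rotational latency is assumed to be half a revolution.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms, assumed to be half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def avg_access_ms(seek_ms: float, rpm: float) -> float:
    """Average positioning cost: seek time plus average rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Figures taken from the notes: ~4.3 ms seek at 15,000 rpm (enterprise),
# ~9 ms seek at 7,200 rpm (personal).
enterprise_ms = avg_access_ms(4.3, 15_000)
desktop_ms = avg_access_ms(9.0, 7_200)

print(f"enterprise: {enterprise_ms:.1f} ms per random access")
print(f"desktop:    {desktop_ms:.1f} ms per random access")
```

Either way a random access costs several milliseconds before any data moves, which is why the rest of the notes focus on layout.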

 

  1. Review of Original Unix FS
    1. Design goal: simple, small, space efficient
    2. 512 byte blocks – match sector size, lead to good utilization on small disks for small files
    3. Organization: inodes at one end of disk, followed by data blocks
    4. Free blocks arranged on a free list

                                              i.     Initially ordered, but then random as file system ages

                                             ii.     Digression on FS aging: why it is important to test old file systems

    1. Sequential blocks in file often not sequential on disk
    2. FS transfer rate degraded from 175 KB/s to 30 KB/s after fragmentation
    3. Issues:

                                              i.     Need to support small files efficiently

1.   At the time, most files were very small

2.   Using 4 KB blocks alone (no fragments) wastes about 45% of disk space

                                             ii.     Need to preserve locality

1.   Allocate blocks so as to minimize seek time and improve throughput

    1. OPPORTUNITY: disk is so slow, can use some extra CPU to make better decisions, some extra space to store better indexes
  1. FFS solutions
    1. QUESTION: What were the solutions:
    2. Throughput:

                                              i.     4096 byte blocks

    1. Fragmentation

                                              i.     512 byte fragments within a block

    1. Locality

                                              i.     Cylinder groups

    1. Efficiency

                                              i.     Summaries

                                             ii.     2-level allocators (inter- and intra-cylinder group)

    1. Flexibility for different hardware

                                              i.     Optimize intra-cylinder allocation for disk / processor capabilities:

    1. High level approaches:

                                              i.     Change data structures

                                             ii.     2 level decisions (e.g. fragments / blocks, cylinder groups / sectors)

  1. FFS ideas
    1. Understand workload

                                              i.     QUESTION: Where does locality come from?

                                             ii.     A: directories and within files

                                           iii.     QUESTION: What about other workloads?

1.   DB?

2.   Google Index?

    1. Cylinder groups

                                              i.     QUESTION: What are they for? A: To give spatial locality on disk to blocks that have temporal locality (are accessed together)

                                             ii.     Group of cylinders near each other, cheap to seek between tracks

                                           iii.     Each cylinder group has some bookkeeping information

1.   Superblock = description of FS (block size)

2.   Space for inodes

3.    Free block bitmap

4.   QUESTION: What is this technique:

a.    A: Change data structure to store more information

5.   Summary information on data block usage

a.    # of available blocks at each rotational position (8 groups in 2 ms increments)

6.   Index into block bitmap for each rotational position

                                           iv.     Static # of inodes allocated for a cylinder group

    1. Fragments

                                              i.     Each block is broken up into fragments (512+ bytes)

                                             ii.     Free block bitmap records free fragments

                                           iii.     Fragments used for:

1.   Small files

2.   Tails of large files

                                           iv.     Expanding a file with a fragment may require copying data

                                             v.     QUESTION: How to minimize copying due to fragments as users write data?

1.   A: new system call to learn size of blocks, so can write data in complete blocks

                                           vi.     QUESTION: Is this a problem? When? There are usually optimal data access values in any system, e.g. VM pages. How much can you hide this?

                                          vii.     Benefit: Provides efficiency of small blocks plus transfer rate of large blocks
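To make the space argument concrete, here is a small sketch comparing whole-block allocation against block-plus-fragment allocation. The file sizes are invented for illustration (not measurements), the helper names are hypothetical, and the model ignores inodes, indirect blocks and other metadata.

```python
def allocated_bytes(file_size: int, block_size: int, frag_size: int) -> int:
    """Space consumed by one file: full blocks, plus fragments for the tail."""
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = -(-tail // frag_size)  # ceiling division
    return full_blocks * block_size + tail_frags * frag_size


def waste_fraction(sizes, block_size: int, frag_size: int) -> float:
    """Fraction of allocated space that holds no file data."""
    used = sum(sizes)
    allocated = sum(allocated_bytes(s, block_size, frag_size) for s in sizes)
    return (allocated - used) / allocated


# Hypothetical mix of mostly small files (sizes in bytes).
sizes = [100, 700, 1500, 3000, 5000, 12000]

blocks_only = waste_fraction(sizes, 4096, 4096)  # fragment == block: no sub-allocation
with_frags = waste_fraction(sizes, 4096, 512)    # 512-byte fragments for tails

print(f"4096-byte blocks only:        {blocks_only:.1%} wasted")
print(f"4096-byte blocks + 512 frags: {with_frags:.1%} wasted")
```

Setting the fragment size equal to the block size models a file system without fragments, so the same function covers both cases.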

    1. Parameterization:

                                              i.     Optimize layout for disk parameters

                                             ii.     Parameters used:

1.   Processor speed

2.   HW support for large transfers

3.   Blocks per track

4.   Disk spin rate

5.   Time between transfers

                                           iii.     Goal: find rotationally "optimal" blocks

1.   Idea: want to read next block with minimum cost

a.    Ideally, head is right before block when you want to read it

2.   Depends on:

a.    Transfer rate of processor

b.   Time to set up next transfer

c.    Speed of disk

d.   Number of blocks you can read in a row

                                           iv.     Pre-allocate indexes to find a "nearby" block quickly

1.   Store vector of indexes into block map

2.   Cylinder group stores # of free blocks at each position

3.   Allocator uses vector to find blocks, then looks for ones in the right cylinder group

                                             v.     Where does this info come from?

1.   Administrators – allows installing FS on one system then moving to another

2.   Recent tools determine layout (from CMU)

                                           vi.     QUESTION: Why so many parameters? What do they really care about?

1.   A: given a block, what is the best next block to read/write

2.   What is the rotational delay between subsequent blocks that can be read?

                                          vii.     QUESTION: parameterization ties fs layout to disk, processor. Is this a problem?

1.   A: what happens if move to other disk? To other CPU? Who cares?
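One way to picture the "rotationally optimal" calculation: while the processor sets up the next transfer the platter keeps turning, so the next block of the file should sit far enough ahead that it arrives under the head just as the system is ready. The sketch below is a simplified model, not the FFS code; the function names and the example figures (3600 rpm, 8 blocks per track, 4 ms between transfers) are assumptions chosen for illustration.

```python
import math


def blocks_to_skip(rpm: float, blocks_per_track: int, setup_ms: float) -> int:
    """Number of blocks that pass under the head while the next transfer is set up."""
    ms_per_revolution = 60_000.0 / rpm
    ms_per_block = ms_per_revolution / blocks_per_track
    return math.ceil(setup_ms / ms_per_block)


def next_block_position(current: int, rpm: float, blocks_per_track: int,
                        setup_ms: float) -> int:
    """Rotational position on the same track for the block that follows `current`."""
    skip = blocks_to_skip(rpm, blocks_per_track, setup_ms)
    return (current + 1 + skip) % blocks_per_track


# Assumed example: 3600 rpm disk, 8 blocks per track, 4 ms between transfers.
print(blocks_to_skip(3600, 8, 4.0))          # blocks left empty between consecutive file blocks
print(next_block_position(3, 3600, 8, 4.0))  # where to place the block after position 3
```

With a faster processor or hardware support for back-to-back transfers, `setup_ms` shrinks toward zero and consecutive blocks can be placed contiguously, which is the sense in which the layout is tied to the machine.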

    1. Layout policies

                                              i.     QUESTION: What is the ideal policy?

1.   Everything near everything else

                                             ii.     QUESTION: how do you balance wanting locality and avoiding hot spots?

                                           iii.     Policy levels:

1.   Global policies:

a.    Use system-wide summary information to place new inodes and data blocks

                                                                                                    i.     Where do directories and files go

b.   Calculate rotationally optimized block layouts

c.    Decide when to seek because insufficient blocks in a cylinder

d.   Request ideal block

2.   Local policies:

a.    Assign individual blocks within a file

b.   If the ideal block is not available, find the next best block using more accurate (local) information.

                                           iv.     Goals:

1.   Minimize seek latency for related accesses

2.   Minimize overhead of large transfers

                                             v.     Globally

1.   Cluster related information

2.   Can't cluster too much. QUESTION: Why?

a.    Leads to hot spots; fill up cylinder groups and lead to sub-optimal allocations

b.   Must spread load of unrelated files

3.   Where does locality come from?

a.    Files within a directory (for ls -l)

                                                                                                    i.     Place all inodes in the same cylinder group

                                                                                                   ii.     Choose group with most free inodes, fewest directories (worst fit)

                                                                                                 iii.     Within a cylinder group, inodes allocated "next fit" – randomly, but can read all inodes in 8-16 transfers

b.   Blocks within a file

                                                                                                    i.     Try to put all blocks in same cylinder group as directory inode at rotationally optimal places

                                                                                                   ii.     To spread load, redirect allocation to another cylinder group as a file grows: first at 48 KB, then every 1 MB

                                                                                                 iii.     Global policy requests specific blocks

                                                                                                 iv.     If block not available:

1.   Next closest block on the same cylinder

2.   Any block in the same cylinder group

3.   Quadratic hash over cylinder group numbers to find a group with a free block

4.   Exhaustive search of all cylinder groups

                                                                                                   v.     Reasoning: want a close block. If not, want to find one quickly. If not that, then disk is nearly full, need to look closely
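The fallback order above can be sketched as a search over per-group free-block counts. This is a simplified illustration, not the FFS allocator: the first two steps (same cylinder, same cylinder group) are collapsed into "the preferred group has a free block", the doubling-step probe sequence is one plausible form of quadratic rehash, and `find_group_with_space` is a hypothetical name.

```python
from typing import List, Optional


def find_group_with_space(free_blocks: List[int], preferred: int) -> Optional[int]:
    """Return the index of a cylinder group with a free block, or None if the disk is full."""
    n = len(free_blocks)

    # Steps 1-2 (collapsed): stay in the preferred cylinder group if possible.
    if free_blocks[preferred] > 0:
        return preferred

    # Step 3: quadratic rehash. Probe at growing distances (steps of 1, 2, 4, ...),
    # which finds a group quickly when the disk is not nearly full.
    cg = preferred
    step = 1
    while step < n:
        cg = (cg + step) % n
        if free_blocks[cg] > 0:
            return cg
        step *= 2

    # Step 4: disk is nearly full, so look at every remaining group in order.
    for offset in range(1, n):
        cg = (preferred + offset) % n
        if free_blocks[cg] > 0:
            return cg

    return None


print(find_group_with_space([0, 0, 0, 5, 0, 0, 0, 0], preferred=0))  # found by rehash
print(find_group_with_space([0, 0, 4, 0, 0, 0, 0, 0], preferred=0))  # found by exhaustive scan
```

The rehash probes only a logarithmic number of groups, so the expensive linear scan runs only when nearly every group is full.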

    1. QUESTION: What would you do if you wanted to speed up write performance when disk is full?

                                              i.     A: Maintain a list of free blocks sorted by cylinder group

    1. Reliability

                                              i.     NOTE: reliability a big problem for disks. What can fail?

1.   A surface

2.   A track

3.   A sector

                                             ii.     Information replicated/distributed across disk

1.   Superblocks on each cylinder group

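2.   Superblock copies spiral down through the disk (each at a different offset within its cylinder group)

a.    Any single track, cylinder or platter can be lost without losing every copy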

    1. Enhancements:

                                              i.     File locking:

1.   Locks only on open files (in-core structures)

2.   QUESTION: Why?

3.   Only advisory; only apps that ask for locks will see them

4.   QUESTION: Why?

a.    Admin/system must be able to break locks.

b.   QUESTION: Why not have a break-lock API?

                                             ii.     Symbolic links

1.   Indirection via FS names

2.   QUESTION: Benefits?

a.    Links across volumes

3.   QUESTION: drawbacks?

a.    May break

                                           iii.     Atomic rename

1.   QUESTION: Why need?

2.   QUESTION: what does it do?

a.    Delete old file, rename new to old
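A typical use of atomic rename is replacing a file's contents so that readers see either the old version or the new one, never a partial write. The sketch below uses Python's `os.replace`, which maps to `rename()` on POSIX systems; the helper name `atomic_write` is made up, and the code assumes the temporary file and the target live on the same file system.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` using write-to-temp then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new contents to disk before switching names
        os.replace(tmp_path, path)  # the old file is replaced in a single step
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "config.txt")
    atomic_write(target, b"old")
    atomic_write(target, b"new")
    with open(target, "rb") as f:
        print(f.read())
```

Without an atomic rename, the "delete old, rename new" sequence leaves a window in which a crash or a concurrent reader finds no file under the old name.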

  1. Summary:
    1. Global policies with small amount of information: free space per cylinder group, # of blocks available at different positions
    2. Local policies to place exact block
    3. Redundancy across different dimensions (track, sector, cylinder)
    4. Sub-allocation for efficiency
    5. Move disk head scheduling from disk controller to FS by changing allocation policies
  2. Evaluation
    1. Compare to Unix
    2. Explain anomalies: e.g. why is read/write different for Unix

                                              i.     Answer: queuing of write traffic instead of synchronous

    1. CPU utilization higher

                                              i.     More time spent finding optimal blocks

                                             ii.     Disk I/O saturates CPU for copying data to user programs

    1. Queuing effect less for FFS: QUESTION: Why?