Project 3: Virtual Memory Management

This project entails writing a simulator for experimenting with page replacement algorithms. This project will be done in groups of 3. The project is due on 11/18 at 5 pm .

Notes

11/14
Grading policy and writeup requirements clarified.
11/13
Due date is now Sunday, November 18th at 5 pm for both code and writeup.
11/1
I have posted a schedule of meetings for next week as well

10/31
I am dropping the requirement for LRU (least recently used) page replacement from project 3. Without LRU, you do not need to store a virtual address --> physical address mapping, which means that your page table can contain just a valid bit and a referenced bit thereby making it small.

A few clarifications:

There is a file .test.c. in the directory with 537-parse.c and 537-parse.h This is sample code that shows how to use the uncommented routines . you don.t need to incorporate the code in your project in any way.
The memory traces are a list of virtual page numbers. This means that the page size does not matter for this project. In addition, you can assume that there is no sharing: the virtual page are private to a process.

Goals

The goals for this project are to:

Get experience with page replacement algorithms
Get experience with writing simulation
Get experience with software engineering techniques

Project Overview

Your primary goal for this project is to implement and experiment with page replacement algorithms. To do this, you will write a paging simulator. It will read in a set of data files specifying the scheduling and memory access behavior of programs, and will simulate the paging requirements of those programs.

As input, you will receive two types of files: scheduling traces and memory traces. Support code for parsing these files is available in ~cs537-2/public/Projects/P3 .

Scheduling Traces

Scheduling files contain a trace of process start time, CPU time, and I/O count. For the purposes of this project, you will only need to look at start time and CPU time. The format of these files is illustrated below:

xlogout 0 8.58 102720 login.kr 3 1.36 137024 rm 4 0.03 96

The format is:

Program Start time CPU time I/O Count name (in seconds) (in seconds)

This trace indicates that xlogout started at time 0, required 8.58 CPU seconds to execute, and issued 102,720 I/O operations. The login.kr program started at time 3, required 1.36 CPU seconds to execute, and issued 137,024 I/O operations.

The trace is in order, meaning that each process has a start time later than or equal to the previous start time.

Code for parsing this file will be provided with the functions open_sched_trace(), close_sched_trace() and get_next_process().

The traces are available in the ~cs537-2/public/Projects/P3 directory and are named "p3sched.small", "p3sched.medium", and "p3sched.large".

Memory Traces

Memory trace files contain a list of addresess accessed by a program. The addresses are trunacted to vitual page numbers by removing the lower 12 bits, which makes the files smaller. Here is an example for PowerPoint program:

12f 77f46 12f 7ffde 12f 77f3c 12f 7ffde 12f

As an aside, the even numbered addresses (e.g. index 0,2,4) are instruction addresses and the odd numbered addresse (e.g. at index 1,3,5) are data addresses. This accounts for the alternating low values (instructions) and high values (data).

Code for parsing this file will be provided with the functions open_mem_trace(), close_mem_trace(), and get_next_vpn().

The memory traces are available in the ~cs537-2/public/Projects/P3 and are named "programname.trace". I recommend you leave the files there rather than copying them to your directory, as they are large (5MB each).

Simulating scheduling

You will need to simulate a simple scheduling algorithm that allows programs to alternate running. For this project, you only need implement round robin between runnable processes with fixed time slices. Your simulator should take as input the number of memory references in a time slice, which is called the quantum.

Your scheduler will keep track of the current time in terms of cycles, where a cycle is one memory reference in the memory trace. You will simulate a fairly slow system, where there are 100,000 cycles in a second.

Your scheduler should run (1) when the curent quantum expires, to decide whether to context switch the process, and (2) when a process blocks on a page fault (this will be explained below). Context switching should take 50 cycles. After a context switch, you should start simulating memory references from where you left off.

The scheduler should pick the next process to run and then start simulating the memory accessses for that process (see next section). Like a real scheduler, your simulated scheduler must keep track of which process is running, which are ready, and which are blocked.

Your simulator should maintain the current time (e.g., the number of seconds or cycles since the simulation started). A process from the trace should be put in the ready queue (or run) when the current time equals the start time in the scheduling trace file. It executes memory instructions for the cpu time value from the trace file, and then exits when it has used up its cpu time.

The scheduler should report elapsed time (time between when it started and when it completed) for each process. In addition, at the end of the simulation (when all processes have completed), it should report the total number of cycles executed and the number of cycles where no process was executing (i.e., there was nothing runnable).

You should experiment with quanta values of 10,000 cycles, 50,000 cycles, and 200,000 cycles.

Simulating page replacement

Your simulated machine will have a fixed number of physical pages that you must share between the currently running processes. Hence, when the memory required by the running programs exceeds the available physical memory, you will need to store some of the memory on a disk.

On each cycle, you will simulate the next memory reference in the trace for the running program. When a program starts, you should open the corresponding trace file based on the program name. For example, if the program name is "grep", you should open the "grep" memory trace.

You will simulate this by keeping track of which pages are in physical memory and which are on disk. It takes 1000 cycles to move a page from disk to memory. Moving pages from memory to disk is free, as this can be overlapped with computation. Only one page at a time can be read off disk. For example, if two processes both fault on different pages, you need to wait 1000 cycles for the first page and then 1000 cycles for the second page. You may want to maintain a queue of pending requests for the disk. While a page is moving from disk to memory, you can execute other processes -- the operation is asynchronous.

For each cycle that process executes, you should check whether the virtual page it accesses is in physical memory. If so, advance to the next address. If it is not in physical memory, you must simulate a page fault by:

Context switching to a new process
Putting the current process on a blocked queue
Reading the page off disk
Making the process ready again when the page comes back off disk

You must keep track, for each process, which pages are in physical memory and which are on disk. Assume that all pages start off on disk and you are doing demand paging. Because writing pages to disk is free, you don't have to worry about whether a page is dirty or not (and in fact, the trace does not indicate whether a memory reference is a read or a write).

For each process, you should report the number of page faults it experienced. In addition, when the simulation terminates, you should report the total number of page faults experienced across all processes.

You should experiment with total physical memory sizes of 50 pages, 250 pages, and 1000 pages.

Page replacement policies

You should simulate four different page replacement policies, specified as a command line parameter:

FIFO: The first page brought in is the first to be removed.
LRU: The least recently accessed page is the one to be removed
Second-change FIFO: Physical memory is partitioned so that only 75% of pages are available to a process. The remaining 25% are maintained as a second-chance queue. Page start at the end of the large queue. When a new page is brought in, the head of this queue is removed and it is added to the tail of the second queue. Access to pages in this queue cause the page to be placed back at the end of the first queue. The page at the head of the second queue is replaced.
Clock: You should maintain a clock hand that rotates around memory. Each physical page is marked "unused". Each time a page is needed, the hand advances until it finds a page marked "unused", at which time that page is chosen for eviction. If it finds a page marked "used" in this process, the page is changed to be "unused". In addition, on every memory reference, the page accesed should be marked "used".

For the provided scheduling traces, you should try all four page replacement policies to see how each one performs.

Write-up

You should write a 3-5 page document describing your system and explaining your results.

You should explain the architecture of your simulator: what are the components, what are the interfaces between components, and why you chose this deisgn. In addition, you should document policy decisions you made, such as:

Do new processes go at the head of the ready list or the end?
When a process starts up after a page fault, is it guaranteed to have the faulting page in memory, or could it fault again?
Do new processes or processes returning from a page fault get priority?

This is not a complete list -- you should include any policy decisions you made.

You should present three sets of experiments. For each one, you should present your results graphically (e.g., with a chart):

Page replacement algorithm: Pick a reasonable amount of memory and time quantum, and compare the performance of the page replacement algorithms in terms of total page faults experienced and the total run time of programs (i.e., the average elapsed time of all programs).
Time quantum: With either clock or second-chance FIFO, show the difference that the time quantum has on page faults and run time. You should present the results of your simulations with the three quantum values (10,000, 50,000 and 200,000). You should use a reasonable amount of memory (i.e., not 50 pages but not so much that there are no page faults).
Memory size: With either clock or second-chance FIFO, demonstrate the difference that memory size has on page replacement. You should experiment with three memory sizes: 50, 500, and 5000 pages. Use a reasonable value for time quantum (e.g., 50,000 or 100,000 cycles).

For each experiment, explain the differences in performance between the different configurations.

Methodology

This project will be done in groups of three.

For the last two projects, you were free to organize the work however you liked. For this project, you must decide an architecture and division of labor in advance.

Week 1

In the first week of the project, you should design your simulator, figuring out what the major modules are and what they do. You should write a specification of the interface to each module. For example, you may have a scheduling module that decides which process is running, a page table module that tracks which pages are in memory, and a memory simulator modue that simulates each memory reference. You need to write the interface, in terms of what functions they provide and what structures they use, to each of the interfaces so that other members of your team can program against the interface before it has been implemented.

Simulator speed is important: if you use inefficient algorithms or data strctures, it may take days for your simulations to run. For this reason, you should think about how you will efficiently maintain implement each module. Consider the complexity (linear, quadratic, exponential) of each piece of your simulation as part of the design.

In addition to specifing the interface, you should create a division of labor that includes (at least) these roles:

Programmer for each module: the person who writes the code.
Tester for each module: the person who writes test code for the module.
Experimenter: the person who runs the experiments for the paper

After the first week, each team will meet with an instructor to discuss their design.

Week 2

After the second week, you should have a working scheduling simulator that simulates just the scheduling traces, ignoring memory simulations. You should also have an working memory trace simulator, that keeps track of what is in memory and can report the number of page faults experience by a memory trace. You do not need to implement any specific replacement policy here.

Project Specification

The command line for your simulator should be:

mem-sim pages quantum-pages pr-policy trace-file

where

pages is the number of physical pages
quantum is the number of cycles in a scheduling quantum
pr-policy is one of "fifo", "lru", "2fifo", or "clock" and indicates which page replacement policy to use.
trace-file is the name of the scheduling trace file

The output of the simulator should be:

For each process, the elapsed time it took to run and the number of pages faults it experienced.
For the entire simualtion, the total elapsed time, the total number of page faults, and the total idle time.

The simulator should handle these events:

Memory reference
Process creation
Timer interrupt for time slices
Disk I/O with disk interrupts
Page faults

The costs of events for the simulation are:

Memory reference: 1 cycles
Context switch: 50 cycles
Read page off disk: 1000 cycles

Other overheads, such as running the scheduler or handling a page fault, need not be considered.

The simulator should execute one memory instruction per cycle. Context switches take 50 cycles during which no memory instructions may be executed. Replacing a page (evicting it from memory) is free, but bringing it back from disk takes 1000 cycles. During this time, other programs may execute.

You should experiment with total physical memory sizes of 50 pages, 500 pages, and 5000 pages.

You should experiment with quanta values of 10,000 cycles, 50,000 cycles, and 200,000 cycles.

You should run experiments with FIFO, 2nd-chance FIFO, and Clock page replacement algorithms.

Hints

Here is some linked list code that may be helpful for managing queues. Here is example code that uses it.
Create a PCB structure for each running process, that records statistics about that process and has an open handle to the relevant memory trace.
Write a disk simulator module that manages a queue of requests and remembers which process to wake up when they complete. The disk simulator only need to run when a request is issued or completed, so it should give the scheduling simulator the time at which it should next execute.

What to turn in

Turn in your code to the directory ~cs537-2/handin//P3 by the specified due date for the code. Email your project writeup directly to the instructors mailing list (instruct537-2@cs.wisc.edu) by the due date for the writeup.

Grading Policy

35% does it work -- we will run it on traces to make sure it computes the correct runtime and number of page faults.
20% code quality -- is the code clearly written, well structured and documented, and does it handle possible errors appropriately.
20% writeup -- do you clearly explain the design of your system, clearly present your performance results, and clearly explain the differences between different algorithms, memory sizes, and quantum size.
25% group grade -- did you work well with your group?

Last modified: Mon Nov 19 13:10:40 CST 2007