Instructor

  • who: Michael Swift
  • where: Room 7369
  • when: Wed. 2:30-3:30, Thu. 1:30-2:30
  • email: swift 'at' cs.wisc.edu

Lecture

  • when: Tues./Thur. 11-12:15
  • where: Chemistry 1351
  • list: compsci739-1-s12 'at' lists.wisc.edu

Simulation

Overview

Google recently released cluster data from its production systems. The data is simply a log of jobs executed on clusters. Along with the data, researchers at Google posed a few questions that it could help answer. For the purposes of this project, you will look at this one:

New algorithms for machine assignment: How can we assign tasks to machines so that we make best use of machine resources, avoid excess resource contention on machines, and manage power efficiently?

Details

Your task for this assignment is to build a simulator capable of processing the trace data and then simulating different job scheduling policies. The data consists of a set of snapshots of a cluster, each listing the time of the snapshot, process IDs, and the amount of CPU and memory currently being used by each job. There are about 3,500,000 entries in the trace. Because the data is a set of snapshots taken 5 minutes apart, there is no information on how long a given job runs. There is also no data on the I/O usage of each job. Note that many of the jobs have zero CPU or zero memory usage because they are currently idle and/or swapped out.
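The trace format is not spelled out above, so the Python sketch below shows one plausible way to load the data, grouping entries into snapshots keyed by timestamp. The CSV column order (time, parent job ID, job ID, CPU, memory) is an assumption and will likely need adjusting to the real file layout.

    import csv
    from collections import defaultdict

    def load_snapshots(path):
        """Group trace rows into snapshots keyed by timestamp.

        Assumed column order: time, parent job ID, job ID, CPU, memory.
        Adjust the indices below to match the actual trace format.
        """
        snapshots = defaultdict(list)
        with open(path, newline="") as f:
            for row in csv.reader(f):
                time, parent_id, job_id = int(row[0]), row[1], row[2]
                cpu, mem = float(row[3]), float(row[4])
                # Keep zero-CPU/zero-memory entries: idle or swapped-out
                # jobs still exist and may become active later.
                snapshots[time].append((parent_id, job_id, cpu, mem))
        return snapshots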

Question 1: Building a balanced cluster

Google must design machines appropriate for executing its workload. You should evaluate the trace to look at CPU and memory consumption, and figure out the most cost-effective way to provision a cluster. You may assume that a core in the trace refers to a 2.2 GHz Core 2 processor, and that memory is given as a fraction of 8 GB.

Your job is to figure out the minimum number of machines needed to execute this workload, and the cost of purchasing those machines. You may ignore additional costs such as reliability, power consumption, and the cost of networking.
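One way to estimate the minimum machine count is to bin-pack the jobs in each snapshot and take the worst case over all snapshots. The sketch below uses first-fit decreasing in two dimensions; the machine shape (8 cores, 2.0 in the trace's memory units, i.e. 16 GB) is only an example configuration, and the approach assumes no single job outgrows one machine.

    def machines_needed(jobs, cores_per_machine=8, mem_per_machine=2.0):
        """First-fit-decreasing packing of one snapshot's jobs.

        CPU is in trace core units (2.2 GHz Core 2 equivalents); memory
        is in fractions of 8 GB. The machine shape is an example only.
        """
        machines = []  # each entry is [cpu_free, mem_free]
        # Pack the most demanding jobs first.
        for _, _, cpu, mem in sorted(jobs, key=lambda j: (-j[2], -j[3])):
            for m in machines:
                if m[0] >= cpu and m[1] >= mem:
                    m[0] -= cpu
                    m[1] -= mem
                    break
            else:
                # No existing machine fits; buy another one.
                machines.append([cores_per_machine - cpu,
                                 mem_per_machine - mem])
        return len(machines)

    # The cluster must cover the busiest snapshot:
    # peak = max(machines_needed(jobs) for jobs in snapshots.values())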

You can configure and price machines at any vendor web site, such as Dell or HP.

Question 2: Job placement

The trace data includes both parent job ID and child job ID. When a parent job launches child jobs, locality between children is important so they can communicate results.

Assume that your ideal cluster configuration from above is created from racks of 20 machines, and that a single child job executes at the rate (1 + (fraction of children in its rack)) / 2. Thus, if all child jobs are in the same rack, they execute at full speed. If they are split into two racks, each with half the children, they execute at 75%. If each child is in a different rack, then jobs execute at roughly 50% of their maximum speed. (I know this is unrealistically slow, and the numbers may have to be tuned later.)
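For reference, the locality model above translates directly into code; here child_racks and my_rack are hypothetical names for the sibling-to-rack assignment and the rack of the child being evaluated.

    def execution_rate(child_racks, my_rack):
        """Speed of one child under the model above:
        rate = (1 + fraction of children in my rack) / 2."""
        in_my_rack = sum(1 for r in child_racks if r == my_rack)
        return (1 + in_my_rack / len(child_racks)) / 2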

Can you design an on-line job placement strategy so that when a parent creates a child job, a machine is selected to optimize job performance? For the purposes of this project, you should assume that jobs are created in the order specified in the trace file, at the time specified in the trace file. You may need to make further assumptions about job length; if so, document them in your report.
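As a starting point, a greedy policy that keeps children near their siblings might look like the sketch below. The rack-level, slot-based capacity model and all helper names are assumptions for illustration; a real policy would also track per-machine CPU and memory headroom.

    def place_child(parent_rack, siblings_by_rack, rack_free_slots):
        """Pick a rack for a new child job.

        Prefers the rack holding the most siblings (the parent's rack
        on ties), falling back to any rack with a free slot.
        """
        candidates = sorted(
            rack_free_slots,
            key=lambda r: (-siblings_by_rack.get(r, 0), r != parent_rack),
        )
        for rack in candidates:
            if rack_free_slots[rack] > 0:
                rack_free_slots[rack] -= 1
                return rack
        return None  # cluster is full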

Question 3: what else?

The Google cluster data does not contain much information. What other interesting questions can be answered or addressed with this data?

What to turn in

You should turn in a report, in standard conference format (2 columns, 10 point font, 1 inch margins, single spaced), of 7-15 pages describing your simulations and results.
