CS 736 – Spring 2006
Lecture 20: Performance Summary
1. General Problems
a. Latency: do things faster
i. E.g. RPC turnaround time
b. Throughput
i. Handle more requests/operations per second
c. Time to completion
i. How long does it take to compute a fixed workload? E.g. sort a billion values
d. Scale up: run faster on faster machines – e.g. giant multiprocessors with lots of memory and fast CPUs
i. Improve speed on faster/more computers
ii. Run well on a supercomputer
e. Scale out: run on bigger data sets on more machines
i. Handle more data on more computers / faster computers
ii. Run well on a cluster with a billion clients
f. Predictability
i. Does computer do as you expect? Is performance predictable, understandable, low variance? If there is a problem, can you understand its source?
g. Fairness
i. E.g. proportionally share a resource
h. Efficiency
i. Reduce the amount of CPU/bandwidth/storage it takes to do something, even if it isn't the bottleneck
ii. Frees resources for something else
i. Overload
i. How does performance vary with load? Keep it even rather than collapsing under overload
2. General Solutions
a. Locality
i. FFS cylinder groups
ii. LFS logs
b. Optimize for common case
i. LRPC
c. Match underlying functionality
i. Active Messages: hardware messaging
ii. Scheduler Activations: scheduling decisions
iii. Grapevine naming – a name shows whether it is a user or a group
d. Hints – semantically irrelevant, but useful for performance if correct (sketch below)
i. Pilot page usage
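A minimal sketch of the hint idea in C (the table and names here are made up, not Pilot's data structures): the remembered index is only a guess, so a stale hint costs time but never changes the answer.

/* A hint: remember where the last lookup succeeded and try there first.
 * If the hint is stale, the full search still gives the right answer,
 * so correctness never depends on the hint. */
#include <stdio.h>

#define N 1024
static int table[N];
static int last_hit;            /* the hint: purely a performance aid */

static int lookup(int key)
{
    if (table[last_hit] == key) /* fast path: hint was right */
        return last_hit;
    for (int i = 0; i < N; i++) /* slow path: hint wrong, answer unchanged */
        if (table[i] == key) {
            last_hit = i;
            return i;
        }
    return -1;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        table[i] = i * 2;
    printf("%d %d\n", lookup(10), lookup(10)); /* second call hits the hint */
    return 0;
}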
e. Partitioning – distribute load across multiple servers (sketch below)
i. Grapevine
ii. AFS
iii. Petal
iv. Frangipani
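A minimal sketch of partitioning by key, assuming a simple hash-based assignment (Grapevine actually assigns registries to servers administratively; the hash is just the simplest way to show load being spread over a server set).

/* Partitioning sketch: hash a name to pick which server handles it,
 * so data and request load spread across NSERVERS machines. */
#include <stdio.h>

#define NSERVERS 4

static unsigned pick_server(const char *name)
{
    unsigned h = 5381;
    for (; *name; name++)
        h = h * 33 + (unsigned char)*name;   /* djb2-style string hash */
    return h % NSERVERS;
}

int main(void)
{
    const char *names[] = { "alice", "bob", "carol", "dave" };
    for (int i = 0; i < 4; i++)
        printf("%s -> server %u\n", names[i], pick_server(names[i]));
    return 0;
}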
f. Replication – more read throughput
i. Grapevine
ii. AFS
iii. Petal
g. Caching
i. Grapevine – group membership
ii. AFS
iii. NFS
h. Change data structures
i. Logging in LFS, Petal, Frangipani
ii. Message lists in Grapevine
iii. Free block bitmap in FFS
iv. A-Stack / E-stack in LRPC
v. Group membership in Grapevine
i. Batching – amortize per-operation startup costs (sketch below)
i. Delayed writes in LFS, Frangipani, and NFS
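A minimal user-level sketch of batching, assuming an 8 KB buffer and a plain write(2) (LFS and NFS clients do the analogous thing with delayed writes inside the kernel or client cache): many small logical writes are absorbed into one expensive operation, so the fixed cost is paid once per batch.

/* Batching sketch: accumulate small writes and issue one large write(2). */
#include <string.h>
#include <unistd.h>

#define BATCH_SIZE 8192

static char buf[BATCH_SIZE];
static size_t used;

static void flush_batch(int fd)
{
    if (used > 0) {
        (void)write(fd, buf, used); /* one syscall covers many logical writes;
                                       real code would check for errors */
        used = 0;
    }
}

/* Assumes len <= BATCH_SIZE for brevity. */
static void batched_write(int fd, const char *data, size_t len)
{
    if (used + len > BATCH_SIZE)
        flush_batch(fd);
    memcpy(buf + used, data, len);
    used += len;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        batched_write(STDOUT_FILENO, "record\n", 7);
    flush_batch(STDOUT_FILENO);
    return 0;
}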
j. Randomization for fairness (sketch below)
i. Lottery Scheduling
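A minimal sketch of the lottery draw (ticket counts and task names are made up; this is not the paper's kernel code): each runnable task holds tickets, and a uniform draw over the total ticket count picks the winner, so CPU share is proportional in expectation to ticket holdings.

/* Lottery scheduling sketch: draw a random ticket, walk the task list
 * to find its holder. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct task { const char *name; int tickets; };

static int lottery_pick(const struct task *tasks, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += tasks[i].tickets;

    int winner = rand() % total;   /* the winning ticket number */
    for (int i = 0; i < n; i++) {
        if (winner < tasks[i].tickets)
            return i;
        winner -= tasks[i].tickets;
    }
    return n - 1;                  /* unreachable if totals are consistent */
}

int main(void)
{
    struct task tasks[] = { {"A", 75}, {"B", 25} };
    int wins[2] = {0, 0};

    srand((unsigned)time(NULL));
    for (int i = 0; i < 100000; i++)
        wins[lottery_pick(tasks, 2)]++;

    /* Expect roughly a 3:1 split, matching the 75:25 ticket ratio. */
    printf("A: %d  B: %d\n", wins[0], wins[1]);
    return 0;
}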
k. Idempotent operations / stateless operation (sketch below)
i. Message delivery in Grapevine
ii. NFS – essentially every operation (stateless server)
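A minimal sketch of why explicit-state requests are idempotent, loosely modeled on an NFS-style write (the structures are illustrative, not the real protocol): because the request names the file and offset itself, replaying it after a lost reply rewrites the same bytes to the same place instead of, say, appending twice.

/* Idempotent request sketch: the request carries all the state needed. */
#include <stdio.h>
#include <string.h>

struct write_req {
    int   fhandle;     /* which file */
    long  offset;      /* explicit position: no server-side file pointer */
    char  data[32];
    int   len;
};

static char disk[1024];          /* stand-in for the server's storage */

static void server_write(const struct write_req *r)
{
    memcpy(disk + r->offset, r->data, r->len);
}

int main(void)
{
    struct write_req r = { .fhandle = 1, .offset = 100, .len = 5 };
    memcpy(r.data, "hello", 5);

    server_write(&r);   /* original request */
    server_write(&r);   /* duplicate after a lost reply: same final state */

    printf("%.5s\n", disk + 100);
    return 0;
}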
l. Callbacks / leases – reduce server load (sketch below)
i. AFS
ii. Frangipani
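A minimal sketch of the lease variant (a timed promise, in the style of Frangipani's lock leases; AFS callbacks are instead broken explicitly by the server). The 30-second duration and the refresh function are made up: while the lease is valid the client serves reads from its cache with no server traffic at all.

/* Lease sketch: cached data may be used without contacting the server
 * until the lease expires. */
#include <stdio.h>
#include <time.h>

#define LEASE_SECONDS 30

struct cached_file {
    char   data[256];
    time_t lease_expiry;       /* server promised no changes until then */
};

/* Placeholder for an RPC that refetches data and renews the lease. */
static void refresh_from_server(struct cached_file *f)
{
    snprintf(f->data, sizeof f->data, "contents fetched at %ld",
             (long)time(NULL));
    f->lease_expiry = time(NULL) + LEASE_SECONDS;
}

static const char *read_cached(struct cached_file *f)
{
    if (time(NULL) >= f->lease_expiry)  /* lease expired: must revalidate */
        refresh_from_server(f);
    return f->data;                     /* lease valid: no server traffic */
}

int main(void)
{
    struct cached_file f = { .lease_expiry = 0 };
    printf("%s\n", read_cached(&f));    /* first read fetches and caches */
    printf("%s\n", read_cached(&f));    /* second read served locally */
    return 0;
}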
m. Move work to client
i. AFS name translation
n. Early binding (sketch below)
i. NFS mounting
ii. LRPC binding / compiler-generated stubs
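A minimal sketch of early binding using a plain function pointer (LRPC binding additionally pre-allocates shared A-stacks, which is not shown here): the expensive name-to-implementation lookup happens once, up front, and every later call is just an indirect jump.

/* Early binding sketch: resolve the target once, call cheaply thereafter. */
#include <stdio.h>
#include <string.h>

static int op_add(int a, int b) { return a + b; }
static int op_mul(int a, int b) { return a * b; }

typedef int (*op_fn)(int, int);

/* "Binding": the expensive name-to-implementation lookup, done once. */
static op_fn bind_op(const char *name)
{
    if (strcmp(name, "add") == 0) return op_add;
    if (strcmp(name, "mul") == 0) return op_mul;
    return NULL;
}

int main(void)
{
    op_fn f = bind_op("add");      /* bind at "mount"/"import" time */
    for (int i = 0; i < 5; i++)    /* later calls skip the lookup entirely */
        printf("%d\n", f(i, i));
    return 0;
}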
o. Asynchronous operation
i. Active Messages
p. Notifications – notify other participant of semantically interesting events
i. Scheduler Activations
q. Move control from OS to user code
i. Scheduler activations
ii. Active Messages
r. Delay work
i. LFS – segment cleaning
s. Multi-level policy
i. FFS global/local placement
ii. Petal global / physical maps
3. Evaluation Techniques
a. Questions to ask
i. When should they be used
ii. What do they show
b. Micro benchmarks
i. Used to understand performance problems – where is the speedup / slowdown / problem coming from (see the timing sketch after this list)
ii. E.g.
1. Null RPC
2. Contention
3. Read 8 / 64 / 1000 kb files
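A minimal sketch of a null-operation micro-benchmark in the spirit of a null-RPC timing (getpid() stands in for the operation being measured, and the loop count is arbitrary): timing many iterations of a trivial operation isolates the fixed per-operation overhead from any real work.

/* Micro-benchmark sketch: report the per-call cost of a trivial operation. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    enum { ITERS = 1000000 };
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERS; i++)
        (void)getpid();                    /* the "null" operation */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per call\n", ns / ITERS);
    return 0;
}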
c. Synthetic benchmarks
i. Non-representative – e.g. the Andrew Benchmark
ii. Show higher-level performance and a more realistic mix of operations
iii. Again, used to understand performance and to indicate potential problems due to workload skew
d. Record live performance
i. Shows operational issues, not peak load
1. CPU utilization
e. Perform anomalous events, e.g. shutting down a server
i. Show response under duress (e.g. time to reconfigure, time to clean)
f. Comparisons
i. Against best research system (LRPC, Scheduler Activations)
ii. Against industry practice (AFS, FFS, LFS)
iii. Against tuned industry practices (Petal, Frangipani, Active Messages)
g. Papers
i. LRPC:
1. NULL RPC + component timings
2. Throughput scaling on multiprocessor with simple workload
3. Compare to TAOS RPC
ii. Scheduler Activations
1. Micro benchmarks: null fork, wait
2. Scalability with # of processors on single program
3. Compare to Unix threads, Topaz FastThreads, user-level threads
iii. Active Messages
1. Null RPC + timings
2. Utilization as # of processors scale
3. Compare to native buffered model
iv. Lottery Scheduling
1. Proportional sharing under different simple (small # of processes) workloads
v. FFS
1. Read / write bandwidth, CPU utilization on simple workloads
2. Compare to UFS
vi. LFS
1. Synthetic, fixed workloads (e.g. uniform vs. hot-and-cold access) to show response to different patterns
2. Micro benchmarks for create/read/delete with sequential and random access
3. Usage characteristics from live system
4. Compare to FFS
vii. AFS
1. Usage characteristics from live usage
2. Andrew benchmark – time + scalability as client load increases
3. Access latency for different size files
4. Compare to NFS, local
viii. NFS
1. Compare to local, network disk
2. Run real programs
ix. Petal
1. Compare to local, tuned industry FS
2. Synthetic read / write workload
3. Measure latency, scalability with # of servers
4. Andrew benchmark
x. Frangipani
1. Compare to local w/ tuned industry FS
2. Andrew benchmark
3. Synthetic read/write microbench
4. Scaling on microbenchmarks to understand performance