LRPC
i. ANSWER: look forward at what it could be used for, e.g. system calls?
i. COMMENT: a good evaluation explains things well, but leaves you wanting more because you are interested. Nobody evaluates everything. A bad evaluation leaves you confused and wanting more because you didn't understand things.
i. Comment: how much more general could it be? Often there is a natural performance cliff where the next piece of complexity causes a big performance hit.
i. answer: it can be a table index, just like a file descriptor
i. Analyze a system
ii. Find an untapped opportunity; some common behavior that can be optimized
1. E.g. small arguments
2. Fixed size arguments
3. Unstructured arguments (e.g. buffers vs. types needing marshalling)
4. Unnecessary operations (e.g. copying data)
iii. Measure overhead that you could remove; best case performance
iv. Build an optimized version that takes advantage of the opportunity
v. Go on to fame and fortune
i. RPC used for structuring systems:
1. Client / server (e.g. Windows services, name server)
2. NFS file server – used for sending requests to server
ii. Common case is not remote / large arguments
1. Common case is local calls when used in systems with micro-kernels (1-6%)
2. Common case is small, fixed-size arguments
a. 60% were < 32 bytes
b. 80% of arguments fixed size at compile time
c. 2/3 procedures have fixed-size arguments
i. Do everything in advance
1. e.g. allocating stacks
2. e.g. setting up dispatch (no dynamic dispatch in server)
ii. Remove unnecessary copies
1. Memory copies are a huge cause of performance problems
i. Bind
1. Create procedure description list with each exported procedure
2. Allocate shared A-stacks and corresponding linkage records (for caller's return address) for each procedure
a. QUESTION: Why allocate stacks for all procedures?
i. ANSWER: want contiguity for easy range checking
3. Return binding object to client runtime to identify binding
4. KEY POINT: binding object contains server function address – no need for dispatch in server
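The bind-time structures above can be sketched in C. This is a minimal illustrative sketch, not the paper's actual implementation: all names (`proc_desc`, `binding`, `astack_valid`) are hypothetical. It shows why allocating the A-stacks as one contiguous region makes the kernel's validity check a simple range comparison.

```c
#include <stddef.h>

/* Hypothetical sketch of LRPC bind-time structures; names are
 * illustrative, not from the paper's implementation. */

typedef void (*server_proc_t)(void *astack);

/* One procedure description (PD) per exported procedure. */
struct proc_desc {
    server_proc_t entry;       /* server stub address -> no dispatch in server */
    size_t        astack_size; /* argument size fixed at compile time */
};

/* Binding object returned to the client runtime at bind time. */
struct binding {
    struct proc_desc *pd_list;           /* procedure description list  */
    int               n_procs;
    char             *astack_base;       /* contiguous shared A-stack region */
    size_t            astack_region_len;
};

/* Contiguity pays off here: verifying a client-supplied A-stack
 * pointer is a single range check. */
static int astack_valid(const struct binding *b, const void *astack)
{
    const char *p = astack;
    return p >= b->astack_base && p < b->astack_base + b->astack_region_len;
}
```

A pointer anywhere inside the pre-allocated region passes; anything outside (including one byte past the end) is rejected.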
ii. Call
1. QUESTION: How do they know if call is local or remote?
a. A: at bind time, cache a bit of information
i. Client stub grabs A-stack off queue
ii. Push arguments on A-stack
iii. Pass A-stack, binding object, procedure identifier in registers to kernel
iv. Kernel
1. Verify binding, procedure identifier
2. Locate procedure description
3. Verify A-stack & locate linkage for A-stack
4. Verify ownership of A-stack
5. Record caller's return address in linkage
6. Push linkage onto thread (so can nest calls)
7. Find execution stack (e-stack) for server to execute (from pool)
8. Update thread to point at E-stack
9. Change processor address space
10. Call into server stub at address in PD
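The call path above can be condensed into a toy single-address-space simulation. This is a hedged sketch under heavy assumptions: `kernel_call`, `add_stub`, and the fixed-size tables are all invented for illustration, and the E-stack/address-space switch (steps 7–9) is reduced to a comment. What it does show is the essential flow: verify the binding and procedure identifier, verify the A-stack, record linkage, then jump straight to the server stub address held in the PD.

```c
#include <assert.h>
#include <string.h>

/* Toy simulation of the LRPC kernel call path; names and sizes are
 * illustrative assumptions, not the paper's implementation. */

#define N_PROCS     2
#define ASTACK_SIZE 64

typedef int (*server_stub_t)(char *astack);

/* Sample server procedure: reads two ints off the shared A-stack. */
static int add_stub(char *astack)
{
    int a, b;
    memcpy(&a, astack, sizeof a);
    memcpy(&b, astack + sizeof a, sizeof b);
    return a + b;
}

struct binding {
    server_stub_t pd[N_PROCS];           /* procedure descriptions     */
    char astacks[N_PROCS][ASTACK_SIZE];  /* pre-allocated A-stacks     */
    void *linkage[N_PROCS];              /* caller return-address slots */
};

/* "Kernel": verify binding + procedure id, verify A-stack ownership,
 * record linkage, then call directly into the server stub from the PD. */
static int kernel_call(struct binding *b, int proc, char *astack)
{
    assert(b && proc >= 0 && proc < N_PROCS);  /* verify binding, proc id */
    assert(astack == b->astacks[proc]);        /* verify A-stack          */
    b->linkage[proc] = astack;                 /* stand-in for saving the
                                                  caller's return address */
    /* Real kernel: find E-stack, switch thread to it, change address
     * space -- omitted in this single-address-space sketch. */
    return b->pd[proc](astack);                /* dispatch via PD         */
}
```

The client stub's part (grab an A-stack, push arguments, trap) becomes: copy the arguments into `b.astacks[0]` and call `kernel_call(&b, 0, ...)`.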
v. Notes:
1. Can use a separate argument stack because the language supports it; C would need to copy arguments to the E-stack
a. What else could you do? Put the A-stack/E-stack on attached pages
2. By-reference objects are copied to the A-stack by the client stub
a. PRINCIPLE: client does copying work
b. PRINCIPLE: client stub does work, kernel verifies (e.g. choose A-stack)
c. QUESTION: What are alternatives to having client stub do copying?
3. What about thread-local storage in server? Or thread-init routines for DLLs?
vi. QUESTION: What about writing with shared memory?
1. A: no isolation
i. Normal RPC makes copies of arguments
1. Many times – up to 4 times
ii. QUESTION: What is benefit?
1. Ensures call-by-value (copy) semantics; client changes can't corrupt server
iii. LRPC uses shared stacks accessible to both processes
1. Client can overwrite the A-stack while the server accesses it
iv. Solution:
1. Server can copy/verify data only if needed
2. Not needed for opaque parameters (e.g. buffers)
3. Server can integrate validity checks with copying
4. Adds at most one extra copy (on top of the initial one)
5. COMMENT: More like a system call, where kernel validates parameters
6. ISSUE: Complicates server
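The "copy only if needed" idea, and why it complicates the server, can be sketched as follows. This is a hypothetical illustration (the layout and `server_handle` are invented): because the client can rewrite the shared A-stack at any time, the server must copy any parameter it validates into private memory first, then check the private copy; a purely opaque payload can be used in place with no copy at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical server stub over a shared A-stack; layout assumption:
 * [int len][char payload...]. Names are illustrative only. */
static int server_handle(const char *astack, size_t astack_size)
{
    int len;
    /* Copy BEFORE validating: re-reading the shared A-stack later
     * could observe a new client-written value (a TOCTOU race). */
    memcpy(&len, astack, sizeof len);
    if (len < 0 || (size_t)len > astack_size - sizeof len)
        return -1;                 /* validity check on the private copy */

    /* Opaque payload: treated as an uninterpreted buffer, so it is
     * safe to read in place -- no extra copy needed. */
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += astack[sizeof len + i];
    return sum;
}
```

This is the "more like a system call" point: the copy and the validity check are integrated in the server, just as a kernel validates user-supplied parameters, at the cost of extra care in every server stub.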
i. What do you do if a server thread crashes?
ii. Question: what is the key problem?
1. A: client thread has been taken over for the server, can't just timeout because server is actively using it
iii. Solution: duplicate client thread state into a new thread
i. In this paper: didn't isolate which pieces of RPC were bad
ii. Showing the minimum possible gets around this from the other direction
i. Assumes no per-thread application state
ii. Relies on argument stack pointer to avoid copying / changing protection on execution stack
i. Dave Cutler drove from MS over to UWash for a meeting
ii. Windows version different
1. No shared stacks
2. Pre-allocated shared memory if large objects needed
3. Handoff scheduling for low latency
4. Still have to copy messages many times
a. Into user-mode message
b. Into kernel mode message
c. Into server message
d. Onto server stack
5. Quick LPC:
a. Dedicated server thread
b. Dedicated shared memory with server thread
c. Event pair for signaling message arriving / reply arriving
i. Systems are never fast enough
ii. If code is called frequently, there is always a temptation to move it into the kernel