CS 736 – Spring 2006

RPC

Comments from reviews

Note: no need to discuss relevance; instead, identify the performance problem and the performance solution
Base Paul: acking every packet sent – is it a performance problem?
Base Paul: not building on existing protocols, so must redo their work (e.g., congestion control, flow control)

i. NOTE: at the time, lower-level software (e.g., PUP) was not well optimized

ii. NOTE: bigger problem is lack of routing at lower level; only works for a single network with no routers

Base Paul: limited evaluation
Jon: big contribution is the idea of RPC matching local procedure call (LPC) semantics
Jon: is stateless server a good idea?
Toshi: naming is a big idea; secure communication is built in
Everybody: didn’t measure performance with encryption turned on.

i. QUESTION: why not?

ii. ANSWER: cost is so high that you are only measuring encryption cost, not intrinsic cost of RPC model

Evan: small packets may not be common case

i. QUESTION: what was the result?

ii. ANSWER: still the case today. Look at Windows, Unix

1. RPC used for directory service (DS) queries: small request/response

2. Look at DNS queries

3. Mail messages were a lot smaller then (no MIME/attachments)

Dave: RPC name service not that popular. People use DNS instead

i. QUESTION: can you do the same thing in DNS?

ii. ANSWER: essentially the same model, but not dynamic. MS uses dynamic DNS as a name service for this in a limited way, and a DS in general: exactly the same model

Dave: not widely used on internet. See DNS, NTP, dynamic DNS. Couldn’t implement HTTP atop RPC

i. QUESTION: why not put HTTP on top of RPC?

ii. QUESTION: why not used on Internet?

1. A: failures are common, and RPC doesn’t deal with failures well

2. A: the Internet deals with heterogeneous systems. RPC requires a code generator for packet formats, which means porting software or implementing something big and general-purpose when all you need in any one case is a small piece of work.

Kevin: why do you need naming?

i. A: replication for reliability. Indirection for flexibility – want to upgrade the machine, move service to another machine. Where do you get an IP address?

Context

Xerox PARC
Birth of local area networks, distributed computing

Problem

QUESTION: What problem were they solving?

i. Distributed programming

1. QUESTION: why important? Improve performance by distributing code across different machines

ii. Hard to write distributed programs using messages

1. Like writing in ASM

iii. How do you make an efficient high-level communication mechanism?

1. Similar to using compiler instead of ASM, or scripting language instead of C

iv. Target environment: local area network, closely-coupled computation, generally reliable

Goal:

QUESTION: What was goal for this work?

i. Find the right paradigm for distributed computing

ii. Fine-tune the semantics

1. Make it as powerful as possible so that no additional mechanism needs to be layered above it

iii. Implementation choices for efficiency

NOTE: want to let programmers reason about performance (unlike shared memory)

Rejected ideas

Remote fork – launch remote program that returns values

i. Still has problems of data & argument passing

Distributed shared memory

i. Difficult to make fast

ii. Hard to program – memory classes not exposed in language

QUESTION: Why RPC?

Clean, simple semantics
Well understood by programmers
Commonly used already for structuring programs
QUESTION: Why only synchronous communication?

i. Is the common case

ii. Can use fork/join for asynchronous communication
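A minimal Python sketch of the fork/join idea, assuming `remote_call` is a hypothetical stand-in for a blocking RPC stub: each call runs in its own thread (fork), and the caller later waits for each reply (join), so synchronous calls can overlap without any asynchronous RPC primitive.

```python
import threading

def remote_call(x):
    # Hypothetical stand-in for a synchronous RPC: blocks until the "reply" arrives.
    return x * x

def fork(fn, *args):
    """Run fn(*args) in a new thread; return a handle for join()."""
    result = {}

    def run():
        result["value"] = fn(*args)

    t = threading.Thread(target=run)
    t.start()
    return t, result

def join(handle):
    """Wait for the forked call to finish and return its result."""
    t, result = handle
    t.join()
    return result["value"]

# Issue three calls concurrently, then collect the replies in order.
handles = [fork(remote_call, n) for n in (2, 3, 4)]
replies = [join(h) for h in handles]
print(replies)  # [4, 9, 16]
```

The RPC layer itself stays purely synchronous; concurrency comes entirely from the caller's own processes.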

Big picture

Show how RPC works

i. Client, client stub, runtime, server stub, server

ii. Name server

iii. IDL compiler - Lupine
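To make the call path concrete, here is a toy Python sketch of the client → client stub → runtime → server stub → server chain. JSON stands in for whatever encoding Lupine-generated stubs would actually use, and the in-process `client_runtime` stands in for the network; all names are hypothetical.

```python
import json

# --- Server side -----------------------------------------------------
def add(a, b):
    """The actual server procedure."""
    return a + b

def server_stub_add(payload: bytes) -> bytes:
    """Unmarshal arguments, call the real procedure, marshal the result."""
    args = json.loads(payload)
    return json.dumps({"result": add(*args)}).encode()

# Server runtime: dispatch table from procedure name to server stub.
DISPATCH = {"add": server_stub_add}

def server_runtime(packet: bytes) -> bytes:
    header, _, payload = packet.partition(b"\n")
    return DISPATCH[header.decode()](payload)

# --- Client side -----------------------------------------------------
def client_runtime(packet: bytes) -> bytes:
    # Stand-in for the network: a real runtime would transmit the packet
    # and block until the reply packet arrives.
    return server_runtime(packet)

def client_stub_add(a, b):
    """Looks like a local procedure to the caller; hides the transport."""
    packet = b"add\n" + json.dumps([a, b]).encode()
    reply = client_runtime(packet)
    return json.loads(reply)["result"]

print(client_stub_add(2, 3))  # 5
```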

Questions to solve

What should the failure semantics be?
How do you handle pointer-based data structures?

i. Don’t allow

ii. Marshal automatically

How do you identify the target of a call?
What protocols should be used? Where in the stack should you sit (e.g., Ethernet, IP, UDP, TCP)?
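One way to "marshal automatically" is to follow the pointers and copy the values, as in this sketch (the `Node` type is hypothetical, and the list is assumed to be acyclic). It also exposes a semantic difference from a local call: the callee gets a copy, so its changes are not visible to the caller the way they would be through a shared pointer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    next_node: Optional["Node"] = None

def marshal_list(head: Optional[Node]) -> list:
    """Follow pointers and copy the values into a flat, pointer-free form.
    Assumes the list is acyclic; a cycle would loop forever here."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next_node
    return out

def unmarshal_list(values: list) -> Optional[Node]:
    """Rebuild a fresh linked list in the receiver's address space."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

original = Node(1, Node(2, Node(3)))
wire = marshal_list(original)   # [1, 2, 3]
copy = unmarshal_list(wire)
copy.value = 99                 # this change is NOT seen by the caller
print(wire, original.value)     # [1, 2, 3] 1
```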

Principles

Make RPC as much like procedure call as possible

i. No time-outs

ii. Return communication failures as exceptional conditions

1. QUESTION: What does this mean for RPC packages in C?

2. QUESTION: how does this impact programming?

a. New failure modes

b. Depends on whether programmers already handle exceptions

3. QUESTION: What should a program do on failure?
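A sketch of returning a communication failure as an exceptional condition rather than a status code, assuming a toy transport that either answers or raises; `CallFailed` is a hypothetical exception name. Note that the caller cannot tell from the exception alone whether the server executed the call zero times or once.

```python
class CallFailed(Exception):
    """Hypothetical exception raised by a stub when the runtime cannot
    complete the call (server crashed or unreachable)."""

def unreliable_transport(packet: bytes, server_up: bool) -> bytes:
    # Stand-in for the runtime: the server is either reachable or not.
    if not server_up:
        raise ConnectionError("no response to probes")
    return b"ok:" + packet

def client_stub_echo(msg: str, server_up: bool = True) -> str:
    try:
        reply = unreliable_transport(msg.encode(), server_up)
    except ConnectionError as e:
        # Communication failure surfaces as an exception, not a return code.
        raise CallFailed(str(e)) from e
    return reply.decode()

# The caller must now handle a failure mode a local call never has.
try:
    client_stub_echo("hi", server_up=False)
except CallFailed as err:
    print("call failed:", err)  # call failed: no response to probes
```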

Stubs

Automatically generated
Look like a normal procedure to the client; hide distribution
Runtime can hide architectural differences

i. Convert between endianness

ii. Convert between pointer sizes
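A sketch of how a stub or runtime might hide byte-order differences, using Python's `struct` module with the `!` (network, big-endian) format. The wire layout (4-byte signed int, 2-byte length, UTF-8 bytes) is an assumption for illustration; fixing each field's width on the wire is also one way to keep differing word sizes out of the message format.

```python
import struct

def marshal(count: int, name: str) -> bytes:
    """Encode a 32-bit signed int and a UTF-8 string in network (big-endian)
    order, so sender and receiver agree regardless of native layout."""
    data = name.encode("utf-8")
    # '!' = network byte order, no padding; 'i' = 4-byte int; 'H' = 2-byte length
    return struct.pack("!iH", count, len(data)) + data

def unmarshal(buf: bytes) -> tuple:
    count, n = struct.unpack_from("!iH", buf, 0)
    header = struct.calcsize("!iH")  # 6 bytes
    name = buf[header:header + n].decode("utf-8")
    return count, name

wire = marshal(1, "lupine")
print(wire[:4])         # b'\x00\x00\x00\x01' on every host
print(unmarshal(wire))  # (1, 'lupine')
```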

Binding

QUESTION: What is binding?
How do you specify someone to talk to?

i. Naming: type (interface name) and instance (host name / service name for replicated services)

ii. QUESTION: What do you want from naming?

How do you find someone that meets that specification?

i. Contact a name service:

1. Grapevine

a. Entry for each type

i. Lists instances of the type

b. Entry for each instance

i. Addressing information for host

c. QUESTION: What about DNS?

i. DNS for mail services

ii. LDAP in Windows
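A toy model of the two kinds of Grapevine entries described above, with hypothetical interface, instance, and address names: `lookup` resolves a type to the addresses of all its instances, and `bind` tries them one by one until one is reachable (plain list order is assumed as the trial order).

```python
# Hypothetical contents of a Grapevine-like database.
TYPE_ENTRIES = {                # interface type -> list of instances
    "FileServer": ["fs-alpha", "fs-beta"],
}
INSTANCE_ENTRIES = {            # instance -> network address
    "fs-alpha": "net3-host14",
    "fs-beta": "net3-host22",
}

def lookup(interface_type: str) -> list:
    """Resolve a type to the addresses of all of its instances."""
    return [INSTANCE_ENTRIES[i] for i in TYPE_ENTRIES.get(interface_type, [])]

def bind(interface_type: str, reachable: set) -> str:
    """Try each candidate address in order until one answers."""
    for addr in lookup(interface_type):
        if addr in reachable:
            return addr
    raise LookupError(f"no reachable instance of {interface_type}")

print(bind("FileServer", reachable={"net3-host22"}))  # net3-host22
```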

How do you announce that you provide a service?

i. ExportInterface registers information with Grapevine automatically when the server starts up

ii. RPC runtime maintains a table mapping each interface name to its dispatch procedure and a 32-bit instance/incarnation identifier (changes after reboot)

What do you do to initiate a conversation?

i. ImportInterface asks Grapevine for addressing information (or uses a provided name/address)

1. When several instances are available, the client runtime gets all of them and tries them in a useful order

ii. Client runtime makes an RPC to the server's runtime to obtain the binding (unique identifier / incarnation number)

ISSUES:

i. Binding does not create state on server → scalable

ii. Bindings are broken when the server crashes → client is informed automatically

iii. Access controls on Grapevine limit who can register an interface

1. QUESTION: Should it limit who can import?

a. Can learn of imports other ways, e.g. port scanning
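A sketch of the server-side export table, assuming a simple counter stands in for however the real runtime generates its 32-bit identifier. The server keeps only its export table, no per-client state, and a restart changes the identifier, so a call on a stale binding is rejected and the client learns that the binding is broken.

```python
import itertools

_incarnations = itertools.count(1)  # hypothetical per-boot identifier source

class ServerRuntime:
    def __init__(self):
        self.restart()

    def restart(self):
        """Simulate a reboot: the export table is lost and a new
        incarnation identifier is chosen, so old bindings stop working."""
        self.incarnation = next(_incarnations)
        self.exports = {}  # interface name -> (dispatcher, identifier)

    def export_interface(self, name, dispatcher):
        self.exports[name] = (dispatcher, self.incarnation)

    def get_binding(self, name):
        """Answer an importer's request for the binding's identifier."""
        _, uid = self.exports[name]
        return uid

    def call(self, name, uid, *args):
        dispatcher, current = self.exports.get(name, (None, None))
        if uid != current:
            raise ConnectionError("binding broken: server has restarted")
        return dispatcher(*args)

server = ServerRuntime()
server.export_interface("Adder", lambda a, b: a + b)
uid = server.get_binding("Adder")       # the client's import step
print(server.call("Adder", uid, 2, 3))  # 5

server.restart()                        # crash and reboot
server.export_interface("Adder", lambda a, b: a + b)
try:
    server.call("Adder", uid, 2, 3)     # stale binding
except ConnectionError as e:
    print(e)  # binding broken: server has restarted
```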

Protocol Implementation

QUESTION: What are the goals?

i. Minimize latency of calls

ii. Minimize state needed on server for handling many clients

iii. Provide useful semantics:

1. On success, exactly one execution

2. On exception, zero or one execution

a. QUESTION: Why? Impossibility result

3. No timeouts

a. QUESTION: Good? Bad? What is user experience?

iv. Solution:

1. Optimize for common case:

a. Request & reply happen in a single packet

b. Server computation is short, so the reply comes back before the client needs to retransmit (the reply doubles as the ACK)

2. Piggyback ACKs on the subsequent packet

3. Leverage protocol properties

a. Only one outstanding request per client on an interface → no sliding window needed

b. No need to establish a connection; server just remembers the highest request sequence number from each client to detect duplicates

c. Sender of a data packet resends it until it is ACKed, either implicitly by the next packet (the reply or the next call) or explicitly (if the call takes longer)

d. Server

v. Handle complex case simply

1. Multiple-packet request/reply: explicitly ACK every non-final packet before sending the next packet

a. Only last packet must be buffered on either side

b. Use other protocols for bulk transfer

vi. Avoid expensive process creation for handling requests

1. Server uses a separate process per concurrent request (no threads)

2. Keeps a pool of idle processes to avoid the cost of creating one on each call

3. Server hints to the client which process handled its call, so the client can direct all calls in a conversation to the same process

a. QUESTION: What are the implications? Is each call independent? No state across calls? Must servers share dynamic state across processes?
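A single-process Python sketch of the duplicate-suppression and retransmission rules above: a per-client call sequence number, retransmit until a reply arrives, and the reply doubling as the ACK. It ignores probes, multi-packet calls, and the process pool, and simulates lost replies with a hypothetical `lossy_sends` count.

```python
class Server:
    """Keeps one small record per client: the highest call sequence number
    seen and the reply sent for it (needed to answer retransmissions)."""

    def __init__(self, proc):
        self.proc = proc
        self.last = {}        # client id -> (seq, reply)
        self.executions = 0   # counts real executions, for illustration

    def handle(self, client, seq, arg):
        prev = self.last.get(client)
        if prev is not None and seq < prev[0]:
            return None                   # stale duplicate: ignore
        if prev is not None and seq == prev[0]:
            return prev[1]                # retransmission: resend saved reply
        self.executions += 1
        reply = self.proc(arg)
        # Saved until the client's next call arrives, which implicitly ACKs it.
        self.last[client] = (seq, reply)
        return reply

def call(server, client, seq, arg, lossy_sends=0):
    """Client side: retransmit the same call until a reply gets through.
    The first `lossy_sends` replies are treated as lost in transit."""
    attempts = 0
    while True:
        attempts += 1
        reply = server.handle(client, seq, arg)
        if attempts > lossy_sends and reply is not None:
            return reply  # the reply also acknowledges the call packet

srv = Server(lambda x: x + 1)
r1 = call(srv, "A", 1, 10, lossy_sends=2)  # two replies lost, call resent
print(r1, srv.executions)                  # 11 1
r2 = call(srv, "A", 2, 20)
print(r2, srv.executions)                  # 21 2
```

Despite three transmissions of the first call, the procedure ran once, which is the "exactly one execution on success" guarantee.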

Evaluation

QUESTION: what should be evaluated?

i. Complexity of using system

ii. Amount of code to solve a problem

iii. Fault tolerance

iv. Latency

v. Scalability / throughput / simultaneous clients

QUESTION: what is evaluated?

i. Performance of calls relative to procedure call and messaging latency

ii. What about compared to bare message passing?

Big Ideas

QUESTION: What were the key techniques?

i. Optimize for the common case

1. Avoid unnecessary complexity

2. Take advantage of communication pattern – e.g. synchronous request/reply

ii. Handle the uncommon case correctly & simply

iii. Re-use a known-good idea

iv. Layer of indirection

1. Stubs hide complexity, multiplexing

v. Scalability via stateless / reduced state

1. Server only holds one number per client

Commentary

RPC useful technique for loosely coupled distributed systems
Performance can be made quite high with optimized runtimes (see next week)
Failure semantics cause problems; callers often not prepared to deal well with failure

i. QUESTION: What should you do on failure? Retry? How many times? How long should you wait?

Makes it almost as easy to build a system of processes as one of a single process
Basis for distributed object systems like DCOM and RMI, and for XML-RPC
Problems

i. Procedure call level may be too low; message formats for internet protocols may encourage better separation between code and protocol

ii. Encourages synchronous round trips; hard to batch requests that can be overlapped

iii. Difficult to revise interfaces; versioning is handled, but leads to ugly code on the server

iv. Generally language specific