U-Net
Comments:
- fairness not considered: BUT COULD IT BE HANDLED?
- Hard to enforce other properties – e.g. TCP friendliness
Problem: small messages see high latency and low bandwidth even on high-speed networks, and protocols fixed in the kernel are inflexible
- on a low-latency network, per-packet processing at the endpoints is a significant fraction of the total communication cost
- Low latency is important for small messages:
o Distributed objects (CORBA, DCOM)
o Distributed services: e.g. web page pulling data from multiple sources
o Cache consistency messages
o Reliability protocols (e.g. keep alives)
o RPC systems
o NFS – lots of small messages
- Flexibility important
o Can use application-level framing: send data in application-defined units, not packet-sized units. The receiver can then act on each packet independently, because each packet contains an application-interpretable piece of data; this minimizes the latency to process data, and packets can be processed out of order (see the sketch after this list)
o Can use integrated layer processing: integrate the processing of all layers, rather than separating them via queues or sequential calls
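A minimal sketch of application-level framing in C (the record format is hypothetical): each message is one self-describing application unit, so the receiver can act on any packet on its own.

    /* Application-level framing (hypothetical record format): each
     * message carries a self-describing header and exactly one
     * application unit, so the receiver can process any packet
     * independently, even if packets arrive out of order. */
    #include <stdint.h>

    struct alf_record {
        uint32_t object_id;  /* which application object this updates */
        uint32_t offset;     /* where the payload belongs in that object */
        uint32_t length;     /* payload bytes in this record */
        uint8_t  payload[];  /* one application unit, not a packet fragment */
    };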
Solution:
QUESTION: what is the high level solution?
ANSWER: change the interface
Look at a standard network stack: send(), recv(), select() (a code sketch follows the walkthrough below)
- App calls send()
- Kernel allocates kernel buffer
- kernel copies data to buffer
o QUESTION: why?
o ANSWER: the buffer is in VM; it may get paged out or deallocated, causing a page fault
o Multiplexes all senders into a queue of packets to send
- kernel calls the driver
- driver passes a pointer to the NIC
- NIC copies the data from memory (DMA)
- Device interrupts
- driver sends list of packets to kernel
o all packets on a single queue
- kernel demultiplexes onto different sockets
- kernel notifies appropriate application to run
- if the app is blocked, the scheduler context switches to it (at some point), OR
- the app polls the kernel to receive data
- Kernel copies data to client buffer
o QUESTION: why?
o ANSWER: kernel buffers data until client calls
o ANSWER: client buffers in VM, could be paged out or not page-aligned
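For contrast, here is the same path as code (standard BSD sockets; a minimal sketch): every operation below is a kernel call, and each direction implies a copy.

    #include <stddef.h>
    #include <sys/socket.h>

    /* One request/reply over the traditional interface. send() copies
     * the user buffer into a kernel buffer; recv() blocks until the
     * kernel has demultiplexed a packet onto this socket, then copies
     * it out again: two kernel transitions and two copies, before any
     * protocol work. Error handling elided. */
    void request_reply(int sock, const void *req, size_t req_len,
                       void *reply, size_t reply_cap)
    {
        (void)send(sock, req, req_len, 0);
        (void)recv(sock, reply, reply_cap, 0);
    }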
Problems:
- extra data copies
- demultiplexing happens late (not during interrupt)
- scheduling or kernel call required to receive a packet and send a packet
U-Net ideas:
- move (de)multiplexing out of kernel and into network interface
o Allows the NIC to give packets directly to processes and to get packets directly from them
- Question: what are the interfaces?
Set up an endpoint:
- call kernel to create endpoint
- Kernel allocates a communication segment (a memory region for packet data) and send/recv/free queues whose descriptors point into the segment
o All addresses are offsets into the segment: allows placement anywhere in VM and quick validation of addresses (a single range check)
- Kernel does any validation – e.g. is this endpoint allowed
- Kernel notifies the NIC of the endpoint and of a filter identifying the packets that come and go through it
- KEY POINT: packets are now in shared memory, not in a buffer
- KEY POINT: the interface is now a set of queues, not an API (see the sketch after this list)
- QUESTION: why important?
o ANSWER: no need to enter the kernel to communicate; the device can poll these locations
o ANSWER: easily allows multiple outstanding requests: no blocking calls
- BENEFITS:
o Application has direct feedback on network conditions
o Sending too fast: send queue full
o Receiving too slow: receive queue full, overflowing
- QUESTION: why 3 queues and not 2 ring buffers?
o ANSWER: better memory utilization, can share buffers between send and receive
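A sketch in C of what such an endpoint might look like (names, sizes, and field layout are hypothetical; the paper defines the structures abstractly):

    #include <stdint.h>

    #define SEG_SIZE (64 * 1024)  /* hypothetical segment size */
    #define QLEN 64               /* hypothetical queue length */
    #define BUF_SIZE 2048         /* hypothetical fixed buffer size */

    /* Descriptors carry offsets, not pointers: valid wherever the
     * segment sits in the process's address space, and checkable by
     * the NIC/kernel with a single range comparison. */
    struct unet_desc {
        uint32_t offset;   /* buffer position within the segment */
        uint32_t length;   /* bytes used (send/recv) or free (free queue) */
        uint32_t tag;      /* channel identifier for (de)multiplexing */
    };

    struct unet_endpoint {
        uint8_t  segment[SEG_SIZE];      /* packet data lives here */
        struct unet_desc send_q[QLEN];   /* app enqueues, NIC dequeues */
        struct unet_desc recv_q[QLEN];   /* NIC enqueues, app dequeues */
        struct unet_desc free_q[QLEN];   /* empty buffers, usable for send or receive */
    };

    /* Quick validation: is the descriptor entirely inside the segment? */
    static int desc_ok(const struct unet_desc *d)
    {
        return d->offset < SEG_SIZE && d->length <= SEG_SIZE - d->offset;
    }

Note that the separate free queue is what lets send and receive share buffers, per the answer above.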
ATM background:
- virtual circuits
o set up a circuit through network by programming forwarders
o Forwarders take a channel number as input and determine the output port and output channel number (sketched below)
- 53 byte cells w/ 5 byte header
o Can multiplex lots of low-bandwidth, low jitter channels, e.g. voice
- a compromise between the networking world and the phone world
- great for computer science – lots of things to be done, problems to be solved
- superseded by gigabit Ethernet
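A sketch of the per-forwarder step in C (hypothetical table layout; real ATM headers split the channel number into VPI/VCI fields): circuit setup programs one table entry per incoming channel, so forwarding is a single lookup plus a header rewrite.

    #include <stdint.h>

    struct vc_entry {
        uint16_t out_port;     /* link to forward the cell on */
        uint16_t out_channel;  /* channel number rewritten into the header */
    };

    static struct vc_entry vc_table[4096];  /* programmed at circuit setup,
                                               indexed by incoming channel */

    extern void transmit(uint16_t port, const uint8_t cell[53]);  /* hypothetical */

    /* Forward one 53-byte cell: look up the entry for the incoming
     * channel, rewrite the channel field in the 5-byte header, send. */
    void forward_cell(uint16_t in_channel, uint8_t cell[53])
    {
        struct vc_entry e = vc_table[in_channel & 4095];
        cell[0] = (uint8_t)(e.out_channel >> 8);
        cell[1] = (uint8_t)(e.out_channel & 0xff);
        transmit(e.out_port, cell);
    }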
Implementation on SBA-200
- segments pinned in physical memory – can receive right into them
- queues mapped into device memory – NIC can poll them without crossing bus
o QUESTION: is there a safety issue?
o ANSWER: Maybe, if multiple processes have a queue on a single page
- Packet send / receive done using Poll
o QUESTION: isn't polling bad?
o ANSWER: if a dedicated processor is doing nothing else (e.g. the NIC's onboard processor), polling has low overhead and low latency; interrupts require signaling and waking processes up
- Optimizations for single cell communication:
o Place the data directly in the queue descriptor rather than in a separate buffer
- Longer communication:
o fixed-size buffers are pulled off the free queue, filled, and appended to the receive queue (receive path sketched below)
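A sketch of the application-side receive path under polling, reusing the hypothetical unet_endpoint layout sketched earlier: the NIC pops a fixed-size buffer off the free queue, DMAs the packet into it, and posts a descriptor on the receive queue; the app spins on the queue instead of taking an interrupt.

    /* Assumes the NIC sets length != 0 when a descriptor becomes
     * valid. Real code would use volatile accesses / memory barriers
     * and separate head/tail indices per queue. */
    extern void handle_packet(uint8_t *data, uint32_t len);  /* hypothetical */

    void recv_loop(struct unet_endpoint *ep)
    {
        unsigned head = 0;
        for (;;) {
            struct unet_desc *d = &ep->recv_q[head % QLEN];
            if (d->length == 0)
                continue;                 /* nothing yet: keep spinning */
            handle_packet(&ep->segment[d->offset], d->length);
            /* hand the buffer back for reuse by either direction */
            ep->free_q[head % QLEN] = (struct unet_desc){ d->offset, BUF_SIZE, 0 };
            d->length = 0;                /* mark the slot consumed */
            head++;
        }
    }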
Uses:
- New programming models: Split-C and Active Messages
- Implement standard protocols:
o IP address / port mapped to a virtual path identifier / virtual circuit identifier on that path (see the sketch after this list)
- Tuned user-level implementations of UDP/TCP
o Can use a research version independently of the OS
o Lower latencies may make performance more consistent
o avoids copies that were previously necessary
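A sketch of that mapping for a user-level UDP/TCP (hypothetical table; flows are bound to ATM channels at connection setup): per-packet demultiplexing then reduces to the channel tag the NIC already checks.

    #include <stdint.h>

    struct flow {
        uint32_t ip;     /* remote IP address */
        uint16_t port;   /* remote port */
        uint8_t  vpi;    /* virtual path identifier */
        uint16_t vci;    /* virtual circuit identifier on that path */
    };

    static struct flow flows[256];   /* filled at connection setup */
    static int n_flows;

    /* Map (ip, port) to its ATM channel; -1 means set up a circuit first. */
    int lookup_channel(uint32_t ip, uint16_t port)
    {
        for (int i = 0; i < n_flows; i++)
            if (flows[i].ip == ip && flows[i].port == port)
                return ((int)flows[i].vpi << 16) | flows[i].vci;
        return -1;
    }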
High level point:
- Interfaces often determine the maximum achievable performance: e.g. if the interface requires kernel transitions and data copies, no implementation can avoid those costs
- Shared memory interfaces can avoid copies, substitute polling for interrupts
- Tricky part: getting the security guarantees of kernel calls while using shared memory
o Must make sure can only send packets with correct formatting, source information
o Only receive packets with correct source/destination information
o Solution: specify this during setup, then let the NIC enforce it and the kernel gets out of the way (sketched below).
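A sketch of that setup-time split (hypothetical kernel-side code): the kernel checks policy once when the endpoint is created and programs the NIC with a send tag and a receive filter; after that, enforcement is entirely on the NIC and the data path never enters the kernel.

    #include <stdint.h>

    extern int allowed(int pid, uint16_t channel);  /* hypothetical policy check */

    struct nic_binding {
        uint16_t send_tag;     /* NIC stamps this on every outgoing packet,
                                  so the app cannot forge source info */
        uint16_t recv_filter;  /* NIC delivers only packets matching this,
                                  so the app sees only its own traffic */
    };

    /* Runs once, in the kernel, at endpoint creation. */
    int kernel_create_endpoint(int pid, uint16_t channel, struct nic_binding *b)
    {
        if (!allowed(pid, channel))
            return -1;
        b->send_tag = channel;
        b->recv_filter = channel;
        return 0;
    }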