U-net

 

Comments:

-       fairness not considered: BUT COULD IT BE HANDLED?

-       Hard to enforce other properties – e.g. TCP friendliness

 

 

Problem: high latency and low bandwidth for small messages on high-speed networks; inflexible protocol processing locked inside the kernel

 

-       on a low-latency network, per-packet processing at the endpoints is a significant fraction of the communication cost

-       Low latency important for small messages

o      Distributed objects (CORBA, DCOM)

o      Distributed services: e.g. web page pulling data from multiple sources

o      Cache consistency messages

o      Reliability protocols (e.g. keep alives)

o      RPC systems

o      NFS – lots of small messages

-       Flexibility important

o      Can use application-level framing: send data in application units rather than packet-sized units, so the receiver can act on each packet independently – every packet contains an application-interpretable piece of data, which minimizes processing latency and allows out-of-order processing (see the sketch after this list)

o      Can use integrated layer processing: integrate the processing of all layers rather than separating them via queues or sequential calls.
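
A minimal sketch of application-level framing in C (names are hypothetical, not from the paper): each packet carries exactly one self-describing application unit, so the receiver can act on frames immediately and out of order.

```c
#include <stdint.h>

struct app_frame {
    uint32_t object_id;  /* which application object the unit belongs to */
    uint32_t offset;     /* position of this unit within that object */
    uint32_t length;     /* payload bytes in this frame */
    uint8_t  payload[];  /* exactly one application-interpretable unit */
};

/* Application-supplied handler; the order of calls does not matter. */
void apply_unit(uint32_t id, uint32_t off, const uint8_t *p, uint32_t n);

/* Called once per arriving packet, in whatever order packets arrive. */
void on_packet(const struct app_frame *f) {
    apply_unit(f->object_id, f->offset, f->payload, f->length);
}
```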

 

 

Solution:

QUESTION: what is the high level solution?

ANSWER: change the interface

 

Look at a standard network stack: send(), recv(), select() (sketched in code after this walkthrough)

 

-       App calls send()

-       Kernel allocates kernel buffer

-       kernel copies data to buffer

o      QUESTION: why?

¤       ANSWER: the application's buffer lives in virtual memory; it may be paged out or deallocated before the NIC reads it, causing a page fault

o      Multiplexes all senders into a queue of packets to send

-       kernel calls driver

-       driver passes a pointer to the NIC

-       NIC copies the data from memory

 

-       Device interrupts

-       driver sends list of packets to kernel

o      all packets on a single queue

-       kernel demultiplexes onto different sockets

-       kernel notifies appropriate application to run

-       scheduler context switches to app (at some point) if blocked OR

-       App polls kernel to receive data

-       Kernel copies data to client buffer

o      QUESTION: why?

o      ANSWER: kernel buffers data until client calls

o      ANSWER: client buffers in VM, could be paged out or not page-aligned
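
The conventional path described above, as a minimal UDP example in C (loopback address and port are arbitrary). Every step the walkthrough attributes to the kernel hides behind one of these three calls.

```c
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(9999) };
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    connect(s, (struct sockaddr *)&dst, sizeof dst);

    char msg[] = "ping";
    /* Kernel call #1: data is copied into a kernel buffer, queued,
     * then handed to the driver and DMAed by the NIC. */
    send(s, msg, sizeof msg, 0);

    /* Kernel call #2: block until the kernel has demultiplexed an
     * incoming packet onto this socket. */
    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(s, &rd);
    select(s + 1, &rd, NULL, NULL, NULL);

    /* Kernel call #3: the kernel copies its buffered packet into our
     * buffer. Three kernel transitions and two copies per exchange. */
    char buf[2048];
    recv(s, buf, sizeof buf, 0);
    close(s);
    return 0;
}
```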

 

Problems:

-       extra data copies

-       demultiplexing happens late (not during interrupt)

-       a kernel call or scheduling decision is required both to send and to receive a packet

 

U-Net ideas:

 

-       move (de)multiplexing out of kernel and into network interface

o      Allows the NIC to deliver packets directly to processes and to take packets directly from them

-       Question: what are the interfaces?

 

Setting up an endpoint:

 

-       call kernel to create endpoint

-       Kernel allocates a communication segment (a memory region for packets), send/recv/free queues, and descriptors (filters of packet contents) that point into the segment

o      All addresses are offsets: allows placement anywhere in VM, quick validation of addresses to see if in range

-       Kernel does any validation – e.g. is this endpoint allowed

-       Kernel notifies the NIC of the endpoint and of a filter for the packets that may come and go through the NIC

-       KEY POINT: packets now live in memory shared with the NIC, not in kernel buffers

-       KEY POINT: the interface is now a set of queues, not an API of kernel calls (see the sketch after this list)

-       QUESTION: why important?

o      ANSWER: no need to enter the kernel to communicate; the device can poll these locations

o      ANSWER: easily allows multiple outstanding requests: no blocking calls

-       BENEFITS:

o      Application has direct feedback on network conditions

o      Sending too fast: send queue full

o      Receiving too slow: receive queue full, overflowing

-       QUESTION: why 3 queues and not 2 ring buffers?

o      ANSWER: better memory utilization, can share buffers between send and receive
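
A rough sketch in C of what such an endpoint might look like (layout and names are illustrative, not the paper's actual structures; head/tail indices are omitted): a communication segment plus three queues whose descriptors hold offsets, never raw pointers.

```c
#include <stdint.h>
#include <stdbool.h>

#define SEG_SIZE (64 * 1024)
#define QLEN     64

struct desc {
    uint32_t off;   /* offset of the buffer within the segment */
    uint32_t len;   /* bytes used */
};

struct endpoint {
    uint8_t     seg[SEG_SIZE];   /* communication segment (pinned) */
    struct desc sendq[QLEN];     /* application fills, NIC drains */
    struct desc recvq[QLEN];     /* NIC fills, application drains */
    uint32_t    freeq[QLEN];     /* offsets of spare buffers, shared
                                    between sending and receiving */
};

/* Offsets make validation a single range check, no matter where the
 * segment sits in the application's virtual address space. */
static bool desc_valid(const struct desc *d) {
    return d->off < SEG_SIZE && d->len <= SEG_SIZE - d->off;
}
```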

 

ATM background:

-       virtual circuits

o      set up a circuit through network by programming forwarders

o      Forwarders take a channel number as input and determine the output port and output channel number (see the toy forwarder sketch after this list)

-       53 byte cells w/ 5 byte header

o      Can multiplex lots of low-bandwidth, low jitter channels, e.g. voice

-       a compromise between the networking world and the phone world

-       great for computer science – lots of things to be done, problems to be solved

-       superseded by gigabit Ethernet

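A toy model in C of the forwarder behavior described above (illustrative only, not real switch code): the incoming channel number indexes a table programmed at circuit-setup time, giving the output port and the channel number to rewrite into the 5-byte header before the 53-byte cell is emitted.

```c
#include <stdint.h>

struct vc_entry {
    uint16_t out_port;
    uint16_t out_vci;   /* channel number on the outgoing link */
};

#define MAX_VCI 4096
static struct vc_entry vc_table[MAX_VCI];  /* programmed at circuit setup */

void transmit(uint16_t port, const uint8_t cell[53]);  /* hardware stub */

void forward_cell(uint16_t in_vci, uint8_t cell[53]) {
    struct vc_entry e = vc_table[in_vci % MAX_VCI];
    /* simplified header rewrite: real ATM packs VPI/VCI across bytes 0-3 */
    cell[1] = (uint8_t)(e.out_vci >> 8);
    cell[2] = (uint8_t)(e.out_vci & 0xff);
    transmit(e.out_port, cell);
}
```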

Implementation on SBA-200

 

-       segments pinned in physical memory – can receive right into them

-       queues mapped into device memory – NIC can poll them without crossing bus

o      QUESTION: is there a safety issue?

o      ANSWER: Maybe, if multiple processes have a queue on a single page

-       Packet send / receive done using Poll

o      QUESTION: isn't polling bad?

o      ANSWER: if a dedicated processor has nothing else to do, polling has low overhead and low latency; interrupts require signaling and waking processes up (see the polling loop sketch after this list)

-       Optimizations for single cell communication:

o      Place the data directly in the queue entry rather than in a separate buffer

-       Longer communication:

o      fixed size buffers pulled off the free queue, appended to receive queue
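
A sketch of the kind of loop the NIC's dedicated processor might run (all names are illustrative, not the SBA-200 firmware's): it polls the device-mapped send queues instead of waiting for interrupts.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

struct desc { uint32_t off, len; };

struct desc *next_send_desc(int ep);          /* queue lives in NIC memory */
bool desc_valid(int ep, const struct desc *d);/* offset/length range check */
void dma_from_segment(int ep, uint32_t off, uint32_t len);
void retire_send_desc(int ep, struct desc *d);

void nic_main_loop(int n_endpoints) {
    for (;;) {  /* poll forever: this processor has nothing else to do */
        for (int ep = 0; ep < n_endpoints; ep++) {
            struct desc *d = next_send_desc(ep);
            if (d == NULL)
                continue;                     /* nothing queued here */
            if (desc_valid(ep, d))            /* segment is pinned, so DMA is safe */
                dma_from_segment(ep, d->off, d->len);
            retire_send_desc(ep, d);
        }
    }
}
```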

 

Uses:

 

-       New programming models: Split-C and Active Messages

-       Implement standard protocols:

o      IP address/port pairs mapped to a virtual path identifier / virtual circuit identifier on that path (see the lookup sketch after this list)

-       Tuned user-level implementations of UDP/TCP

o      Can use research version independent of OS

o      Lower latencies may make performance more consistent

o      avoid the copies that were previously necessary
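
One plausible shape (not necessarily the paper's) for the UDP-over-U-Net mapping mentioned above: a user-level table from (IP, port) to the VPI/VCI of the circuit reaching that host, consulted without entering the kernel.

```c
#include <stdint.h>

struct route {
    uint32_t ip;     /* destination IPv4 address */
    uint16_t port;   /* destination UDP port */
    uint8_t  vpi;    /* virtual path identifier */
    uint16_t vci;    /* virtual circuit identifier on that path */
};

static struct route routes[256];  /* filled in as circuits are set up */
static int n_routes;

const struct route *lookup(uint32_t ip, uint16_t port) {
    for (int i = 0; i < n_routes; i++)
        if (routes[i].ip == ip && routes[i].port == port)
            return &routes[i];
    return 0;  /* no circuit yet: fall back to setup via the kernel */
}
```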

 

 

High level point:

-       Interfaces often determine the maximum achievable performance: e.g. if the interface requires kernel transitions or data copies, no implementation can avoid those costs

-       Shared memory interfaces can avoid copies, substitute polling for interrupts

-       Tricky part: achieving the security guarantees of kernel calls while communicating through shared memory

o      Must make sure a process can only send packets with correct formatting and source information

o      Only receive packets with correct source/destination information

o      Solution: specify this during setup, then let the NIC enforce it while the kernel gets out of the way (sketched below).
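
A sketch of that setup-time split, with a hypothetical kernel API (none of these names come from the paper): validate once, program the NIC, then stay out of the data path.

```c
#include <stdint.h>
#include <stdbool.h>

struct channel_filter {
    uint32_t src_allowed;   /* source address this endpoint may claim */
    uint32_t dst_allowed;   /* destination this endpoint may reach */
};

bool process_owns_address(uint32_t src);                  /* kernel policy check */
void nic_install_filter(int ep, struct channel_filter f); /* program the NIC */

/* Kernel call, once per endpoint. */
int kernel_register_endpoint(int ep, struct channel_filter f) {
    if (!process_owns_address(f.src_allowed))
        return -1;              /* refuse mis-attributed traffic at setup */
    nic_install_filter(ep, f);  /* NIC now stamps/checks every packet */
    return 0;                   /* kernel is out of the fast path */
}
```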