2. mOS Socket Abstraction
The most important abstraction in mOS is the socket interface. We
reuse the mTCP socket (a socket with BSD-like semantics) for this purpose.
Although functionally similar, the internal implementation of an mTCP socket
is markedly different from BSD’s. For this reason, we prefix all socket
function names with mtcp_ (e.g. mtcp_socket()). Our mOS socket library
provides all the important routines that you would typically need when
dealing with BSD sockets (e.g. mtcp_getsockopt() and mtcp_setsockopt()).
The socket descriptor space is local to each mOS thread: each registered
mTCP socket is associated with a single thread context. This allows
multiple threads to create sockets in parallel, because they no longer
contend for a lock on a shared descriptor space. We also relax the
semantics of socket() so that it returns any available socket descriptor
instead of the minimum available one. This avoids the overhead of
searching for the lowest free descriptor.
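The following is a minimal sketch of a per-thread descriptor table that makes the allocation policy concrete. It hands out whichever descriptor sits on top of a LIFO free list. The structure and the function names (table_init(), thread_table(), alloc_any(), free_fd()) are hypothetical and are not taken from the mOS source. The sketch only assumes a fixed-size table and C11 _Thread_local storage.

```c
#include <assert.h>

/* Hypothetical per-thread descriptor table; MAX_SOCKETS and all names
 * here are illustrative, not taken from the mOS source. */
#define MAX_SOCKETS 8

struct sock_table {
    int in_use[MAX_SOCKETS];     /* 1 if the descriptor is allocated */
    int free_stack[MAX_SOCKETS]; /* LIFO list of free descriptors */
    int free_top;                /* number of entries on free_stack */
};

void table_init(struct sock_table *t)
{
    t->free_top = 0;
    /* Push in reverse so that the first allocations come out as 0, 1, 2, ... */
    for (int fd = MAX_SOCKETS - 1; fd >= 0; fd--) {
        t->in_use[fd] = 0;
        t->free_stack[t->free_top++] = fd;
    }
}

/* One table per thread: no other thread touches it, so no lock is needed. */
static _Thread_local struct sock_table tls_table;
static _Thread_local int tls_ready;

struct sock_table *thread_table(void)
{
    if (!tls_ready) {
        table_init(&tls_table);
        tls_ready = 1;
    }
    return &tls_table;
}

/* Relaxed allocation: pop any free descriptor in O(1), not the minimum. */
int alloc_any(struct sock_table *t)
{
    if (t->free_top == 0)
        return -1; /* table is full */
    int fd = t->free_stack[--t->free_top];
    t->in_use[fd] = 1;
    return fd;
}

/* Release a descriptor; the most recently freed one is reused first. */
void free_fd(struct sock_table *t, int fd)
{
    assert(fd >= 0 && fd < MAX_SOCKETS && t->in_use[fd]);
    t->in_use[fd] = 0;
    t->free_stack[t->free_top++] = fd;
}
```

Suppose a thread allocates 0, 1 and 2 and then frees 0 and 2. The next alloc_any() returns 2 rather than 0. A minimum-descriptor policy would instead have to scan in_use from the bottom on every call, which costs O(MAX_SOCKETS).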
mOS provides two types of sockets: (i) mTCP sockets (for endpoints), and (ii) monitoring sockets (for middleboxes). We briefly explain both types below.
2.1. mTCP sockets
mTCP sockets (socket type: MOS_SOCK_STREAM) in mOS provide
reliable end-to-end communication between client and server
nodes. Users can build mOS server applications by calling a
sequence of BSD-like socket functions (mtcp_socket(), mtcp_bind(),
mtcp_listen(), mtcp_accept()). We also provide a similar set of
functions for building client applications (mtcp_socket(),
mtcp_connect()).
As with Berkeley sockets, connections built on mOS sockets implement flow control and congestion control internally.
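The mTCP call sequence mirrors BSD sockets. The sketch below therefore walks through the same server steps (socket, bind, listen, accept) and client steps (socket, connect) using plain POSIX sockets over loopback, so that it compiles and runs without mOS. In mOS, each call would carry the mtcp_ prefix, the stream type would be MOS_SOCK_STREAM, and mTCP calls also take a per-thread context argument. Treat the exact mOS signatures as something to confirm in the mOS API reference rather than in this sketch. The helpers make_listener() and connect_to() are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket() -> bind() -> listen(). Returns the listening fd and
 * stores the kernel-chosen port in *port, or returns -1 on error. */
int make_listener(unsigned short *port)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    socklen_t len = sizeof addr;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return lfd;
}

/* Client side: socket() -> connect(). Returns the connected fd or -1. */
int connect_to(unsigned short port)
{
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(cfd);
        return -1;
    }
    return cfd;
}
```

The server completes the sequence by calling accept() on the listening descriptor. On loopback, a blocking connect() returns as soon as the kernel has queued the connection, so both sides can run in a single thread for a quick test.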
2.2. Monitoring sockets
For monitoring, we extend our networking API with a new socket
type called MOS_SOCK_MONITOR_STREAM. We refer to MOS_SOCK_MONITOR_STREAM
sockets simply as stream monitoring sockets. Conceptually, a stream monitoring
socket abstracts a middlebox’s tap point on passing flows or packets.
A monitoring socket is similar to a Berkeley socket, but it differs in
its operating semantics. First, a stream monitoring socket represents
a non-terminating midpoint of an ongoing TCP connection. With a stream
monitoring socket, one can closely follow the TCP state changes of both
the client and the server without explicitly terminating the TCP connection.
Second, a monitoring socket can monitor fine-grained TCP-layer operations,
whereas a stream Berkeley socket carries out coarse-grained, application-layer
operations. For example, a monitoring socket can detect TCP- or packet-level
events such as abnormal packet retransmission, packet arrival order, abrupt
connection termination, and the use of unusual TCP/IP options. At the same
time, it supports reading reassembled flow data from the server or the client.
Using the monitoring socket and its API functions, one can write a complex monitoring middlebox in a modular manner. First, a developer creates a ‘passive’ monitoring socket (similar to a listening socket) and binds it to a traffic scope, specified as a Berkeley Packet Filter (BPF) expression. Only flows and packets that fall within the scope are monitored.
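Conceptually, a traffic scope is a predicate over packet headers. The sketch below hand-codes the predicate expressed by the BPF filter tcp port 80, purely to show which traffic a monitoring socket bound to that scope would see. In mOS, the scope is written as a BPF expression, not as C code. struct flow_key, in_scope() and count_monitored() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_TCP 6
#define PROTO_UDP 17

/* Hypothetical flow key holding only the fields this filter needs. */
struct flow_key {
    uint32_t src_ip, dst_ip;     /* IPv4 addresses, host byte order */
    uint16_t src_port, dst_port; /* host byte order */
    uint8_t  proto;              /* IP protocol number */
};

/* Hand-written equivalent of the BPF expression "tcp port 80":
 * TCP traffic whose source or destination port is 80. */
bool in_scope(const struct flow_key *k)
{
    return k->proto == PROTO_TCP && (k->src_port == 80 || k->dst_port == 80);
}

/* Count how many packets in a batch the monitoring socket would see. */
int count_monitored(const struct flow_key *pkts, int n)
{
    int seen = 0;
    for (int i = 0; i < n; i++)
        if (in_scope(&pkts[i]))
            seen++;
    return seen;
}
```

Because "port 80" matches either direction, both the client's request and the server's reply fall inside the scope. UDP traffic on port 80 does not.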
Note that there is no notion of “accepting” a connection, since a middlebox does not take part in a connection as an explicit endpoint. Instead, one specifies when a custom operation should be triggered by registering for flow events, as described in the mOS Event System section. All one needs to provide are the event handlers that implement the custom middlebox logic. The underlying mOS networking stack (or mOS stack) manages the flow contexts and automatically detects and raises the events. Each event handler is passed an ‘active’ monitoring socket that represents the flow that triggered the event. Through this socket, one can probe the flow state further, or retrieve and even modify the last packet that raised the event.
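The handler model can be pictured as a table of callbacks. The application fills the table, and the stack calls the matching entry, passing the flow's active socket, whenever it detects an event. This is a minimal sketch in which every name (enum mb_event, register_handler(), raise_event(), on_retransmit()) is hypothetical, and only one handler per event is allowed for simplicity. It is not the mOS event API, which is described in the mOS Event System section.

```c
#include <stddef.h>

/* Hypothetical event identifiers; the real mOS events are listed in the
 * mOS Event System section. */
enum mb_event { MB_EV_CONN_START, MB_EV_RETRANSMIT, MB_EV_CONN_ABORT, MB_EV_COUNT };

/* Hypothetical handler type: receives the 'active' monitoring socket
 * (an integer id here) of the flow that raised the event. */
typedef void (*mb_handler)(int active_sock, enum mb_event ev);

static mb_handler handlers[MB_EV_COUNT]; /* one handler per event */

/* Application side: register custom logic for one event. */
void register_handler(enum mb_event ev, mb_handler h)
{
    handlers[ev] = h;
}

/* Stack side: called when an event is detected on a flow. Returns 1 if a
 * handler ran, 0 if no handler is registered for that event. */
int raise_event(int active_sock, enum mb_event ev)
{
    if (handlers[ev] == NULL)
        return 0;
    handlers[ev](active_sock, ev);
    return 1;
}

/* Example middlebox logic: count retransmissions per flow (ids below 16). */
#define MAX_FLOWS 16
static int retrans_count[MAX_FLOWS];

void on_retransmit(int active_sock, enum mb_event ev)
{
    (void)ev; /* unused: this handler is only registered for retransmissions */
    if (active_sock >= 0 && active_sock < MAX_FLOWS)
        retrans_count[active_sock]++;
}
```

The application never polls for retransmissions. It only registers on_retransmit(), and the handler learns which flow is involved solely from the socket it is passed.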
Part of this text first appeared in a technical report.