mOS Programming API ==================== We provide man pages that describe mOS API. You can access those pages at: https://mos.kaist.edu/index_man.html. mOS socket function prototypes are split into two files: `mos_api.h`_ and `mtcp_api.h`_. The reader is also suggested to go through the `Sample Applications User Guide`_ to see how the library can be used. We discuss a few important mOS API functions in detail. Initialization Routines ----------------------- .. code-block:: c /** * Initializes mOS base with parameters mentioned in config_file * * @param [in] config_file: path to the startup configuration file * @return 0 on success, -1 on error */ int mtcp_init(const char *config_file); .. note:: Make sure that you call ``mtcp_init()`` before launching the mOS context (``mtcp_create_context()``). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Loads current mOS configuration in conf structure * * @param [in] conf * @return 0 on success, -1 on error */ int mtcp_getconf(struct mtcp_conf *conf); `struct mtcp_conf`_ holds configuration parameters of mOS. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Updates mOS base with parameters mentioned in conf structure * * @param [in] conf * @return 0 on success, -1 on error */ int mtcp_setconf(struct mtcp_conf *conf); .. note:: Make sure that you call ``mtcp_setconf()`` before launching the mOS context (``mtcp_create_context()``). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /* * Creates the mOS/mTCP thread based on the parameters passed by * mtcp_init() & mtcp_setconf() functions * * @param [in] cpu: The CPU core ID where one wants to run the mOS thread. * @return mctx_t on success, NULL on error. */ mctx_t mtcp_create_context(int cpu); .. important:: Don't call mtcp_create_context() more than once for the same CPU id. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Destroy the context that was created by mOS/mTCP thread * * @param [in] mctx_t: mtcp context * @return 0 on success, -1 on error. */ int mtcp_destroy_context(mctx_t mctx); .. note:: This is usually called once the mOS thread is exiting. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Destroy the global mOS context * * @return 0 on success, -1 on error */ int mtcp_destroy(); .. note:: This is usually called once as the ``main()`` function exits. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mOS End TCP Socket API ----------------------- We focus more on the monitoring API and recommend the reader to refer to our `mTCP paper`_ for details on how to use the mTCP API. Please see the following links to view commentary for each function: * `mtcp_socket()`_ * `mtcp_bind()`_ * `mtcp_listen()`_ * `mtcp_accept()`_ * `mtcp_connect()`_ * `mtcp_read()`_ * `mtcp_write()`_ * `mtcp_close()`_ mOS Monitoring Socket API ------------------------- We next describe the operations and APIs with mOS monitoring socket. mOS callback APIs are used for registering/unregistering callbacks to any built-in/user-defined events. ``mtcp_register_callback()`` allows the users to register a callback function to all the accepted flows from a listening socket, or to register/unregister a callback function to/from a certain flow. The reader is recommended to go through this `code snippet`_ to see how a mOS application can be initialized. Please see the following links to view commentary for each function that is normally used in initializing the monitoring sockets: * `mtcp_socket()`_ * `mtcp_bind_monitor_filter()`_ * `mtcp_register_callback()`_ * `mtcp_define_event()`_ We provide two types of monitoring sockets: 1. Stream monitoring sockets (``MOS_SOCK_MONITOR_STREAM``) monitor traffic with complete flow semantics. They are capable of performing bytestream analysis and provide payload reassembly support. We use stream monitoring sockets in this `example code`_. .. figure:: images/sock_monitor_stream.* :scale: 50 % :align: center Figure 5.2 Functional diagram of a stream monitoring socket. 2. Raw monitoring sockets (``MOS_SOCK_MONITOR_RAW``) monitor traffic without flow semantics. All packets (irrespective of L3/L4 protocol ID) can be captured using this socket. .. figure:: images/sock_monitor_raw.* :scale: 50 % :align: center Figure 5.3 Functional diagram of a raw monitoring socket. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We provide a wide range of functions that a mOS developer can use to monitor packet context, maneuver the corresponding flow attributes, and even access the flow's bytestream. Packet Information Monitoring API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once inside the callback handler, we can use ``mtcp_getlastpkt()`` to retrieve packet information corresponding to the flow in context. .. code-block:: c /* * Fetch the most recent packet of the specified side for a given flow. * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring socket ID (represents flow in case of stream monitoring sockets) * @param [in] side: One of MOS_SIDE_CLI or MOS_SIDE_SVR (MOS_NULL for MOS_SOCK_MONITOR_RAW socket) * @param [in] p: point to pkt_info struct (only L2-L3 information is available for MOS_SOCK_MONITOR_RAW socket) * @return 0 on success, -1 on failure */ int mtcp_getlastpkt(mctx_t mctx, int sock, int side, struct pkt_info *p); The retrieved packet information is returned as ``struct pkt_info``, and is read-only by default. The followings are the member variables for the packet information structure with their description: .. code-block:: c struct pkt_info { uint32_t cur_ts; /*< time stamp when the packet is received */ int eth_len; /*< length of the ethernet header */ int ip_len; /*< length of the IP header */ uint64_t offset; /*< TCP ring buffer offset */ uint16_t payloadlen; /*< length of the TCP payload */ uint32_t seq; /*< host-order sequence number */ uint32_t ack_seq; /*< host-order acknowledge number */ uint16_t window; /*< TCP window size */ struct ethhdr *ethh; /*< ethernet header */ struct iphdr *iph; /*< IP header */ struct tcphdr *tcph; /*< TCP header */ uint8_t *payload; /*< TCP payload */ } .. note:: When you retrieve the packet information using mtcp_getlastpkt() via ``MOS_SOCK_MONITOR_RAW`` socket, you can only use up to L3 information. (``cur_ts``, ``eth_len``, ``ip_len``, ``ethh``, ``iph``) .. tip:: Packet information might be impossible to retrieve from some callback functions, because some built-in events are independent from packet reception. For example, ``MOS_ON_CONN_END`` event is triggered when a TCP flow at TIME_WAIT state destroys on timeout, and this event is independent from any packet. For those cases, since there is no packet triggered that event, mtcp_getlastpkt() call would return -1 without any packet information. .. attention:: A packet arriving in a mOS monitor gets stored as the flow's last packet on the **sender side**. This means that if an event is registered on ``MOS_HK_RCV`` hook, then the user must access *server* flow's last packet if the **client-side** triggers the callback. .. note:: For monitoring stream sockets, each connection stores a copy of the most recently observed packet (one from the client side, the other from the server). For raw sockets, only one copy of the packet is stored. Packet Payload Modification API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If modification on a received packet before forwarding is required, it can be done with ``mtcp_setlastpkt()``. Using ``mtcp_setlastpkt()`` call inside a callback function, you can modify the packet triggered by the callback event. The modifiable fields include Ethernet header, IP/TCP header, IP/TCP checksum, and TCP payload. It is also possible to drop the packet to avoid forwarding it (in inline configuration case) by putting ``MOS_DROP`` to the option field. .. Please note that packet modifications will work fine as long as the overall TCP segment length remains the same. .. note:: In inline configuration, the default policy for forwarding the packet is determined by the forward parameter in the ``mtcp_setconf()`` function. An ``mtcp_setlastpkt()`` call with MOS_DROP would override this policy. .. caution:: You can only modify packet contents at ``MOS_HK_SND`` hook (*i.e.* modification on ``MOS_HK_RCV`` is disallowed). This ensures that the simulated states of the sender and receiver in a ongoing connection are consistent with the TCP states of the actual endpoints. .. code-block:: c /** * Updates the Ethernet frame at a given offset across * datalen bytes. * * @param [in] mctx: mtcp context * @param [in] sock: monitoring socket * @param [in] side: monitoring side * (MOS_NULL for MOS_SOCK_MONITOR_RAW socket) * @param [in] offset: the offset from where the data needs to be written * @param [in] data: the data buffer that needs to be written * @param [in] datalen: the length of data that needs to be written * @param [in] option: disjunction of MOS_ETH_HDR, MOS_IP_HDR, MOS_TCP_HDR, * MOS_TCP_PAYLOAD, MOS_DROP_PKT, MOS_UPDATE_TCP_CHKSUM, * MOS_UPDATE_IP_CHKSUM * @return Returns 0 on success, -1 on failure * */ int mtcp_setlastpkt(mctx_t mctx, int sock, int side, off_t offset, byte *data, uint16_t datalen, int option); TCP Flow Information Monitoring API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Get flow information * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring socket id * @param [in] level: SOL_MONSOCKET (for monitoring purposes) * @param [in] optname: variable * @param [in] optval: value of retrieved optname * @param [in] optlen: size of optval * @return 0 on success, -1 on error */ int mtcp_getsockopt(mctx_t mctx, int sock, int level, int optname, void *optval, socklen_t *optlen); ``mtcp_getsockopt()`` helps retrieve flow information of a TCP flow. We can use the following options for ``optname``: * ``MOS_TCP_STATE_CLI`` or ``MOS_TCP_STATE_SVR`` returns the current emulated state of the client or server. The optval argument is a pointer to an int whereas the optlen argument contains the sizeof(int). The optval returns a value of type enum tcpstate which can carry any one of the following states. .. code-block:: c enum tcpstate { TCP_CLOSED = 0, TCP_LISTEN, TCP_SYN_SENT, TCP_SYN_RCVD, TCP_ESTABLISHED, TCP_FIN_WAIT_1, TCP_FINE_WAIT_2, TCP_CLOSE_WAIT, TCP_CLOSING, TCP_LAST_ACK, TCP_TIME_WAIT } * ``MOS_INFO_CLIBUF`` and ``MOS_INFO_SVRBUF`` returns meta-data regarding the client's or server's TCP ring buffer. This information is returned in the form of optval which is passed as struct tcp_buf_info. The optlen value contains the sizeof(struct tcp_buf_info). .. code-block:: c struct tcp_buf_info { /* The initial TCP sequence number of TCP ring buffer. */ uint32_t tcpbi_init_seq; /* TCP sequence number of the ’last byte of payload that has already been read by the end application’ (applies in the case of embedded monitor setup) */ uint32_t tcpbi_last_byte_read; /* TCP sequence number of the ’last byte of the payload that is currently buffered and needs to be read by the end application’ (applies in the case of embedded monitor setup). In case of standalone monitors, tcpbi_last_byte_read = tcpbi_next_byte_expected */ uint32_t tcpbi_next_byte_expected; /* TCP sequence number of the ’last byte of the payload that is currently stored’ in the TCP ring buffer. This value may be greater than tcpbi_next_byte_expected if packets arrive out of order. */ uint32_t tcpbi_last_byte_received; } * ``MOS_FRAGINFO_CLIBUF`` and ``MOS_FRAGINFO_SVRBUF`` gives back offsets to fragments (non-contiguous data segments) currently stored in client's or server's TCP ring buffer. The optval is an array of ``struct tcp_ring_fragment``. .. code-block:: c struct tcp_ring_fragment { uint64_t offset; uint32_t len; } TCP Flow Manipulation API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In addition to packet modification API, mOS provides TCP flow manipulation APIs for TCP monitoring sockets. .. code-block:: c /** * Set flow information * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring socket id * @param [in] level: SOL_MONSOCKET (for monitoring purposes) * @param [in] optname: variable * @param [in] optval: value of retrieved optname * @param [in] optlen: size of optval * @return 0 on success, -1 on error */ int mtcp_setsockopt(mctx_t mctx, int sock, int level, int optname, const void *optval, socklen_t optlen); * mtcp_setsockopt() can be used to manipulate TCP flows. We can configure flows using the following options for ``optname``: * ``MOS_CLIBUF`` or ``MOS_SVRBUF`` dynamically adjust the size of the TCP receive ring buffer of the emulated client or server stack. The optval contains the size of the buffer that needs to be set as int, while optlen is equal to sizeof(int). * ``MOS_CLIOVERLAP`` or ``MOS_CLIOVERLAP`` dynamically determine the policy on content overlap (e.g., overwriting with the retransmitted payload or not) for the client-side buffer. The optval can be either MOS_OVERLAP_POLICY_FIRST (to take the first data and never overwrite the buffer) or MOS_OVERLAP_POLICY_LAST (to always update the buffer with the last data), and optlen is equal to sizeof(int). * ``MOS_STOP_MON`` dynamically stop monitoring a flow for the specific side. This option can be used only with a MOS_SOCK_MONITOR_ACTIVE socket, which is given as a parameter in callback functions for every flow. The optval contains a side variable (``MOS_SIDE_CLI``, ``MOS_SIDE_SVR``, or ``MOS_SIDE_BOTH``), while optlen is equal to sizeof(int). .. note:: If your application does not require payload inspection, you can turn off the TCP ring buffer using ``MOS_CLIBUF`` and ``MOS_SVRBUF``. If there is any flow that you are not interested in any more, you can turn monitoring off for that flow using ``MOS_STOP_MON``. .. important:: In case you have ``MOS_ON_PKT_IN`` registered on ``MOS_HK_SND`` you will not be able to call ``mtcp_setsockopt()`` or ``mtcp_getsockopt()`` functions on the server (``MOS_SIDE_SVR``) side for the TCP SYN packet. Only the client TCP context has been created at that point in time. Bytestream Payload Monitoring API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Peek bytestream from the flow * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring stream socket * @param [in] side: monitoring side * @param [in] buf: user allocated character array * @param [in] len: length to be copied * @return # of bytes actually read on success, -1 on failure */ int mtcp_peek(mctx_t mctx, int sock, int side, char *buf, size_t len); It is possible to monitor bytestream through ``mtcp_peek()`` function call on a monitoring socket. ``mtcp_peek()`` copies the data from the buffer, and delivers it to the application. If an application registers a callback on ``MOS_ON_CONN_NEW_DATA`` event and calls ``mtcp_peek()`` inside the callback, it can read the available contiguous bytestream. .. code-block:: c /** * Peek with an offset. * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring stream socket * @param [in] side: monitoring side * @param [in] buf: user allocated character array * @param [in] count: length to be copied * @param [in] seq_num: byte offset of the TCP bytestream * @return # of bytes actually read on success, -1 on failure */ int mtcp_ppeek(mctx_t mctx, int sock, int side, char *buf, size_t count, off_t seq_num); In case the user requires payload data retrieved from a specific byte offset, ``mtcp_ppeek()`` can be used. Unlike ``mtcp_peek()``, users are expected to specify a stream offset (``seq_num``) which always starts from 0 (0 internally gets translated to initial sequence number). In other words, ``seq_num`` can be considered as the absolute bytestream offset of the flow. If the requested offset is out of range of the receive buffer it returns an error. .. note:: The bytestream offset starts from zero (0) for every connection (*i.e.* the initial sequence number gets mapped to offset 0). The byte stream offset is 64-byte integer value, and we assume that it will never be wrapped around. .. important:: While we maintain both sides of the stack, we only provide receiver bytestream management (*i.e.* the send buffer bytestream management is not implemented). A developer interested in retrieving the bytestream of the sender can access the *receive* buffer of the peer flow instead (*e.g.* the send buffer of a client is functionally equivalent to the receive buffer of the server). User-level Flow Context API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: c /** * Set per-flow user context * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring stream socket * @param [in] uctx: per-flow user context */ void mtcp_set_uctx(mctx_t mctx, int sock, void *uctx); /** * Get per-flow user context * * @param [in] mctx: mTCP/mOS context * @param [in] sock: monitoring stream socket * @return per-flow user context */ void * mtcp_get_uctx(mctx_t mctx, int sock); mOS allows user to store and retrieve per-flow context. By using this API, user applications can store per-flow metadata and use it any time during the entirety of the flow's lifetime in mOS. .. note:: It is the user's responsibility to manage memory that is allocated for the per-flow user context. Please remember to free up any memory that is allocated for the context when the flow is about to be destroyed. It is useful to always register for ``MOS_ON_CONN_END`` callback where memory can be freed. .. _`mos_api.h`: http://www.ndsl.kaist.edu/mos_api/mos__api_8h.html .. _`mtcp_api.h`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html .. _`Sample Applications User Guide`: ../samples/index.html .. _`mTCP paper`: http://www.ndsl.kaist.edu/~kyoungsoo/papers/mtcp.pdf .. _`struct mtcp_conf`: http://www.ndsl.kaist.edu/mos_api/structmtcp__conf.html .. _`mctx_t`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a2787890e8d715765a686f2ff4db1874a .. _`mtcp_socket()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a961823fabc4872d161b6cbcd7b653736 .. _`mtcp_bind()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a6c2f92a99cfbf22da4f99c6541fd07e4 .. _`mtcp_listen()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a4b8cede4a8252b607966bc9b0896d0fe .. _`mtcp_accept()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a13b8338c7c70d45a6e456accc65cd31c .. _`mtcp_connect()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a48a42efb04b7065fdad45e223e7a9cd6 .. _`mtcp_read()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a4d47672dee23bda2df19b59d56ce40b3 .. _`mtcp_write()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a599fa15feb8e8681e22a3dd7dac720ee .. _`mtcp_close()`: http://www.ndsl.kaist.edu/mos_api/mtcp__api_8h.html#a73a9775eac036cebadebf04cd606646c .. _`code snippet`: http://www.ndsl.kaist.edu/mos_guide/programmer/03_event_system.html#example-initialization-sequence .. _`example code`: http://www.ndsl.kaist.edu/mos_guide/programmer/03_event_system.html#example-initialization-sequence .. _`mtcp_bind_monitor_filter()`: http://www.ndsl.kaist.edu/mos_api/mos__api_8h.html#ae543660aa2d403098a859109e76e4d77 .. _`mtcp_register_callback()`: http://www.ndsl.kaist.edu/mos_api/mos__api_8h.html#a5945e261099bac0039d6e49ad652f178 .. _`mtcp_define_event()`: http://www.ndsl.kaist.edu/mos_api/mos__api_8h.html#a76eea0045ac9a91edaa523e95839bf6f .. _`enum tcpstate`: http://www.ndsl.kaist.edu/mos_api/mos__api_8h.html#a85786300e410493fd5539aa86ed2e105 .. _`struct tcp_buf_info`: http://www.ndsl.kaist.edu/mos_api/structtcp__buf__info.html .. _`struct tcp_ring_fragment`: http://www.ndsl.kaist.edu/mos_api/structtcp__ring__fragment.html