JDK-8203314

Adding support for rsockets

    • Type: JEP
    • Resolution: Withdrawn
    • Priority: P3
    • None
    • core-libs
    • None
    • Yingqi Lu
    • Feature
    • Open
    • JDK
    • M
    • M

      Summary

      Add rsocket support to the JDK to improve the throughput and latency of socket-based network communication.

      Motivation

      For HPC and cloud applications, fully utilizing networking hardware to reach maximum bandwidth at low latency is challenging. The networking libraries in the JDK are currently based on OS kernel sockets. Data transfers involve multiple memory copies between user and kernel space, which consume extra memory bandwidth and CPU cycles. To improve this, we propose adding support for rsocket, a protocol over Remote Direct Memory Access (RDMA).

      Description

      In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters. RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. When an application performs an RDMA Read or Write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer. – Wikipedia [1]

      rsocket is a protocol over RDMA that provides a socket-level API for applications. It is intended to match the behavior of the corresponding socket calls: rsocket functions have the same names and function signatures as the socket calls, except that each name is prefixed with an 'r' [2]. For example, to create a socket and return a file descriptor:

      default socket call: int socket(int domain, int type, int protocol);
      rsocket function: int rsocket(int domain, int type, int protocol);

      Currently, the following rsocket functions are supported: rsocket, rbind, rlisten, raccept, rconnect, rshutdown, rclose, rrecv, rrecvfrom, rrecvmsg, rread, rreadv, rsend, rsendto, rsendmsg, rwrite, rwritev, rpoll, rselect, rgetpeername, rgetsockname, rsetsockopt, rgetsockopt and rfcntl [2].
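
The naming rule above is mechanical, so it can be sketched in a few lines. The following Java snippet is only an illustration: the class RsocketNames and the method rsocketNameFor are hypothetical names invented here, and the set of calls is copied from the list above rather than queried from librdmacm.

```java
import java.util.Optional;
import java.util.Set;

final class RsocketNames {
    // Socket calls whose rsocket counterparts are listed as supported in the text.
    private static final Set<String> SUPPORTED = Set.of(
        "socket", "bind", "listen", "accept", "connect", "shutdown", "close",
        "recv", "recvfrom", "recvmsg", "read", "readv",
        "send", "sendto", "sendmsg", "write", "writev",
        "poll", "select", "getpeername", "getsockname",
        "setsockopt", "getsockopt", "fcntl");

    /** Returns the rsocket name for a socket call, or empty if none is listed. */
    static Optional<String> rsocketNameFor(String socketCall) {
        return SUPPORTED.contains(socketCall)
            ? Optional.of("r" + socketCall)   // same name, prefixed with 'r'
            : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(rsocketNameFor("connect"));    // Optional[rconnect]
        System.out.println(rsocketNameFor("epoll_wait")); // Optional.empty
    }
}
```

The empty result for epoll_wait reflects that no epoll counterpart appears in the supported list.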

      Because the current JDK networking libraries are built on a socket-level API, we believe rsocket is a good fit for enabling RDMA for both traditional sockets and non-blocking socket channels. Below is the list of proposed public APIs and non-public classes.

      Public APIs proposed for RDMA based sockets

      Module name: jdk.net; Package name: jdk.net
      
      jdk.net.Sockets.openRdmaSocket(), returns java.net.Socket
      
      jdk.net.Sockets.openRdmaServerSocket(), returns java.net.ServerSocket
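
Because both proposed factory methods return the standard java.net types, code written against Socket and ServerSocket should not need to change. The sketch below illustrates that idea under stated assumptions: the SocketSource interface and the class name are hypothetical, and only a plain TCP source is provided, since the proposed jdk.net.Sockets.openRdmaSocket() and openRdmaServerSocket() methods are not part of any released JDK. An RDMA-backed source would return the results of those two calls instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SocketSourceSketch {

    /** Hypothetical seam: hands out unbound, unconnected sockets. */
    interface SocketSource {
        Socket openSocket() throws IOException;
        ServerSocket openServerSocket() throws IOException;
    }

    /** Plain TCP source; an RDMA source would delegate to the proposed factory methods. */
    static final SocketSource TCP = new SocketSource() {
        @Override public Socket openSocket() { return new Socket(); }
        @Override public ServerSocket openServerSocket() throws IOException { return new ServerSocket(); }
    };

    /** Sends a message over loopback and returns the echoed reply. */
    static String echoOnce(SocketSource source, String message)
            throws IOException, InterruptedException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (ServerSocket server = source.openServerSocket()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Server side: accept one connection and echo the payload back.
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    byte[] buf = peer.getInputStream().readNBytes(payload.length);
                    peer.getOutputStream().write(buf);
                    peer.getOutputStream().flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();
            // Client side: only java.net.Socket methods are used.
            try (Socket client = source.openSocket()) {
                client.connect(server.getLocalSocketAddress());
                client.getOutputStream().write(payload);
                client.getOutputStream().flush();
                byte[] reply = client.getInputStream().readNBytes(payload.length);
                echo.join();
                return new String(reply, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce(TCP, "hello"));
    }
}
```

Keeping the choice of transport behind one small interface is a design assumption of this sketch, not something the proposal prescribes.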

      Non-public classes:

      Module name: jdk.net; Package name: rdma.ch
      
      RdmaSocketImpl/RdmaSocketImpl.PlatformRdmaSocketImpl: RdmaSocketImpl is a subclass of java.net.SocketImpl. It is the implementation for 
      RDMA based sockets and server sockets. When jdk.net.Sockets.openRdmaSocket() or jdk.net.Sockets.openRdmaServerSocket() is invoked, a new 
      instance of RdmaSocketImpl is created and then used to create the socket or server socket. RdmaSocketImpl has a static inner class, 
      RdmaSocketImpl.PlatformRdmaSocketImpl
      
      LinuxRdmaSocketImpl: a subclass of RdmaSocketImpl.PlatformRdmaSocketImpl for Linux OS
      
      RdmaSocketInputStream/RdmaSocketOutputStream: subclasses of java.io.FileInputStream/java.io.FileOutputStream, 
      handling rsocket specific IO operations
      
      RdmaSocketOptions/RdmaSocketOptions.PlatformRdmaSocketOptions: In addition to the supported standard socket options, 
      rsocket has three additional options: RDMA_SQSIZE, RDMA_RQSIZE and RDMA_INLINE [2]. This class is used to set and get 
      rsocket specific options. RdmaSocketOptions has an inner class, RdmaSocketOptions.PlatformRdmaSocketOptions
      
      LinuxRdmaSocketOptions: a subclass of RdmaSocketOptions.PlatformRdmaSocketOptions

      The class diagrams are shown in Figure 1.
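
As a rough illustration of how the three rsocket specific options mentioned above could be surfaced to Java code, the sketch below models them as java.net.SocketOption constants. Everything here is hypothetical: the class names, the helper isRdmaOption, and the assumption that each option carries an Integer value are not taken from the proposal.

```java
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;

final class RdmaOptionSketch {

    /** Minimal SocketOption implementation (hypothetical, for illustration only). */
    static final class Opt<T> implements SocketOption<T> {
        private final String name;
        private final Class<T> type;

        Opt(String name, Class<T> type) {
            this.name = name;
            this.type = type;
        }

        @Override public String name() { return name; }
        @Override public Class<T> type() { return type; }
        @Override public String toString() { return name; }
    }

    // Assumption: each option takes an integer size, as with the C-level options.
    static final SocketOption<Integer> RDMA_SQSIZE = new Opt<>("RDMA_SQSIZE", Integer.class);
    static final SocketOption<Integer> RDMA_RQSIZE = new Opt<>("RDMA_RQSIZE", Integer.class);
    static final SocketOption<Integer> RDMA_INLINE = new Opt<>("RDMA_INLINE", Integer.class);

    static final Set<SocketOption<?>> RDMA_OPTIONS =
        Set.of(RDMA_SQSIZE, RDMA_RQSIZE, RDMA_INLINE);

    /** True if the option is one of the three rsocket specific options. */
    static boolean isRdmaOption(SocketOption<?> option) {
        return RDMA_OPTIONS.contains(option);
    }

    public static void main(String[] args) {
        System.out.println(isRdmaOption(RDMA_SQSIZE));                     // true
        System.out.println(isRdmaOption(StandardSocketOptions.SO_RCVBUF)); // false
    }
}
```

An implementation could use a check like isRdmaOption to route an option either to the rsocket specific code path or to the standard one.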

      Public APIs proposed for RDMA based socket channels:

      Module name: jdk.net; Package name: jdk.net
      
      jdk.net.Sockets.openRdmaSocketChannel(), returns java.nio.channels.SocketChannel
      
      jdk.net.Sockets.openRdmaServerSocketChannel(), returns java.nio.channels.ServerSocketChannel
      
      jdk.net.Sockets.openRdmaSelector(), returns java.nio.channels.Selector
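
The channel API follows the same pattern: all three proposed methods return standard java.nio.channels types, so selector-driven code can stay transport-neutral. The sketch below shows this under stated assumptions. The ChannelSource interface and class name are hypothetical, and only a plain TCP source is included because the proposed openRdma* methods do not exist in a released JDK. Grouping the selector with the channels in one source reflects the separate openRdmaSelector() method in the proposal.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

final class ChannelSourceSketch {

    /** Hypothetical seam: channels and the selector that serves them come from one place. */
    interface ChannelSource {
        SocketChannel openSocketChannel() throws IOException;
        ServerSocketChannel openServerSocketChannel() throws IOException;
        Selector openSelector() throws IOException;
    }

    /** Plain TCP source; an RDMA source would call the proposed openRdma* methods. */
    static final ChannelSource TCP = new ChannelSource() {
        @Override public SocketChannel openSocketChannel() throws IOException { return SocketChannel.open(); }
        @Override public ServerSocketChannel openServerSocketChannel() throws IOException { return ServerSocketChannel.open(); }
        @Override public Selector openSelector() throws IOException { return Selector.open(); }
    };

    /** Sends a message over loopback and receives it through a selector loop. */
    static String sendAndSelect(ChannelSource source, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer in = ByteBuffer.allocate(payload.length);
        SocketChannel peer = null; // accepted connection, closed in finally
        try (Selector selector = source.openSelector();
             ServerSocketChannel server = source.openServerSocketChannel();
             SocketChannel client = source.openSocketChannel()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // The client stays in blocking mode to keep the sketch short.
            client.connect(server.getLocalAddress());
            ByteBuffer out = ByteBuffer.wrap(payload);
            while (out.hasRemaining()) {
                client.write(out);
            }

            // Server side: non-blocking accept and read, driven by the selector.
            while (in.hasRemaining()) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel accepted = server.accept();
                        if (accepted != null) {
                            peer = accepted;
                            peer.configureBlocking(false);
                            peer.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        if (((SocketChannel) key.channel()).read(in) < 0) {
                            throw new EOFException("peer closed before the full message arrived");
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
            return new String(in.array(), StandardCharsets.UTF_8);
        } finally {
            if (peer != null) {
                peer.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndSelect(TCP, "hello over a channel"));
    }
}
```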

      Non-public classes:

      Module name: jdk.net; Package name: rdma.ch
      
      RdmaSocketChannelImpl: a subclass of java.nio.channels.SocketChannel that defines the implementations of RDMA channel operations 
      such as connect, read and write
      
      RdmaServerSocketChannelImpl: a subclass of java.nio.channels.ServerSocketChannel that defines the implementations of RDMA 
      server channel operations such as bind and accept
      
      RdmaSocketAdaptor: a subclass of java.net.Socket. It is created from RdmaSocketChannelImpl to make an RDMA socket channel look 
      like an RDMA socket
      
      RdmaServerSocketAdaptor: a subclass of java.net.ServerSocket. It is created from RdmaServerSocketChannelImpl to make an RDMA 
      server socket channel look like an RDMA server socket
      
      RdmaPollSelectorProvider: a subclass of sun.nio.ch.PollSelectorProvider for RDMA based socket channels. 
      When jdk.net.Sockets.openRdmaSelector() is invoked, RdmaPollSelectorProvider.provider().openSelector() is called internally 
      and a new instance of RdmaPollSelectorImpl is returned
      
      RdmaPollSelectorImpl: a subclass of sun.nio.ch.PollSelectorImpl. It is the selector implementation that RdmaPollSelectorProvider 
      creates for RDMA based socket channels
      
      RdmaSocketDispatcher/RdmaSocketDispatcher.PlatformRdmaSocketDispatcher: RdmaSocketDispatcher is a subclass of sun.nio.ch.SocketDispatcher. 
      It performs the majority of the RDMA based socket channel IO operations. It has a static inner class, PlatformRdmaSocketDispatcher
      
      LinuxRdmaSocketDispatcher: a subclass of RdmaSocketDispatcher.PlatformRdmaSocketDispatcher for Linux OS
      
      RdmaNet: a subclass of sun.nio.ch.Net for RDMA based socket channel operations such as listen, bind and setSocketOption/getSocketOption

      The class diagrams are shown in Figure 2.

      Testing

      1. Functional testing on both RDMA based sockets and RDMA based non-blocking socket channels.

      2. CPU usage profiling with and without the feature to ensure CPU consumption is reduced, especially in kernel space.

      Alternatives

      1. Sockets Direct Protocol (SDP) [3] is an alternative approach to enabling RDMA for networking, and support for it has shipped since JDK 7. However, SDP kernel support (libsdp) has been deprecated since OpenFabrics Enterprise Distribution (OFED) version 3.5 (February 2013) [4]. rsocket was introduced to OFED in April 2012 as a successor to SDP. On Linux specifically, rsocket support is also part of the kernel distribution, so there is no need to download and install it from OFED.

      2. Another alternative is to use LD_PRELOAD with the librspreload library, which is part of librdmacm [2]. With this approach, all system socket calls are intercepted and replaced with the rsocket calls provided by the library. This does not allow regular socket operations and RDMA socket operations to be used in the same application.

      Risks and Assumptions

      1. rsocket is currently available only on Linux. We assume that the RDMA verbs transport library is pre-installed on the OS.

      2. IPv4 and IPv6 incompatibility. Similar to SDP, rsocket does not work with IPv4-mapped IPv6 addresses [5]. Applications must be run with -Djava.net.preferIPv4Stack=true.

      3. rsocket currently has no epoll-equivalent capability, so rpoll is used instead.
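
For the IPv4 requirement in item 2, an application could fail fast when the flag is missing rather than hit address errors later. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the check only inspects the system property. The property should still be passed on the java command line, because setting it programmatically after the networking classes have initialized may have no effect.

```java
final class Ipv4StackCheck {
    // Standard JDK networking property named in the text.
    static final String PROPERTY = "java.net.preferIPv4Stack";

    /** True if the JVM was started with -Djava.net.preferIPv4Stack=true. */
    static boolean ipv4StackPreferred() {
        return Boolean.getBoolean(PROPERTY);
    }

    /** Hypothetical guard to call before opening any RDMA based socket. */
    static void requireIpv4Stack() {
        if (!ipv4StackPreferred()) {
            throw new IllegalStateException(
                "RDMA sockets need -D" + PROPERTY + "=true on the command line");
        }
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + ipv4StackPreferred());
    }
}
```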

      References

      [1] https://en.wikipedia.org/wiki/Remote_direct_memory_access

      [2] https://linux.die.net/man/7/rsocket

      [3] https://docs.oracle.com/javase/tutorial/sdp/sockets/index.html

      [4] https://openfabrics.org/downloads/OFED/release_notes/OFED_3.5_release_notes

      [5] https://docs.oracle.com/javase/tutorial/sdp/sockets/issues.html
