Add rsocket support to the JDK to improve the throughput and latency of socket-based network communication.
For HPC and cloud applications, fully utilizing networking hardware capabilities to reach maximum bandwidth at low latency is challenging. The networking libraries in the JDK are currently based on OS kernel sockets. Data transfers involve multiple memory copies between user space and kernel space, which consume extra memory bandwidth and CPU cycles. To improve this, we propose adding support for rsocket, a protocol over Remote Direct Memory Access (RDMA).
In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters. RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. When an application performs an RDMA Read or Write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer. – Wikipedia 
rsocket is a protocol over RDMA that provides a socket-level API for applications. It is intended to match the behavior of the corresponding socket calls. rsocket functions match the names and function signatures of the socket calls, except that every function name is prefixed with an 'r'. For example, to create a socket and return a file descriptor:
default socket call: int socket(int domain, int type, int protocol);
rsocket function: int rsocket(int domain, int type, int protocol);
Currently, the following rsocket functions are supported: rsocket, rbind, rlisten, raccept, rconnect, rshutdown, rclose, rrecv, rrecvfrom, rrecvmsg, rread, rreadv, rsend, rsendto, rsendmsg, rwrite, rwritev, rpoll, rselect, rgetpeername, rgetsockname, rsetsockopt, rgetsockopt and rfcntl.
Given that the current JDK networking libraries are built on a socket-level API, we believe rsocket is a good fit for enabling RDMA on both traditional sockets and non-blocking socket channels. The proposed public APIs and non-public classes are listed below.
Public APIs proposed for RDMA based sockets:
Module name: jdk.net; Package name: jdk.net
jdk.net.Sockets.openRdmaSocket(), returns java.net.Socket
jdk.net.Sockets.openRdmaServerSocket(), returns java.net.ServerSocket
Module name: jdk.net; Package name: rdma.ch
RdmaSocketImpl/RdmaSocketImpl.PlatformRdmaSocketImpl: RdmaSocketImpl is a subclass of java.net.SocketImpl. It is the implementation for RDMA based sockets and server sockets. When jdk.net.Sockets.openRdmaSocket() or jdk.net.Sockets.openRdmaServerSocket() is invoked, a new instance of RdmaSocketImpl is created and then used to create the socket or server socket. RdmaSocketImpl has a static inner class, RdmaSocketImpl.PlatformRdmaSocketImpl
LinuxRdmaSocketImpl: a subclass of RdmaSocketImpl.PlatformRdmaSocketImpl for Linux
RdmaSocketInputStream/RdmaSocketOutputStream: subclasses of java.io.FileInputStream/java.io.FileOutputStream that handle rsocket specific IO operations
RdmaSocketOptions/RdmaSocketOptions.PlatformRdmaSocketOptions: in addition to the supported standard socket options, rsocket has three additional options: RDMA_SQSIZE, RDMA_RQSIZE and RDMA_INLINE. This class is used to set and get the rsocket specific options. RdmaSocketOptions has an inner class, RdmaSocketOptions.PlatformRdmaSocketOptions
LinuxRdmaSocketOptions: a subclass of RdmaSocketOptions.PlatformRdmaSocketOptions
The class diagrams are shown in Figure 1.
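Because the proposed factory methods return plain java.net.Socket and java.net.ServerSocket objects, code written against those types should not need to know which transport is underneath. The following is a minimal sketch of that idea, not part of the proposal: the class name TransportAgnosticEcho and the method roundTrip are hypothetical, and main() uses ordinary TCP sockets so that it runs on any JDK 9 or later. The comments mark where the proposed jdk.net.Sockets calls would be substituted; those calls are assumed to be available only in a JDK built with this feature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TransportAgnosticEcho {

    /**
     * Binds the server to an ephemeral loopback port, echoes one message
     * back to the client, and returns what the client received.
     * Only java.net.Socket / java.net.ServerSocket methods are used.
     */
    public static String roundTrip(ServerSocket server, Socket client, String msg)
            throws IOException, InterruptedException {
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

        // Echo side: copy everything the peer sends back to it until EOF.
        Thread echo = new Thread(() -> {
            try (Socket peer = server.accept()) {
                peer.getInputStream().transferTo(peer.getOutputStream());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        echo.start();

        client.connect(server.getLocalSocketAddress());
        client.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
        client.shutdownOutput(); // signals EOF to the echo side

        String reply = new String(client.getInputStream().readAllBytes(),
                                  StandardCharsets.UTF_8);
        echo.join();
        return reply;
    }

    public static void main(String[] args) throws Exception {
        // Plain TCP sockets here. With the proposed API (an assumption, not
        // present in a standard JDK) these two lines would instead be:
        //   ServerSocket server = jdk.net.Sockets.openRdmaServerSocket();
        //   Socket client       = jdk.net.Sockets.openRdmaSocket();
        try (ServerSocket server = new ServerSocket();
             Socket client = new Socket()) {
            System.out.println(roundTrip(server, client, "ping"));
        }
    }
}
```

Keeping the transport choice at the point where the sockets are created is what allows regular and RDMA based sockets to coexist in one application.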
Public APIs proposed for RDMA based socket channels:
Module name: jdk.net; Package name: jdk.net
jdk.net.Sockets.openRdmaSocketChannel(), returns java.nio.channels.SocketChannel
jdk.net.Sockets.openRdmaServerSocketChannel(), returns java.nio.channels.ServerSocketChannel
jdk.net.Sockets.openRdmaSelector(), returns java.nio.channels.Selector
Module name: jdk.net; Package name: rdma.ch
RdmaSocketChannelImpl: a subclass of java.nio.channels.SocketChannel that implements RDMA channel operations such as connect, read and write
RdmaServerSocketChannelImpl: a subclass of java.nio.channels.ServerSocketChannel that implements RDMA server channel operations such as bind and accept
RdmaSocketAdaptor: a subclass of java.net.Socket. It is created from RdmaSocketChannelImpl to make an RDMA socket channel look like an RDMA socket
RdmaServerSocketAdaptor: a subclass of java.net.ServerSocket. It is created from RdmaServerSocketChannelImpl to make an RDMA server socket channel look like an RDMA server socket
RdmaPollSelectorProvider: a subclass of sun.nio.ch.PollSelectorProvider for RDMA based socket channels. When jdk.net.Sockets.openRdmaSelector() is invoked, RdmaPollSelectorProvider.provider().openSelector() is called internally and a new instance of RdmaPollSelectorImpl is returned
RdmaPollSelectorImpl: a subclass of sun.nio.ch.PollSelectorImpl. It is the selector implementation returned by RdmaPollSelectorProvider for RDMA based socket channels
RdmaSocketDispatcher/RdmaSocketDispatcher.PlatformRdmaSocketDispatcher: RdmaSocketDispatcher is a subclass of sun.nio.ch.SocketDispatcher. It performs the majority of the RDMA based socket channel IO operations. It has a static inner class, PlatformRdmaSocketDispatcher
LinuxRdmaSocketDispatcher: a subclass of RdmaSocketDispatcher.PlatformRdmaSocketDispatcher for Linux
RdmaNet: a subclass of sun.nio.ch.Net for RDMA based socket channel operations such as listen, bind and setSocketOption/getSocketOption
The class diagrams are shown in Figure 2.
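The channel APIs follow the same pattern: the proposed factories return the standard java.nio.channels types, and a dedicated selector is proposed for RDMA based channels. The sketch below is illustrative only. The class SelectorEchoSketch and its roundTrip method are hypothetical names, and main() uses ordinary TCP channels and Selector.open() so that it runs on a standard JDK. Under the proposal, the three objects would presumably come from openRdmaServerSocketChannel(), openRdmaSocketChannel() and openRdmaSelector(), with RDMA channels registered on the RDMA selector; that pairing is an assumption drawn from the class descriptions above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SelectorEchoSketch {

    /**
     * Connects client to server over loopback, has the accepted peer send
     * msg, and reads it on the client in non-blocking mode via the selector.
     */
    public static String roundTrip(ServerSocketChannel server, SocketChannel client,
                                   Selector selector, String msg) throws IOException {
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

        // Blocking connect; the listener's backlog completes the handshake.
        client.connect(server.getLocalAddress());

        try (SocketChannel peer = server.accept()) {
            ByteBuffer out = ByteBuffer.wrap(payload);
            while (out.hasRemaining()) {
                peer.write(out);
            }

            // Switch the client to non-blocking mode and wait for readability.
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);

            ByteBuffer in = ByteBuffer.allocate(payload.length);
            boolean eof = false;
            while (in.hasRemaining() && !eof) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()
                            && ((SocketChannel) key.channel()).read(in) < 0) {
                        eof = true;
                    }
                }
                selector.selectedKeys().clear();
            }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Standard TCP channels and selector. With the proposed API (assumed,
        // not in a standard JDK) these would be obtained from
        // jdk.net.Sockets.openRdmaServerSocketChannel(),
        // jdk.net.Sockets.openRdmaSocketChannel() and
        // jdk.net.Sockets.openRdmaSelector() respectively.
        try (ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open();
             Selector selector = Selector.open()) {
            System.out.println(roundTrip(server, client, selector, "ping"));
        }
    }
}
```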
Testing will include functional tests of both RDMA based sockets and RDMA based non-blocking socket channels.
CPU usage will be profiled with and without the feature to confirm that CPU consumption is reduced, especially in kernel space.
Sockets Direct Protocol (SDP) is an alternative approach to enabling RDMA for networking. Support for it was released with JDK 7. However, the SDP kernel support, libsdp, has been deprecated since OpenFabrics Enterprise Distribution (OFED) version 3.5 (February 2013). rsocket was introduced to OFED in April 2012 as a successor to SDP. On Linux specifically, rsocket support is also part of the kernel distribution, so there is no need to download and install it from OFED.
Another alternative is to use LD_PRELOAD with the librspreload library, which is part of librdmacm. With this approach, all system socket calls are intercepted and replaced with the rsocket calls provided by the library. This does not offer the flexibility of using both regular socket operations and RDMA socket operations in the same application.
Risks and Assumptions
rsocket is currently available only on Linux. It is assumed that the RDMA verbs transport library is pre-installed on the OS.
IPv4 and IPv6 incompatibility. Similar to SDP, rsocket does not work with IPv4-mapped IPv6 addresses. Applications must be run with -Djava.net.preferIPv4Stack=true.
rsocket does not currently provide an epoll-equivalent capability; rpoll is used instead.
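An application can guard against the IPv4/IPv6 limitation described above before it opens any RDMA based socket. The following is a small sketch under stated assumptions: the class Ipv4StackCheck and its two helpers are hypothetical and are not part of the proposed API. It uses only the standard java.net.preferIPv4Stack system property and the java.net.Inet4Address type.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Ipv4StackCheck {

    /** True if the JVM was started with -Djava.net.preferIPv4Stack=true. */
    public static boolean ipv4StackPreferred() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    /** Rejects any address that is not a plain IPv4 address. */
    public static void requireIpv4(InetAddress addr) {
        if (!(addr instanceof Inet4Address)) {
            throw new IllegalArgumentException(
                "Expected an IPv4 address but got " + addr);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        if (!ipv4StackPreferred()) {
            System.err.println(
                "Warning: run with -Djava.net.preferIPv4Stack=true for rsocket");
        }
        // 127.0.0.1 built from raw bytes, so no name lookup is involved.
        requireIpv4(InetAddress.getByAddress(new byte[] {127, 0, 0, 1}));
        System.out.println("IPv4 address accepted");
    }
}
```

The property is normally read when the networking stack initializes, so it should be supplied on the command line; the check here only reports whether it was set.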