Enhance the JDK networking API to support remote direct memory access (RDMA) using the rsocket protocol on Linux-based platforms.
For HPC and cloud applications, fully utilizing networking hardware capabilities to reach maximum bandwidth at low latency is challenging. Networking libraries inside the JDK currently use operating-system kernel sockets. Data transfers involve multiple memory copies between user and kernel space, which are expensive in both memory bandwidth and CPU cycles. To improve this, we propose to add support for rsocket, a protocol over remote direct memory access (RDMA) that permits high-throughput, low-latency networking.
In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters. RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. When an application performs an RDMA Read or Write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer. — Wikipedia
rsocket is a protocol over RDMA that supports a C API for applications. The API is intended to match the familiar socket API. For example, to create a socket and return a file descriptor:
default socket call: int socket(int domain, int type, int protocol);
rsocket function: int rsocket(int domain, int type, int protocol);
The JDK networking libraries use the standard socket API, so the rsocket API is a good fit for enabling RDMA on both traditional sockets and non-blocking socket channels.
Proposed JDK-specific API
The rsocket feature is non-standard and is only available on Linux platforms. Therefore, the proposal here is to add a Java API to the jdk.net package of the jdk.net module, which is where other JDK-specific networking features are already exposed. On platforms where rsocket is not supported, an UnsupportedOperationException will be thrown if the APIs are used.
A new class named jdk.net.RdmaSockets will define factory methods to create RDMA-based TCP sockets and channels.
java.net.Socket openSocket(ProtocolFamily family)
java.net.ServerSocket openServerSocket(ProtocolFamily family)
java.nio.channels.SocketChannel openSocketChannel(ProtocolFamily family)
java.nio.channels.ServerSocketChannel openServerSocketChannel(ProtocolFamily family)
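As a rough illustration of how an application might consume these factory methods, the sketch below looks up the proposed jdk.net.RdmaSockets class reflectively and falls back to an ordinary kernel-socket channel when the class is missing or the platform reports UnsupportedOperationException. The RdmaChannels class name, the reflective lookup, and the fallback policy are assumptions made for this example and are not part of the proposal. The lookup is reflective only so that the sketch compiles on JDKs that do not ship the proposed class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.ProtocolFamily;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: prefers an RDMA channel, falls back to a regular one. */
public final class RdmaChannels {
    private RdmaChannels() {}

    /**
     * Tries the proposed jdk.net.RdmaSockets.openSocketChannel(ProtocolFamily)
     * factory. Returns null when the class is absent from this JDK or when the
     * platform reports that rsocket is unsupported.
     */
    public static SocketChannel tryOpenRdma(ProtocolFamily family) {
        try {
            Class<?> factory = Class.forName("jdk.net.RdmaSockets");
            Method open = factory.getMethod("openSocketChannel", ProtocolFamily.class);
            return (SocketChannel) open.invoke(null, family);
        } catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException e) {
            return null; // the running JDK does not provide the proposed API
        } catch (InvocationTargetException e) {
            if (e.getCause() instanceof UnsupportedOperationException) {
                return null; // API present, but rsocket is not supported here
            }
            throw new IllegalStateException("RDMA channel creation failed", e.getCause());
        }
    }

    /**
     * Opens an RDMA channel if possible, otherwise a regular kernel-socket
     * channel. The fallback ignores the family argument and uses the default.
     */
    public static SocketChannel open(ProtocolFamily family) {
        SocketChannel rdma = tryOpenRdma(family);
        if (rdma != null) {
            return rdma;
        }
        try {
            return SocketChannel.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would write something like RdmaChannels.open(StandardProtocolFamily.INET) and then use the returned SocketChannel exactly as it would any other channel, which is the property that makes the rsocket API a natural fit for the existing networking classes.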
All tests will require RDMA-capable hardware (network interface adapter and switch).
Functional testing will be done on both RDMA-based sockets and RDMA-based non-blocking socket channels.
CPU usage profiling with and without the feature will be done to ensure that CPU consumption is reduced, especially in kernel space.
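One way such a comparison could be approximated from inside the JVM is with the standard ThreadMXBean, which reports total and user-mode CPU time for the current thread; the difference between the two is an estimate of time spent in the kernel. The sketch below is only illustrative: the CpuSplit and Sample names are hypothetical, it measures only the calling thread, and timer granularity can make small measurements unreliable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that splits a task's CPU time into user and system parts. */
public final class CpuSplit {
    private CpuSplit() {}

    /** CPU time consumed by the measured task on the calling thread. */
    public static final class Sample {
        public final long userNanos;
        public final long systemNanos;

        Sample(long userNanos, long systemNanos) {
            this.userNanos = userNanos;
            this.systemNanos = systemNanos;
        }
    }

    /** Runs the task on the current thread and reports its CPU time split. */
    public static Sample measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported");
        }
        long totalBefore = bean.getCurrentThreadCpuTime();
        long userBefore = bean.getCurrentThreadUserTime();
        task.run();
        long total = bean.getCurrentThreadCpuTime() - totalBefore;
        long user = bean.getCurrentThreadUserTime() - userBefore;
        // System time is whatever is left after user time; clamp at zero
        // because the two counters may have different granularity.
        return new Sample(user, Math.max(0L, total - user));
    }
}
```

Running the same transfer workload through measure once over a regular socket and once over an RDMA-based socket, then comparing the systemNanos values, would give a first indication of whether kernel-space CPU consumption drops as expected.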
The Sockets Direct Protocol (SDP) is an alternative approach to enable RDMA for networking. Support for it was included in JDK 7. However, SDP kernel support has been deprecated in the OpenFabrics Enterprise Distribution (OFED) version 3.5 (February 2013) specification. rsocket was introduced to OFED in April 2012 as a successor to SDP.
Another alternative approach is to use LD_PRELOAD with the librspreload library, which is part of the librdmacm RDMA library on Linux. When using this approach, all the system socket calls are translated to rsocket calls provided by the library. This does not provide the flexibility of having both regular socket operations and RDMA socket operations in the same application.
Risks and Assumptions
rsocket is currently only available on Linux. We assume that the RDMA verbs transport library will be installed on the OS.
As with SDP, rsocket does not work with IPv4-mapped IPv6 addresses. All RDMA-based sockets and socket channels will use the IPv4 protocol for connect/accept.
rsocket does not currently have support for epoll or an equivalent capability, but rpoll is used instead. rsocket-based SocketChannels cannot be multiplexed with other selectable channels.