jtreg tests can be configured to run in agentvm mode. In agentvm mode, jtreg runs the test code in a JVM process that is separate from the JVM process in which jtreg itself was launched. The jtreg framework is responsible for creating that JVM and managing its lifecycle. The JVM process that jtreg creates internally to run the test code is called the "AgentServer", and the code that manages and interacts with the AgentServer from within jtreg's own JVM process is called the "Agent".
Communication between the AgentServer and the Agent (which run in separate JVM processes) happens over sockets. When launching an AgentServer JVM, the Agent code running within the jtreg process creates a java.net.ServerSocket instance and binds it to an ephemeral port. The port on which the ServerSocket is listening is passed as a command line argument to the AgentServer JVM process being launched. Immediately after launch, the AgentServer JVM process is expected to connect() to this port using java.net.Socket. Meanwhile, the Agent code in the jtreg process waits for the connection attempt from the freshly launched JVM process. Once the Agent has accept()ed the connection, the "handshake" is complete and the socket input and output streams are constructed for further communication between the two.
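The flow described above can be sketched roughly as follows. This is only an illustration, not jtreg's actual code: the class and method names are hypothetical, a thread stands in for the separately launched AgentServer JVM, binding to the loopback address is an assumption, and the message exchanged after the connection is made up for the example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the current handshake; not jtreg's real implementation.
class CurrentHandshakeSketch {

    // Stand-in for the AgentServer JVM: it is told the port (in jtreg this is a
    // command line argument), connects to it, and sends an illustrative message.
    static Thread startFakeAgentServer(int port) {
        Thread t = new Thread(() -> {
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                out.writeUTF("hello from AgentServer");
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Agent side: bind to an ephemeral port, launch the peer, accept() exactly once.
    static String runOnce() throws IOException, InterruptedException {
        // Port 0 asks the OS for an ephemeral port; loopback binding is an assumption.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = listener.getLocalPort();
            Thread agentServer = startFakeAgentServer(port);
            // Whoever connects first is accepted; nothing checks who the peer is.
            try (Socket s = listener.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream())) {
                String message = in.readUTF();
                agentServer.join();
                return message;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());
    }
}
```

Note that the single accept() call has no way to tell the launched AgentServer apart from any other process that happens to connect to the same port.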
One important thing to note about this "handshake" step is that the Agent code (running within the jtreg process) is currently implemented to accept() only one connection. Furthermore, there are no additional checks to verify that the accepted connection actually came from the AgentServer process that the Agent launched. If some other process, for whatever reason, attempts a socket connection to the port the Agent is listening on, the Agent will accept that connection. At least two problems follow. First, subsequent communication with the unexpected process will fail, because that process won't understand the jtreg-specific data transferred over the socket connection. Second, the AgentServer process that has issued a connect() will have no one accept()ing its connection request, because the Agent code is implemented to accept only one connection.
This scenario isn't just theoretical. We have had several failures in our CI instances over the past few months because the VMs on which we run jtreg have a process installed that, due to a bug, communicates over ports it shouldn't be using. This causes the Agent and AgentServer handshake to fail, which in turn causes the test to fail.
The Agent and AgentServer handshake can be made more tolerant of such unexpected connection attempts from other processes. This enhancement request proposes an additional check after the Agent code accept()s a connection: the Agent reads the first few incoming bytes on the accepted connection and verifies that they have a specific value. The AgentServer is expected to send these specific handshake bytes (just a few of them) when it connects, so that the Agent code can identify the incoming connection as originating from the AgentServer. Once the bytes are verified, the handshake completes and the Agent and AgentServer continue to communicate as usual. If the Agent code finds that the first few bytes on the incoming connection don't match the handshake data expected from an AgentServer process, the enhancement proposes that the Agent close that connection and accept() another one. This should allow the AgentServer's connect() attempt to succeed and its handshake bytes to be verified by the Agent. The Agent will retry accept() after such failed handshakes only a limited number of times.
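A minimal sketch of the proposed check, under stated assumptions: the handshake byte values, the retry limit, and the read timeout below are all hypothetical (the proposal only says "a few" bytes and "a few" retries, and does not mention a timeout), and the class and method names are made up for illustration. Both peers are simulated in one process so the example is self-contained.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

// Hypothetical sketch of the proposed verified handshake; not jtreg's real implementation.
class VerifiedHandshakeSketch {
    static final byte[] MAGIC = {'J', 'T', 'R', 'G'}; // assumed handshake bytes
    static final int MAX_ATTEMPTS = 5;                // assumed retry limit
    static final int READ_TIMEOUT_MS = 2000;          // assumed; avoids blocking on a silent peer

    // Accept connections until one sends the expected handshake bytes,
    // closing any connection that does not, up to MAX_ATTEMPTS times.
    static Socket acceptVerified(ServerSocket listener) throws IOException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            Socket s = listener.accept();
            try {
                s.setSoTimeout(READ_TIMEOUT_MS);
                byte[] received = new byte[MAGIC.length];
                new DataInputStream(s.getInputStream()).readFully(received);
                if (Arrays.equals(MAGIC, received)) {
                    s.setSoTimeout(0); // back to blocking reads for normal communication
                    return s;
                }
            } catch (IOException e) {
                // Peer closed early or sent nothing in time: treat as a failed handshake.
            }
            s.close(); // not the AgentServer; drop it and accept() again
        }
        throw new IOException("no valid handshake after " + MAX_ATTEMPTS + " connections");
    }

    // Simulates an unexpected process connecting first, followed by the real AgentServer.
    // Returns true if the connection that passed verification is the AgentServer's.
    static boolean demo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 5, loopback)) {
            int port = listener.getLocalPort();
            try (Socket rogue = new Socket(loopback, port);
                 Socket agentServer = new Socket(loopback, port)) {
                rogue.getOutputStream().write(new byte[] {'G', 'E', 'T', ' '});
                rogue.getOutputStream().flush();
                agentServer.getOutputStream().write(MAGIC);
                agentServer.getOutputStream().flush();
                try (Socket accepted = acceptVerified(listener)) {
                    return accepted.getPort() == agentServer.getLocalPort();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("accepted the AgentServer connection: " + demo());
    }
}
```

In this sketch the rogue connection is accepted first, fails the byte comparison, and is closed, after which the second accept() picks up the AgentServer's connection. Bounding the loop keeps the Agent from waiting indefinitely if the AgentServer never connects.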
The goal of this enhancement is to prevent test failures and agent communication failures caused by unexpected processes connecting to the port that jtreg's Agent is listening on during the handshake with the AgentServer.
Links to: Review(master) openjdk/jtreg/195