When running on slow machines with a large number of CPUs with jtreg4.2 b09 using agentvm mode and -Xcomp, we get frequent socket errors. For example, the following is an extract from the jtreg console log during an internal test run:
[2017-10-18 02:33:37,303] Agent[6]: stderr: java.net.ConnectException: Connection refused (Connection refused)
[2017-10-18 02:33:37,721] Agent[6]: stderr: at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2017-10-18 02:33:38,478] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:400)
[2017-10-18 02:33:38,479] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:243)
[2017-10-18 02:33:38,480] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:225)
[2017-10-18 02:33:38,480] Agent[6]: stderr: at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:402)
[2017-10-18 02:33:38,481] Agent[6]: stderr: at java.base/java.net.Socket.connect(Socket.java:591)
[2017-10-18 02:33:38,482] Agent[6]: stderr: at java.base/java.net.Socket.connect(Socket.java:540)
[2017-10-18 02:33:38,483] Agent[6]: stderr: at java.base/java.net.Socket.<init>(Socket.java:436)
[2017-10-18 02:33:38,483] Agent[6]: stderr: at java.base/java.net.Socket.<init>(Socket.java:246)
[2017-10-18 02:33:38,484] Agent[6]: stderr: at com.sun.javatest.regtest.agent.AgentServer.<init>(AgentServer.java:154)
[2017-10-18 02:33:38,485] Agent[6]: stderr: at com.sun.javatest.regtest.agent.AgentServer.main(AgentServer.java:60)
In this particular run, we pass the following arguments to JTREG:
-concurrency:6 ... -agentvm ... -timeoutFactor:10 -J-Xmx512m ... \
-javaoptions:-Xcomp ... -javaoptions:-XX:CompileThreshold=100 ... \
-javaoptions:-server -javaoptions:-XX:-TieredCompilation ... \
Note the large timeoutFactor is chosen because the tests are expected to run very slowly when -Xcomp is enabled.
The problem is when jtreg spawns agent JVM processes, it uses a fixed 60 second timeout when calling ServerSocket.accept(). See http://hg.openjdk.java.net/code-tools/jtreg/file/684f12eef4a8/src/share/classes/com/sun/javatest/regtest/exec/Agent.java#l124
final int ACCEPT_TIMEOUT = 60*1000; // 1 minute, for server to start and "phone home"
ss.setSoTimeout(ACCEPT_TIMEOUT);
Socket s = ss.accept();
Due to -Xcomp, the agent JVM runs very slowly (printing a single stack trace for Agent[6] takes over 1 second!). So it's very likely that it would take over 60 seconds before executing the following code to connect to the above ServerSocket:
http://hg.openjdk.java.net/code-tools/jtreg/file/684f12eef4a8/src/share/classes/com/sun/javatest/regtest/agent/AgentServer.java#l154
if (port > 0) {
154: Socket s = new Socket(host, port);
The suggested fix is to increase ACCEPT_TIMEOUT w.r.t. timeoutFactor.
[2017-10-18 02:33:37,303] Agent[6]: stderr: java.net.ConnectException: Connection refused (Connection refused)
[2017-10-18 02:33:37,721] Agent[6]: stderr: at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2017-10-18 02:33:38,478] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:400)
[2017-10-18 02:33:38,479] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:243)
[2017-10-18 02:33:38,480] Agent[6]: stderr: at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:225)
[2017-10-18 02:33:38,480] Agent[6]: stderr: at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:402)
[2017-10-18 02:33:38,481] Agent[6]: stderr: at java.base/java.net.Socket.connect(Socket.java:591)
[2017-10-18 02:33:38,482] Agent[6]: stderr: at java.base/java.net.Socket.connect(Socket.java:540)
[2017-10-18 02:33:38,483] Agent[6]: stderr: at java.base/java.net.Socket.<init>(Socket.java:436)
[2017-10-18 02:33:38,483] Agent[6]: stderr: at java.base/java.net.Socket.<init>(Socket.java:246)
[2017-10-18 02:33:38,484] Agent[6]: stderr: at com.sun.javatest.regtest.agent.AgentServer.<init>(AgentServer.java:154)
[2017-10-18 02:33:38,485] Agent[6]: stderr: at com.sun.javatest.regtest.agent.AgentServer.main(AgentServer.java:60)
In this particular run, we pass the following arguments to JTREG:
-concurrency:6 ... -agentvm ... -timeoutFactor:10 -J-Xmx512m ... \
-javaoptions:-Xcomp ... -javaoptions:-XX:CompileThreshold=100 ... \
-javaoptions:-server -javaoptions:-XX:-TieredCompilation ... \
Note the large timeoutFactor is chosen because the tests are expected to run very slowly when -Xcomp is enabled.
The problem is when jtreg spawns agent JVM processes, it uses a fixed 60 second timeout when calling ServerSocket.accept(). See http://hg.openjdk.java.net/code-tools/jtreg/file/684f12eef4a8/src/share/classes/com/sun/javatest/regtest/exec/Agent.java#l124
final int ACCEPT_TIMEOUT = 60*1000; // 1 minute, for server to start and "phone home"
ss.setSoTimeout(ACCEPT_TIMEOUT);
Socket s = ss.accept();
Due to -Xcomp, the agent JVM runs very slowly (printing a single stack trace for Agent[6] takes over 1 second!). So it's very likely that it would take over 60 seconds before executing the following code to connect to the above ServerSocket:
http://hg.openjdk.java.net/code-tools/jtreg/file/684f12eef4a8/src/share/classes/com/sun/javatest/regtest/agent/AgentServer.java#l154
if (port > 0) {
154: Socket s = new Socket(host, port);
The suggested fix is to increase ACCEPT_TIMEOUT w.r.t. timeoutFactor.