> From: Waldek Kozaczuk <jwkozaczuk@gmail.com>
> Date: August 7, 2019 at 21:16:36 EDT
> To: hotspot-dev@openjdk.java.net
> Subject: Jetty hangs early on Java 12 on OSv with Threadlocal Handshakes ON
>
> Hi,
>
> I am one of the committers to the OSv project (http://osv.io/). OSv is is a Linux-compatible unikernel. OSv has been able to run unmodified Linux version of Java very well since early on including Java 7 all the way to Java 11. However, I have discovered some problems with the latest version of Java 12.
>
> I essence when I try to start a demo version of Jetty app on Java 12 (specifically jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64) on OSv it hangs after printing a single log message (once I saw it print 3-4 lines and hang after) which looks like this:
>
> OSv v0.53.0-73-gd7054e0b
>
> eth0: 192.168.122.15
>
> Booted up in 135.69 ms
>
> fpathconf() stubbed
>
> 2019-08-07 19:04:13.758:INFO::main: Logging initialized @797ms
>
> Please note the last line from Jetty. When I run the exact same app on Java 11 (jdk-zulu11.33.15-ca-jdk11.0.4-linux_x64) on OSv, it starts just fine:
>
> OSv v0.53.0-73-gd7054e0b
>
> eth0: 192.168.122.15
>
> Booted up in 138.25 ms
>
> fpathconf() stubbed
>
> 2019-08-07 22:28:50.284:INFO::main: Logging initialized @726ms
>
> 2019-08-07 22:28:50.536:WARN::main: demo test-realm is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:50.539:INFO:oejs.Server:main: jetty-9.2.11.v20150529
>
> 2019-08-07 22:28:50.552:INFO:oejdp.ScanningAppProvider:main: Deployment monitor [file:/jetty/demo-base/webapps/] at interval 1
>
> 2019-08-07 22:28:50.879:WARN::main: test webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.279:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@1ffaf86{/test,file:/tmp/jetty-0.0.0.0-8080-test.war-_test-any-5466198194536157666.dir/webapp/,AVAILABLE}{/test.war}
>
> 2019-08-07 22:28:51.392:WARN::main: test-spec webapp is deployed. DO NOT USE IN PRODUCTION!
>
> Calling TestListener.contextInitialized
>
> 2019-08-07 22:28:51.466:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@4bef0fe3{/test-spec,[file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/, jar:file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/WEB-INF/lib/test-web-fragment-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/test-spec.war}
>
> 2019-08-07 22:28:51.480:INFO:oejsh.ContextHandler:main: Started o.e.j.s.h.MovedContextHandler@18245eb0{/oldContextPath,null,AVAILABLE}
>
> 2019-08-07 22:28:51.517:WARN::main: test-jaas webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.563:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@6d0b5baf{/test-jaas,file:/tmp/jetty-0.0.0.0-8080-test-jaas.war-_test-jaas-any-4900956465487745831.dir/webapp/,AVAILABLE}{/test-jaas.war}
>
> 2019-08-07 22:28:51.618:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@36b0fcd5{/,file:/jetty/demo-base/webapps/ROOT/,AVAILABLE}{/ROOT}
>
> 2019-08-07 22:28:51.678:WARN::main: test-jndi webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.712:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@89ff02e{/test-jndi,file:/tmp/jetty-0.0.0.0-8080-test-jndi.war-_test-jndi-any-5186649427965161663.dir/webapp/,AVAILABLE}{/test-jndi.war}
>
> 2019-08-07 22:28:51.853:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@12cd9150{/proxy,file:/tmp/jetty-0.0.0.0-8080-xref-proxy.war-_xref-proxy-any-18424004387378155009.dir/webapp/,AVAILABLE}{/xref-proxy.war}
>
> 2019-08-07 22:28:51.953:WARN::main: async-rest webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.970:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@473b3b7a{/async-rest,[file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/, jar:file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/WEB-INF/lib/example-async-rest-jar-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/async-rest.war}
>
> 2019-08-07 22:28:51.984:INFO:oejs.ServerConnector:main: Started ServerConnector@2b1a1fd{HTTP/1.1}{0.0.0.0:8080}
>
> 2019-08-07 22:28:51.991:INFO:oejs.ServerConnector:main: Started ServerConnector@ad47a4a{SSL-http/1.1}{0.0.0.0:8443}
>
> 2019-08-07 22:28:51.993:INFO:oejs.Server:main: Started @2434ms
>
> After connecting with gdb I have discovered that one of the JVM threads hangs in a loop while actively calling sem_trywait() - see full thread-dump for all Java threads in the bottom of the email):
>
> thread 0xffff800004691040 VM Thread
>
> #0 0x0000100000c86370 in sem_trywait@plt ()
> #1 0x0000100001702bf2 in PosixSemaphore::trywait() ()
> #2 0x000010000121aab4 in VM_HandshakeAllThreads::doit() ()
> #3 0x000010000198a8d7 in VM_Operation::evaluate() ()
> #4 0x0000100001990706 in VMThread::evaluate_operation(VM_Operation*) [clone .constprop.64] ()
> #5 0x0000100001990e7a in VMThread::loop() ()
> #6 0x0000100001991073 in VMThread::run() ()
> #7 0x00001000019186ff in Thread::call_run() ()
> #8 0x000010000162e1be in thread_native_entry(Thread*) ()
> #9 0x000000004045eeb6 in pthread_private::pthread::<lambda()>::operator() (__closure=0xffffa00003d6d400) at libc/pthread.cc:114
> #10 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
> at /usr/include/c++/8/bits/std_function.h:297
> #11 0x0000000040401117 in sched::thread_main_c (t=0xffff800004691040) at arch/x64/arch-switch.hh:271
> #12 0x00000000403a71d3 in thread_main () at arch/x64/entry.S:113
>
> (note that frames 9 and above are from OSv as the libc calls are handled as local function calls on OSv).
>
> After more research around VM_HandshakeAllThreads (in https://github.com/openjdk/jdk/blob/0fe0312f416add1536a45ecfb292c887ef7e02bd/src/hotspot/share/runtime/handshake.cpp#L152-L208) I have stumbled across this JEP https://openjdk.java.net/jeps/312 - Thread-Local Handshakes) where I realized new cross-threads callback mechanism has been introduced starting in Java 10 to replace "stop-the-world" mechanism (as I understand). I have also found a switch to disable it using the -XX:-ThreadLocalHandshakes switch.
>
> So now when I run Jetty app on Java 12 with such command line and disbale TLH:
>
> /usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs -XX:-ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
> it works and does not hang any more:
>
> when this one hangs (TLH explicitly enabled):
>
> /usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs -XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
>
> When I explicitly enable thread local handshakes on Java 11 it works as well:
>
> /usr/lib/jvm/jdk-zulu11.33.15-ca-jdk11.0.4-linux_x64/bin/java -Xrs -XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
>
> Note I am limiting signals usage by JVM in all cases just to narrow down my research.
>
> One more datapoint - I am able to run a simple Hello World java app on Java 12 on OSv with thread-local handshakes on so it looks like this problem appears on more complicated apps.
>
> Lastly, the exact same app works fine on Java 12 on Linux.
>
> Date: August 7, 2019 at 21:16:36 EDT
> To: hotspot-dev@openjdk.java.net
> Subject: Jetty hangs early on Java 12 on OSv with Threadlocal Handshakes ON
>
> Hi,
>
> I am one of the committers to the OSv project (http://osv.io/). OSv is is a Linux-compatible unikernel. OSv has been able to run unmodified Linux version of Java very well since early on including Java 7 all the way to Java 11. However, I have discovered some problems with the latest version of Java 12.
>
> I essence when I try to start a demo version of Jetty app on Java 12 (specifically jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64) on OSv it hangs after printing a single log message (once I saw it print 3-4 lines and hang after) which looks like this:
>
> OSv v0.53.0-73-gd7054e0b
>
> eth0: 192.168.122.15
>
> Booted up in 135.69 ms
>
> fpathconf() stubbed
>
> 2019-08-07 19:04:13.758:INFO::main: Logging initialized @797ms
>
> Please note the last line from Jetty. When I run the exact same app on Java 11 (jdk-zulu11.33.15-ca-jdk11.0.4-linux_x64) on OSv, it starts just fine:
>
> OSv v0.53.0-73-gd7054e0b
>
> eth0: 192.168.122.15
>
> Booted up in 138.25 ms
>
> fpathconf() stubbed
>
> 2019-08-07 22:28:50.284:INFO::main: Logging initialized @726ms
>
> 2019-08-07 22:28:50.536:WARN::main: demo test-realm is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:50.539:INFO:oejs.Server:main: jetty-9.2.11.v20150529
>
> 2019-08-07 22:28:50.552:INFO:oejdp.ScanningAppProvider:main: Deployment monitor [file:/jetty/demo-base/webapps/] at interval 1
>
> 2019-08-07 22:28:50.879:WARN::main: test webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.279:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@1ffaf86{/test,file:/tmp/jetty-0.0.0.0-8080-test.war-_test-any-5466198194536157666.dir/webapp/,AVAILABLE}{/test.war}
>
> 2019-08-07 22:28:51.392:WARN::main: test-spec webapp is deployed. DO NOT USE IN PRODUCTION!
>
> Calling TestListener.contextInitialized
>
> 2019-08-07 22:28:51.466:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@4bef0fe3{/test-spec,[file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/, jar:file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/WEB-INF/lib/test-web-fragment-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/test-spec.war}
>
> 2019-08-07 22:28:51.480:INFO:oejsh.ContextHandler:main: Started o.e.j.s.h.MovedContextHandler@18245eb0{/oldContextPath,null,AVAILABLE}
>
> 2019-08-07 22:28:51.517:WARN::main: test-jaas webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.563:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@6d0b5baf{/test-jaas,file:/tmp/jetty-0.0.0.0-8080-test-jaas.war-_test-jaas-any-4900956465487745831.dir/webapp/,AVAILABLE}{/test-jaas.war}
>
> 2019-08-07 22:28:51.618:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@36b0fcd5{/,file:/jetty/demo-base/webapps/ROOT/,AVAILABLE}{/ROOT}
>
> 2019-08-07 22:28:51.678:WARN::main: test-jndi webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.712:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@89ff02e{/test-jndi,file:/tmp/jetty-0.0.0.0-8080-test-jndi.war-_test-jndi-any-5186649427965161663.dir/webapp/,AVAILABLE}{/test-jndi.war}
>
> 2019-08-07 22:28:51.853:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@12cd9150{/proxy,file:/tmp/jetty-0.0.0.0-8080-xref-proxy.war-_xref-proxy-any-18424004387378155009.dir/webapp/,AVAILABLE}{/xref-proxy.war}
>
> 2019-08-07 22:28:51.953:WARN::main: async-rest webapp is deployed. DO NOT USE IN PRODUCTION!
>
> 2019-08-07 22:28:51.970:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@473b3b7a{/async-rest,[file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/, jar:file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/WEB-INF/lib/example-async-rest-jar-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/async-rest.war}
>
> 2019-08-07 22:28:51.984:INFO:oejs.ServerConnector:main: Started ServerConnector@2b1a1fd{HTTP/1.1}{0.0.0.0:8080}
>
> 2019-08-07 22:28:51.991:INFO:oejs.ServerConnector:main: Started ServerConnector@ad47a4a{SSL-http/1.1}{0.0.0.0:8443}
>
> 2019-08-07 22:28:51.993:INFO:oejs.Server:main: Started @2434ms
>
> After connecting with gdb I have discovered that one of the JVM threads hangs in a loop while actively calling sem_trywait() - see full thread-dump for all Java threads in the bottom of the email):
>
> thread 0xffff800004691040 VM Thread
>
> #0 0x0000100000c86370 in sem_trywait@plt ()
> #1 0x0000100001702bf2 in PosixSemaphore::trywait() ()
> #2 0x000010000121aab4 in VM_HandshakeAllThreads::doit() ()
> #3 0x000010000198a8d7 in VM_Operation::evaluate() ()
> #4 0x0000100001990706 in VMThread::evaluate_operation(VM_Operation*) [clone .constprop.64] ()
> #5 0x0000100001990e7a in VMThread::loop() ()
> #6 0x0000100001991073 in VMThread::run() ()
> #7 0x00001000019186ff in Thread::call_run() ()
> #8 0x000010000162e1be in thread_native_entry(Thread*) ()
> #9 0x000000004045eeb6 in pthread_private::pthread::<lambda()>::operator() (__closure=0xffffa00003d6d400) at libc/pthread.cc:114
> #10 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
> at /usr/include/c++/8/bits/std_function.h:297
> #11 0x0000000040401117 in sched::thread_main_c (t=0xffff800004691040) at arch/x64/arch-switch.hh:271
> #12 0x00000000403a71d3 in thread_main () at arch/x64/entry.S:113
>
> (note that frames 9 and above are from OSv as the libc calls are handled as local function calls on OSv).
>
> After more research around VM_HandshakeAllThreads (in https://github.com/openjdk/jdk/blob/0fe0312f416add1536a45ecfb292c887ef7e02bd/src/hotspot/share/runtime/handshake.cpp#L152-L208) I have stumbled across this JEP https://openjdk.java.net/jeps/312 - Thread-Local Handshakes) where I realized new cross-threads callback mechanism has been introduced starting in Java 10 to replace "stop-the-world" mechanism (as I understand). I have also found a switch to disable it using the -XX:-ThreadLocalHandshakes switch.
>
> So now when I run Jetty app on Java 12 with such command line and disbale TLH:
>
> /usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs -XX:-ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
> it works and does not hang any more:
>
> when this one hangs (TLH explicitly enabled):
>
> /usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs -XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
>
> When I explicitly enable thread local handshakes on Java 11 it works as well:
>
> /usr/lib/jvm/jdk-zulu11.33.15-ca-jdk11.0.4-linux_x64/bin/java -Xrs -XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar /jetty/start.jar
>
> Note I am limiting signals usage by JVM in all cases just to narrow down my research.
>
> One more datapoint - I am able to run a simple Hello World java app on Java 12 on OSv with thread-local handshakes on so it looks like this problem appears on more complicated apps.
>
> Lastly, the exact same app works fine on Java 12 on Linux.
>