JDK-8334482

Shenandoah: Deadlock when safepoint is pending during nmethods iteration


    • gc
    • b09

        In one of our applications running Shenandoah on Corretto 17.0.11+10, we see safepoint timeouts showing that the Sweeper thread has not reached a safepoint after 1000ms.

        ```
        # SafepointSynchronize::begin: Timeout detected:
        [364.175s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
        [364.175s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
        [364.175s][warning][safepoint] # "Sweeper thread" #19 daemon prio=9 os_prio=0 cpu=1516.68ms elapsed=363.34s tid=0x00007ff04c1bbe50 nid=0x7eda runnable [0x0000000000000000]
        [364.175s][warning][safepoint] java.lang.Thread.State: RUNNABLE
        [364.175s][warning][safepoint]
        [364.175s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
        ```

        ```
        Threads waiting in SuspendibleThreadSet::join for ShenandoahConcurrentWeakRootsEvacUpdateTask:

        #0 0x00007fa753404377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007fa7526f2a1b in os::PlatformMonitor::wait(long) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #2 0x00007fa7526a0489 in Monitor::wait_without_safepoint_check(long) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #3 0x00007fa7528c81fa in SuspendibleThreadSet::join() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #4 0x00007fa7527dc27d in ShenandoahConcurrentWeakRootsEvacUpdateTask::work(unsigned int) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #5 0x00007fa7529e42bf in GangWorker::loop() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #6 0x00007fa7529e431f in GangWorker::run() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #7 0x00007fa752930118 in Thread::call_run() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #8 0x00007fa7526e7131 in thread_native_entry(Thread*) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #9 0x00007fa7533fe44b in start_thread () from /lib64/libpthread.so.0
        #10 0x00007fa752f3552f in clone () from /lib64/libc.so.6

        And our blocked Sweeper thread, waiting in ShenandoahNMethodTable::flush_nmethod for the evac threads to notify it:

        #0 0x00007fa753404377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007fa7526f2a1b in os::PlatformMonitor::wait(long) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #2 0x00007fa7526a0489 in Monitor::wait_without_safepoint_check(long) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #3 0x00007fa75282169b in ShenandoahNMethodTable::flush_nmethod(nmethod*) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #4 0x00007fa7526ad05a in nmethod::flush() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #5 0x00007fa7528c8fe2 in NMethodSweeper::process_compiled_method(CompiledMethod*) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #6 0x00007fa7528c95a3 in NMethodSweeper::sweep_code_cache() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #7 0x00007fa7528c9eec in NMethodSweeper::sweep() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #8 0x00007fa7528ca126 in NMethodSweeper::sweeper_loop() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #9 0x00007fa75292c58b in JavaThread::thread_main_inner() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #10 0x00007fa752930118 in Thread::call_run() () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #11 0x00007fa7526e7131 in thread_native_entry(Thread*) () from /local/apollo/package/local_1/AL2_x86_64/JDK17/JDK17-3703.0-0/jdk-17/lib/server/libjvm.so
        #12 0x00007fa7533fe44b in start_thread () from /lib64/libpthread.so.0
        #13 0x00007fa752f3552f in clone () from /lib64/libc.so.6
        ```

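        This appears to be a deadlock that occurs when another VM operation requests a safepoint at a bad time: the GC worker threads block in `SuspendibleThreadSet::join` while the safepoint is pending, and the Sweeper thread waits for them in `ShenandoahNMethodTable::flush_nmethod` without a safepoint check, so it never reaches the safepoint. See the attached reproducer; run it with `javac SweeperStuck.java && java -Xcomp -XX:+UseShenandoahGC -Xlog:safepoint=info -XX:+UnlockDiagnosticVMOptions -XX:+AbortVMOnSafepointTimeout -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=1000 SweeperStuck`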
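To make the suspected cycle explicit, here is a minimal sketch that models the three parties as a wait-for graph and walks it until a node repeats. It is not the attached reproducer and does not touch HotSpot. The class name `WaitForCycle`, the helper `suspectedWaits`, and the node labels are hypothetical, and the edges are only an interpretation of the log and stack traces above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WaitForCycle {
    // Follows "X waits for Y" edges from start. Returns the nodes that form
    // a cycle if one is reached, or an empty list if the chain simply ends.
    static List<String> findCycle(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return new ArrayList<>(); // chain ended without revisiting a node
        }
        // The cycle starts at the first occurrence of the revisited node.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    // Assumed edges, read off the traces in this report (an interpretation).
    static Map<String, String> suspectedWaits() {
        Map<String, String> waitsFor = new LinkedHashMap<>();
        // flush_nmethod waits, without a safepoint check, for the evac workers.
        waitsFor.put("Sweeper thread", "GC worker threads");
        // SuspendibleThreadSet::join blocks while a safepoint is pending.
        waitsFor.put("GC worker threads", "VM thread (pending safepoint)");
        // SafepointSynchronize::begin waits for the Sweeper to reach a safepoint.
        waitsFor.put("VM thread (pending safepoint)", "Sweeper thread");
        return waitsFor;
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle(suspectedWaits(), "Sweeper thread");
        System.out.println(cycle.isEmpty()
                ? "no cycle"
                : "cycle: " + String.join(" -> ", cycle));
    }
}
```

If the edges are right, every party waits on the next one and none can make progress. That would match the safepoint timeout naming only the Sweeper thread.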

              shade Aleksey Shipilev
              ogillespie Oli Gillespie