Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8263579

ZGC: Concurrent mark hangs with debug loglevel

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • P4
    • 17
    • 16, 17
    • hotspot
    • gc
    • b15
    • generic
    • generic

    Description

      create this issue on hehalf Chao Wang (casparcwang@tencent.com).

      The jvm process will hang if using zgc with gc debug level log opened. The hang happened in our production environment, and can be reproduced by the following SPECjbb commands:
      java -XX:+UnlockExperimentalVMOptions -Xmx100g -XX:ZCollectionInterval=10 -XX:ParallelGCThreads=80 -XX:ConcGCThreads=70 -Xlog:gc*=debug:file=gc.log:time,level,tags:filesize=1g -XX:+UseZGC -jar ./specjbb2015.jar -m COMPOSITE

      Root cause of the hang is that the idle time is too short compared to the lock acquiring time of the debug logging, which make the gc threads never reach synchronized in zgc terminate stage 1. There are always some threads not in stage1 enter state and blocking on lock for logging.

      Following is the backtrace of gc workers when the hang happens:

      (gdb) bt
      #0 0x00007ffa2836ab3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
      #1 0x00007ffa2836abcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
      #2 0x00007ffa2836ac6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
      #3 0x00007ffa2751cd32 in PosixSemaphore::wait (this=this@entry=0x7ffa20001c60) at /data/openjdk/jdk_dev/src/hotspot/os/posix/semaphore_posix.cpp:64
      #4 0x00007ffa272ee083 in wait (this=0x7ffa20001c60) at /data/openjdk/jdk_dev/src/hotspot/share/runtime/semaphore.hpp:55
      #5 write (msg=0x7ffa04df07f0 "GC(0) Concurrent Mark Try Terminate (ZWorker#68) 1.471ms", decorations=..., this=0x7ffa20001b90) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:293
      #6 LogFileOutput::write (this=0x7ffa20001b90, decorations=..., msg=0x7ffa04df07f0 "GC(0) Concurrent Mark Try Terminate (ZWorker#68) 1.471ms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:287
      #7 0x00007ffa272f41f8 in log (msg=0x7ffa04df07f0 "GC(0) Concurrent Mark Try Terminate (ZWorker#68) 1.471ms", level=<optimized out>, this=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logOutputList.hpp:123
      #8 LogTagSet::vwrite (this=<optimized out>, level=<optimized out>, fmt=<optimized out>, args=0x7ffa04df0a48) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logTagSet.cpp:139
      #9 0x00007ffa26fcdfbb in LogTargetHandle::print (this=this@entry=0x7ffa04df0b40, fmt=fmt@entry=0x7ffa2786db05 "%s (%s) %.3fms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logHandle.hpp:95
      #10 0x00007ffa2778bc9c in ZStatPhase::log_end (this=this@entry=0x7ffa27d88860 <ZSubPhaseConcurrentMarkTryTerminate>, log=..., duration=..., thread=thread@entry=true) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:625
      #11 0x00007ffa2778c0a4 in ZStatSubPhase::register_end (this=0x7ffa27d88860 <ZSubPhaseConcurrentMarkTryTerminate>, start=..., end=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:738
      #12 0x00007ffa277600e4 in ~ZStatTimer (this=0x7ffa04df0c30, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:316
      #13 ZStatTimer::~ZStatTimer (this=0x7ffa04df0c30, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:313
      #14 0x00007ffa2776a036 in ZMark::try_terminate (this=this@entry=0x7ffa20036610) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMarkTerminate.inline.hpp:46
      #15 0x00007ffa2776a478 in ZMark::work_without_timeout (this=this@entry=0x7ffa20036610, cache=cache@entry=0x7ffa04df0dc0, stripe=stripe@entry=0x7ffa20036810, stacks=stacks@entry=0x7ffa200a2378) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:512
      #16 0x00007ffa2776b144 in ZMark::work (this=0x7ffa20036610, timeout_in_micros=0) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:583
      #17 0x00007ffa2779038c in ZTask::GangTask::work (this=0x7ff8bddfaaf8, worker_id=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zTask.cpp:34
      #18 0x00007ffa2774e75f in run_task (this=0x7ffa200a2350, data=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:280
      #19 GangWorker::loop (this=0x7ffa200a2350) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:290
      #20 0x00007ffa276a45e6 in Thread::call_run (this=0x7ffa200a2350) at /data/openjdk/jdk_dev/src/hotspot/share/runtime/thread.cpp:402
      #21 0x00007ffa2746c4d7 in thread_native_entry (thread=0x7ffa200a2350) at /data/openjdk/jdk_dev/src/hotspot/os/linux/os_linux.cpp:712
      #22 0x00007ffa28364ea5 in start_thread () from /lib64/libpthread.so.0
      #23 0x00007ffa27e898dd in clone () from /lib64/libc.so.6

      (gdb) bt
      #0 0x00007ffa27e7aa1d in write () from /lib64/libc.so.6
      #1 0x00007ffa27e05263 in _IO_new_file_write () from /lib64/libc.so.6
      #2 0x00007ffa27e06a7e in __GI__IO_do_write () from /lib64/libc.so.6
      #3 0x00007ffa27e048b8 in __GI__IO_file_sync () from /lib64/libc.so.6
      #4 0x00007ffa27df96b3 in fflush () from /lib64/libc.so.6
      #5 0x00007ffa272eed0e in flush (this=0x7ffa20001b90) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileStreamOutput.cpp:130
      #6 LogFileStreamOutput::write (this=this@entry=0x7ffa20001b90, decorations=..., msg=msg@entry=0x7ffa06f327f0 "GC(0) Concurrent Mark Idle (ZWorker#35) 1.066ms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileStreamOutput.cpp:130
      #7 0x00007ffa272ee091 in write (msg=0x7ffa06f327f0 "GC(0) Concurrent Mark Idle (ZWorker#35) 1.066ms", decorations=..., this=0x7ffa20001b90) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:294
      #8 LogFileOutput::write (this=0x7ffa20001b90, decorations=..., msg=0x7ffa06f327f0 "GC(0) Concurrent Mark Idle (ZWorker#35) 1.066ms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:287
      #9 0x00007ffa272f41f8 in log (msg=0x7ffa06f327f0 "GC(0) Concurrent Mark Idle (ZWorker#35) 1.066ms", level=<optimized out>, this=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logOutputList.hpp:123
      #10 LogTagSet::vwrite (this=<optimized out>, level=<optimized out>, fmt=<optimized out>, args=0x7ffa06f32a48) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logTagSet.cpp:139
      #11 0x00007ffa26fcdfbb in LogTargetHandle::print (this=this@entry=0x7ffa06f32b40, fmt=fmt@entry=0x7ffa2786db05 "%s (%s) %.3fms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logHandle.hpp:95
      #12 0x00007ffa2778bc9c in ZStatPhase::log_end (this=this@entry=0x7ffa27d888a0 <ZSubPhaseConcurrentMarkIdle>, log=..., duration=..., thread=thread@entry=true) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:625
      #13 0x00007ffa2778c0a4 in ZStatSubPhase::register_end (this=0x7ffa27d888a0 <ZSubPhaseConcurrentMarkIdle>, start=..., end=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:738
      #14 0x00007ffa277600e4 in ~ZStatTimer (this=0x7ffa06f32c70, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:316
      #15 ZStatTimer::~ZStatTimer (this=0x7ffa06f32c70, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:313
      #16 0x00007ffa2776a001 in idle (this=0x7ffa20036610) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:382
      #17 ZMark::try_terminate (this=this@entry=0x7ffa20036610) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:474
      #18 0x00007ffa2776a478 in ZMark::work_without_timeout (this=this@entry=0x7ffa20036610, cache=cache@entry=0x7ffa06f32dc0, stripe=stripe@entry=0x7ffa20036b10, stacks=stacks@entry=0x7ffa200802f8) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:512
      #19 0x00007ffa2776b144 in ZMark::work (this=0x7ffa20036610, timeout_in_micros=0) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:583
      #20 0x00007ffa2779038c in ZTask::GangTask::work (this=0x7ff8bddfaaf8, worker_id=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zTask.cpp:34
      #21 0x00007ffa2774e75f in run_task (this=0x7ffa200802d0, data=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:280
      #22 GangWorker::loop (this=0x7ffa200802d0) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:290
      #23 0x00007ffa276a45e6 in Thread::call_run (this=0x7ffa200802d0) at /data/openjdk/jdk_dev/src/hotspot/share/runtime/thread.cpp:402
      #24 0x00007ffa2746c4d7 in thread_native_entry (thread=0x7ffa200802d0) at /data/openjdk/jdk_dev/src/hotspot/os/linux/os_linux.cpp:712
      #25 0x00007ffa28364ea5 in start_thread () from /lib64/libpthread.so.0
      #26 0x00007ffa27e898dd in clone () from /lib64/libc.so.6

      (gdb) bt
      #0 0x00007ffa2836ab3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
      #1 0x00007ffa2836abcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
      #2 0x00007ffa2836ac6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
      #3 0x00007ffa2751cd32 in PosixSemaphore::wait (this=this@entry=0x7ffa20001c60) at /data/openjdk/jdk_dev/src/hotspot/os/posix/semaphore_posix.cpp:64
      #4 0x00007ffa272ee083 in wait (this=0x7ffa20001c60) at /data/openjdk/jdk_dev/src/hotspot/share/runtime/semaphore.hpp:55
      #5 write (msg=0x7ffa243ff7f0 "GC(0) Concurrent Mark Idle (ZWorker#20) 1.081ms", decorations=..., this=0x7ffa20001b90) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:293
      #6 LogFileOutput::write (this=0x7ffa20001b90, decorations=..., msg=0x7ffa243ff7f0 "GC(0) Concurrent Mark Idle (ZWorker#20) 1.081ms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logFileOutput.cpp:287
      #7 0x00007ffa272f41f8 in log (msg=0x7ffa243ff7f0 "GC(0) Concurrent Mark Idle (ZWorker#20) 1.081ms", level=<optimized out>, this=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logOutputList.hpp:123
      #8 LogTagSet::vwrite (this=<optimized out>, level=<optimized out>, fmt=<optimized out>, args=0x7ffa243ffa48) at /data/openjdk/jdk_dev/src/hotspot/share/logging/logTagSet.cpp:139
      #9 0x00007ffa26fcdfbb in LogTargetHandle::print (this=this@entry=0x7ffa243ffb40, fmt=fmt@entry=0x7ffa2786db05 "%s (%s) %.3fms") at /data/openjdk/jdk_dev/src/hotspot/share/logging/logHandle.hpp:95
      #10 0x00007ffa2778bc9c in ZStatPhase::log_end (this=this@entry=0x7ffa27d888a0 <ZSubPhaseConcurrentMarkIdle>, log=..., duration=..., thread=thread@entry=true) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:625
      #11 0x00007ffa2778c0a4 in ZStatSubPhase::register_end (this=0x7ffa27d888a0 <ZSubPhaseConcurrentMarkIdle>, start=..., end=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.cpp:738
      #12 0x00007ffa277600e4 in ~ZStatTimer (this=0x7ffa243ffc70, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:316
      #13 ZStatTimer::~ZStatTimer (this=0x7ffa243ffc70, __in_chrg=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zStat.hpp:313
      #14 0x00007ffa2776a001 in idle (this=0x7ffa20036610) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:382
      #15 ZMark::try_terminate (this=this@entry=0x7ffa20036610) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:474
      #16 0x00007ffa2776a478 in ZMark::work_without_timeout (this=this@entry=0x7ffa20036610, cache=cache@entry=0x7ffa243ffdc0, stripe=stripe@entry=0x7ffa20036e90, stacks=stacks@entry=0x7ffa20070b78) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:512
      #17 0x00007ffa2776b144 in ZMark::work (this=0x7ffa20036610, timeout_in_micros=0) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zMark.cpp:583
      #18 0x00007ffa2779038c in ZTask::GangTask::work (this=0x7ff8bddfaaf8, worker_id=<optimized out>) at /data/openjdk/jdk_dev/src/hotspot/share/gc/z/zTask.cpp:34
      #19 0x00007ffa2774e75f in run_task (this=0x7ffa20070b50, data=...) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:280
      #20 GangWorker::loop (this=0x7ffa20070b50) at /data/openjdk/jdk_dev/src/hotspot/share/gc/shared/workgroup.cpp:290
      #21 0x00007ffa276a45e6 in Thread::call_run (this=0x7ffa20070b50) at /data/openjdk/jdk_dev/src/hotspot/share/runtime/thread.cpp:402
      #22 0x00007ffa2746c4d7 in thread_native_entry (thread=0x7ffa20070b50) at /data/openjdk/jdk_dev/src/hotspot/os/linux/os_linux.cpp:712
      #23 0x00007ffa28364ea5 in start_thread () from /lib64/libpthread.so.0
      #24 0x00007ffa27e898dd in clone () from /lib64/libc.so.6

      Attachments

        Issue Links

          Activity

            People

              hshi Hui Shi
              hshi Hui Shi
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: