JDK-8325202: gc/g1/TestMarkStackOverflow.java intermittently crash: G1CMMarkStack::ChunkAllocator::allocate_new_chunk


Details

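    • Type: Bug
    • Resolution: Fixed
    • Priority: P2
    • Affects Version/s: 23
    • Fix Version/s: 23
    • Component/s: hotspot
    • Subcomponent: gc
    • Resolved In Build: b12
    • CPU: x86_64
    • OS: linux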

    Description

      Test command:

      export test=test/hotspot/jtreg/gc/g1/TestMarkStackOverflow.java
      # Run one jtreg iteration; keep its work dir and log only if it fails.
      function runJtreg() {
        jtreg -ea -esa -timeoutFactor:4 -v:fail,error,time,nopass -nr -w $dir/index-$1 $test &> $dir/$1.log
        if [[ 0 -ne $? ]] ; then echo -n "$1 " ; else rm -rf $dir/index-$1 $dir/$1.log ; fi
      }
      export -f runJtreg
      export dir="tmp-jtreg-"`basename ${test##* } .java | sed "s|#|_|"`
      rm -rf $dir ; mkdir -p $dir
      # Run 100000 iterations, one job per CPU, then report how many failed.
      time seq 100000 | xargs -i -n 1 -P `nproc` bash -c "runJtreg {}"
      echo total fail number: `ls $dir/*.log 2> /dev/null | wc | awk '{print $1}'`


      Result:
      command: main -XX:ActiveProcessorCount=2 -XX:MarkStackSize=1 -Xmx250m gc.g1.TestMarkStackOverflow
      reason: User specified action: run main/othervm -XX:ActiveProcessorCount=2 -XX:MarkStackSize=1 -Xmx250m gc.g1.TestMarkStackOverflow
      started: Sun Feb 04 10:25:52 CST 2024
      Mode: othervm [/othervm specified]
      finished: Sun Feb 04 10:25:56 CST 2024
      elapsed time (seconds): 3.987
      configuration:
      STDOUT:
      Used mem 18.47 MB
      Used mem 36.23 MB
      Used mem 53.43 MB
      Used mem 70.63 MB
      Used mem 87.83 MB
      Used mem 105.03 MB
      Used mem 122.87 MB
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV (0xb) at pc=0x00007f161b429e51, pid=340136, tid=340141
      #
      # JRE version: OpenJDK Runtime Environment (23.0) (build 23)
      # Java VM: OpenJDK 64-Bit Server VM (23, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
      # Problematic frame:
      # V [libjvm.so+0x7e9e51] G1CMMarkStack::ChunkAllocator::allocate_new_chunk()+0xa1
      #
      # Core dump will be written. Default location: /var/tmp/tone/run/jtreg/jt-work/hotspot_jtreg/gc/g1/TestMarkStackOverflow/core.340136
      #
      # An error report file with more information is saved as:
      # /var/tmp/tone/run/jtreg/jt-work/hotspot_jtreg/gc/g1/TestMarkStackOverflow/hs_err_pid340136.log
      #
      # If you would like to submit a bug report, please visit:
      # mailto:yansendao.ysd@alibaba-inc.com
      #


      Recurrence probability: roughly 1 in 100,000 runs of the command above

      Failure Mode:
      Assume the current stack capacity is 1 (only index 0 is backed by an allocated bucket):

      1: Thread 1 obtains cur_idx 1 and finds that the associated bucket is not allocated, meaning the capacity is insufficient. It then attempts to double the capacity to 2.
      2: Thread 2 obtains cur_idx 2 and finds that the bucket for index 2 is not allocated either, so it also tries to expand the stack.
      3: Thread 1 is delayed, so Thread 2 acquires the lock first and calls expand(), which doubles the capacity from 1 to 2.
      4: After expand() returns, the bucket for cur_idx 2 is still unallocated, because the new capacity of 2 only covers indices 0 and 1. When Thread 2 then accesses this bucket, the VM crashes with the SIGSEGV shown above.
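The four steps can be replayed deterministically on a single thread. The sketch below is a hypothetical toy model, not the HotSpot source: `ToyChunkAllocator`, `claim_index()`, `is_backed()` and `replay_reported_interleaving()` are illustrative names, and each chunk slot gets its own "allocated" flag instead of a real bucket. Its `expand()` doubles the capacity relative to the current capacity, as the report describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of the chunk allocator described in this report; it is
// NOT the HotSpot code. Capacity C means chunk slots [0, C) are backed by
// allocated memory (buckets are collapsed into one flag per slot).
struct ToyChunkAllocator {
  std::size_t size = 0;          // next index to hand out (cur_idx)
  std::size_t capacity;          // number of backed slots
  std::size_t max_capacity;
  std::vector<bool> backed;      // backed[i] is true once slot i is allocated

  ToyChunkAllocator(std::size_t initial, std::size_t max)
      : capacity(initial), max_capacity(max), backed(max, false) {
    std::fill(backed.begin(), backed.begin() + initial, true);
  }

  std::size_t claim_index() { return size++; }
  bool is_backed(std::size_t idx) const { return idx < max_capacity && backed[idx]; }

  // Grows relative to the current capacity only, as the report describes for
  // expand(): it does not know which index the caller needs.
  bool expand() {
    if (capacity >= max_capacity) return false;
    std::size_t new_capacity = std::min(capacity * 2, max_capacity);
    std::fill(backed.begin() + capacity, backed.begin() + new_capacity, true);
    capacity = new_capacity;
    return true;
  }
};

struct ReplayResult {
  std::size_t capacity_after;
  bool thread1_slot_backed;
  bool thread2_slot_backed;
};

// Replays steps 1-4 in the reported order, without real threads.
ReplayResult replay_reported_interleaving() {
  ToyChunkAllocator alloc(/*initial=*/1, /*max=*/8);
  alloc.claim_index();                     // index 0 is already in use
  std::size_t t1 = alloc.claim_index();    // step 1: Thread 1 gets cur_idx 1
  std::size_t t2 = alloc.claim_index();    // step 2: Thread 2 gets cur_idx 2
  if (!alloc.is_backed(t2)) {
    alloc.expand();                        // step 3: Thread 2 expands once, 1 -> 2
  }
  // Step 4: slot 2 is still unbacked; the real code would touch a null bucket.
  return {alloc.capacity, alloc.is_backed(t1), alloc.is_backed(t2)};
}
```

Calling `replay_reported_interleaving()` yields a capacity of 2 with slot 1 backed and slot 2 still unbacked, which is the state Thread 2 observes in step 4.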

      The root cause is that expand() does not take into account the index that made the calling thread request the expansion. It grows the capacity relative to the current capacity only, so when several threads race to expand the stack, one expansion is not guaranteed to cover the index the expanding thread is about to use.
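One way to make expansion aware of the caller's index is to re-check that thread's own slot under the lock and keep growing until it is covered. The sketch below uses the same toy assumptions as a model (one flag per slot, doubling growth); `IndexAwareChunkAllocator` and `ensure_backed()` are hypothetical names, and this is an illustration of the idea, not necessarily the change that resolved this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical toy allocator (not HotSpot code) whose growth is driven by the
// index the calling thread actually obtained.
class IndexAwareChunkAllocator {
 public:
  IndexAwareChunkAllocator(std::size_t initial, std::size_t max)
      : capacity_(initial), max_capacity_(max), backed_(max, false) {
    assert(initial >= 1 && initial <= max);  // doubling needs a nonzero start
    std::fill(backed_.begin(), backed_.begin() + initial, true);
  }

  // Returns true once slot idx is backed; returns false if idx lies beyond the
  // maximum capacity and therefore can never be backed.
  bool ensure_backed(std::size_t idx) {
    if (idx >= max_capacity_) return false;
    std::lock_guard<std::mutex> guard(lock_);
    // Re-check under the lock and keep doubling until *this* index is covered,
    // instead of expanding exactly once relative to the current capacity.
    while (!backed_[idx]) {
      std::size_t new_capacity = std::min(capacity_ * 2, max_capacity_);
      std::fill(backed_.begin() + capacity_, backed_.begin() + new_capacity, true);
      capacity_ = new_capacity;
    }
    return true;
  }

  std::size_t capacity() const {
    std::lock_guard<std::mutex> guard(lock_);
    return capacity_;
  }

 private:
  mutable std::mutex lock_;
  std::size_t capacity_;
  std::size_t max_capacity_;
  std::vector<bool> backed_;
};
```

With an initial capacity of 1, `ensure_backed(2)` doubles twice (1 to 2, then 2 to 4) before returning, so Thread 2 from the failure scenario would find its slot allocated. A thread that acquires the lock later with an already covered index returns without growing the stack further.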


    Attachments

      1. 33065.log (7 kB)
      2. hs_err_pid110104.log (98 kB)
      3. hs_err_pid340136.log (99 kB)

    People

      Assignee: Ivan Walulya (iwalulya)
      Reporter: Sendao Yan (syan)