Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8196485

FromCardCache default card index can cause crashes

XMLWordPrintable

    • gc
    • b10
    • x86_64
    • generic

      FULL PRODUCT VERSION :
      JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
      Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 )

      FULL OS VERSION :
      Linux foo 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux


      A DESCRIPTION OF THE PROBLEM :
      I am creating a new bug report as I cannot update the old one (https://bugs.java.com/view_bug.do?bug_id=8172756) with new information.

      The current theory is that the crashes stem from card cache returning invalid results because of the overflow here: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/69aec2ca5d90/src/share/vm/gc_implementation/g1/heapRegionRemSet.cpp#l436

      This can at least give from_card of -1 which is also used to mean that cache is empty (bonus: t also seems heaps of size > 2^40 should overlap in that cache.)

      In the case of overflow to -1, we can then end up with a situation where some region end up being reclaimed even when there are live pointers to objects in it. When those pointers are then later on followed in a marking phase, the pointers points into a (middle of) some new object and calling oopDesc::size() on that "object" will follow some trash pointers and cause crash.


      THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

      THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      We've been unable to reproduce the crash with a simple program, but we're able to trigger the assert

          assert(contains_reference(from), err_msg("We just found " PTR_FORMAT " in the FromCardCache", from));

      with this small program, fastdebug JVM and specific command line parameters (listed in the code comments) https://gist.github.com/jmiettinen/62506fd75855edb346d88cc95697bf1e

      Currently, we're able to reproduce the actual failurearound once a week by running production software with a forced load pattern. We've run a patched OpenJDK JVM now without failures for a few week. The patch we tried was changing values in the card table from int to long to avoid overflow (https://gist.github.com/jmiettinen/1e1e8a0c71c2d1ae664335f1e875f227).

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV (0xb) at pc=0x00007f24c84669f4, pid=289114, tid=0x00007c1152831700
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 )
      # Problematic frame:
      # V [libjvm.so+0x4cb9f4] oopDesc::size()+0x44
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # If you would like to submit a bug report, please visit:
      # http://bugreport.java.com/bugreport/crash.jsp
      #

      --------------- T H R E A D ---------------

      Current thread (0x00007f24c0420800): ConcurrentGCThread [stack: 0x00007c1152731000,0x00007c1152832000] [id=289624]

      siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00000c9a00000dd4

      Registers:
      RAX=0x00007f24c8f55cf9, RBX=0x00007f24c0477dd0, RCX=0x0000000000000001, RDX=0x0000000000000000
      RSP=0x00007c1152830950, RBP=0x00007c1152830950, RSI=0x0000000000000000, RDI=0x00000c9a00000dcc
      R8 =0x00007cf618c10000, R9 =0x0000000000000001, R10=0x00007d3f80262f08, R11=0x00007f24c0474850
      R12=0x00007cf618c10000, R13=0x00007be38f10eea0, R14=0x00007f24c03ff070, R15=0x00007f24c0474850
      RIP=0x00007f24c84669f4, EFLAGS=0x0000000000010246, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
        TRAPNO=0x000000000000000e

      Top of Stack: (sp=0x00007c1152830950)
      0x00007c1152830950: 00007c11528309b0 00007f24c85f4360
      0x00007c1152830960: 00007c11528309b0 00007f23fc005010
      0x00007c1152830970: 00007d3f80262e98 000000000000000e
      0x00007c1152830980: 00007e31e25a9bf0 00007e31e25bc010
      0x00007c1152830990: 00007e31e27b3a18 00007c1152830aa0
      0x00007c11528309a0: 0000000000000001 00007f24c8f55cf9
      0x00007c11528309b0: 00007c11528309f0 00007f24c88a6ee8
      0x00007c11528309c0: 0000000300040001 00007e31e25b39f8
      0x00007c11528309d0: 00007f24c0477dd0 00007c1152830af0
      0x00007c11528309e0: 0000000101840000 0000004061000000
      0x00007c11528309f0: 00007c1152830a20 00007f24c8469d34
      0x00007c1152830a00: 000000000023acb3 0000004060cb673f
      0x00007c1152830a10: 00007f24c03ff0f8 00007f24c0477dd0
      0x00007c1152830a20: 00007c1152830b60 00007f24c846341e
      0x00007c1152830a30: 3fe0000000000000 0000000000000000
      0x00007c1152830a40: 00007f24c9777c53 00007be3b0287ea0
      0x00007c1152830a50: 00007f24c0477ee0 00007c1152830af0
      0x00007c1152830a60: 0100000000000000 4024000000000000
      0x00007c1152830a70: 00007f24c0420801 000000000023fd02
      0x00007c1152830a80: 3fefbb9dc59efb46 3fefbb9dc59efb46
      0x00007c1152830a90: 00007f24c03ff110 00007e31e4000000
      0x00007c1152830aa0: 00007f24c8ef6ad0 000000000100e400
      0x00007c1152830ab0: 00007be32f3b4dd0 00007f24c8f006a0
      0x00007c1152830ac0: 0000000000000b00 00007c1152830aa0
      0x00007c1152830ad0: 00007f24c03ff070 00007f24c002a990
      0x00007c1152830ae0: 00007f24c0477dd0 0000000000000000
      0x00007c1152830af0: 00007f24c8ef6a50 00007f24c03ff0f8
      0x00007c1152830b00: 00007f24c03ff070 00007f24c0477dd0
      0x00007c1152830b10: 00007f24c8ef6ab0 00007f24c0477dd0
      0x00007c1152830b20: 00007f24c002a990 0000000000000000
      0x00007c1152830b30: 0000000000000000 0000000000000000
      0x00007c1152830b40: 00007f24c0477dd0 00007f1db84d6a60

      Instructions: (pc=0x00007f24c84669f4)
      0x00007f24c84669d4: d1 48 d3 e7 48 03 38 8b 4f 08 83 f9 00 7e 19 f6
      0x00007f24c84669e4: c1 01 75 18 89 c8 c9 c1 f8 03 c3 90 48 8b 7f 08
      0x00007f24c84669f4: 8b 4f 08 83 f9 00 7f e7 75 1a 66 90 48 8b 07 4c
      0x00007f24c8466a04: 89 c6 4c 8b 98 e8 00 00 00 c9 41 ff e3 0f 1f 80

      Register to memory mapping:

      RAX=0x00007f24c8f55cf9: <offset 0xfbacf9> in /usr/lib/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so at 0x00007f24c7f9b000
      RBX=0x00007f24c0477dd0 is an unknown value
      RCX=0x0000000000000001 is an unknown value
      RDX=0x0000000000000000 is an unknown value
      RSP=0x00007c1152830950 is an unknown value
      RBP=0x00007c1152830950 is an unknown value
      RSI=0x0000000000000000 is an unknown value
      RDI=0x00000c9a00000dcc is an unknown value
      R8 =0x00007cf618c10000 is pointing into object: 0x00007cf618800018
      [J
       - klass: {type array long}
       - length: 1048576
      R9 =0x0000000000000001 is an unknown value
      R10=0x00007d3f80262f08 is an oop
      [J
       - klass: {type array long}
       - length: 12
      R11=0x00007f24c0474850 is an unknown value
      R12=0x00007cf618c10000 is pointing into object: 0x00007cf618800018
      [J
       - klass: {type array long}
       - length: 1048576
      R13=0x00007be38f10eea0 is an unknown value
      R14=0x00007f24c03ff070 is an unknown value
      R15=0x00007f24c0474850 is an unknown value


      Stack: [0x00007c1152731000,0x00007c1152832000], sp=0x00007c1152830950, free space=1022k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      V [libjvm.so+0x4cb9f4] oopDesc::size()+0x44
      V [libjvm.so+0x659360] CMTask::deal_with_reference(oopDesc*)+0x180
      V [libjvm.so+0x90bee8] ObjArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*)+0x148
      V [libjvm.so+0x4ced34] CMBitMapClosure::do_bit(unsigned long)+0x74
      V [libjvm.so+0x4c841e] CMTask::do_marking_step(double, bool, bool)+0xa9e
      V [libjvm.so+0x4cf673] CMConcurrentMarkingTask::work(unsigned int)+0x293
      V [libjvm.so+0xaeccff] GangWorker::loop()+0xcf
      V [libjvm.so+0x92a338] java_start(Thread*)+0x108


      <SAME AS IN THE LINKED TICKET>

      REPRODUCIBILITY :
      This bug can be reproduced occasionally.

      ---------- BEGIN SOURCE ----------
      // Running the test:
      // javac FromCardCacheTest.java
      // /opt/jdk1.8.0_144_fastdebug/bin/java -classpath . -Xms20M -Xmx20M -XX:HeapBaseMinAddress=2199011721216 -XX:+UseG1GC -verbose:gc FromCardCacheTest

      import java.lang.management.GarbageCollectorMXBean;
      import java.lang.management.ManagementFactory;
      import java.lang.reflect.Field;

      import sun.misc.Unsafe;

      public class FromCardCacheTest {

          private static int nArrays = 24;
          private static int arraySize = 131068; // object array, a bit less than half of a 1M region
          private static int byteArraySize = 524270; // byte array, a bit less than half of a 1M region
          private static final long heapBase = 2199011721216L; // less than 2TB

          public static void main(String[] args) {
              while (runTest()) {
                  System.gc();
                  try {
                      Thread.sleep(2000);
                  } catch(Exception e) {}
              }
          }

          public static boolean runTest() {
              GarbageCollectorMXBean oldGc = null;
              GarbageCollectorMXBean youngGc = null;
              for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                  if (gc.getName().equals("G1 Old Generation")) oldGc = gc;
                  if (gc.getName().equals("G1 Young Generation")) youngGc = gc;
              }
              if (oldGc == null) {
                  System.out.println("G1 oldGc = null");
                  return false;
              }
              if (youngGc == null) {
                  System.out.println("G1 youngGc = null");
                  return false;
              }
              long oldGcCount = oldGc.getCollectionCount();

              System.out.println("Running test");
              Object[][] arrays = new Object[nArrays][];
              for (int i = 0; i < nArrays; i++) {
                  arrays[i] = new Object[arraySize];
              }

              Object[][] arrayWithCardMinus1 = new Object[1][];
              arrayWithCardMinus1[0] = findArray(arrays, true, true);
              Object[] otherArray = findArray(arrays, false, false);

              if (arrayWithCardMinus1[0] == null) {
                  System.out.println("Array with card -1 not found. Trying again.");
                  return true;
              } else {
                  System.out.println("Found an array with card -1.");
              }

              if (otherArray == null) {
                  System.out.println("otherArray = null.");
                  return true;
              }

              System.out.println("Modifying the last card in the array...");
              final int modifyEntries = 10;

              final long youngGcCount = youngGc.getCollectionCount();
              for (int i = 0; i < modifyEntries; i++) {
                 if (oldGc.getCollectionCount() > oldGcCount) {
                     System.out.println("Old gc done. Trying again.");
                     return true;
                 }

                 final int arrayIndex = arraySize - i - 1;
                 arrayWithCardMinus1[0][arrayIndex] = new Object[]{new byte[byteArraySize]};

                 final int card = getCard32Bit(getAddress(arrayWithCardMinus1[0]) + 4 * arrayIndex);
                 System.out.println(arrayIndex + " " + String.format("0x%016x", getAddress(arrayWithCardMinus1[0]) + 16 + 4 * arrayIndex) + " set to " +
                   String.format("0x%016x", getAddress(arrayWithCardMinus1[0][arrayIndex])) + ", card: " + card);

                 if (card != -1) {
                     arrayWithCardMinus1[0] = findArray(arrays, true, true);
                     if (arrayWithCardMinus1[0] == null) {
                         System.out.println("Array with card -1 not found. Trying again.");
                         return true;
                     }
                 }

                 try {
                      Thread.sleep(100);
                 } catch(Exception e) {}
              }


              System.out.println("Producting garbage to run gc.");

              final long t1 = System.currentTimeMillis();
              oldGcCount = oldGc.getCollectionCount();
              while (System.currentTimeMillis() - t1 < 60000) {

                 if (oldGc.getCollectionCount() > oldGcCount) {
                     System.out.println("Old gc done. Trying again.");
                     return true;
                 }

                 otherArray[0] = new byte[30000];
                 otherArray[0] = null;

                 try {
                      Thread.sleep(100);
                 } catch(Exception e) {}
              }

              System.out.println("The crash didn't reproduce. Trying again.");
              return true;
          }

          private static Object[] findArray(Object[][] arrays, boolean withCardMinus1, boolean printAddresses) {
              for (int i = 0; i < arrays.length; i++) {
                  if (arrays[i] == null) continue;
                  final long startAddress = getAddress(arrays[i]);
                  final long lastCardAddress = startAddress + 4 * arrays[i].length;
                  final int card = getCard32Bit(lastCardAddress);
                  if (printAddresses) {
                      System.out.println("array " + i + ", startAddress: " + String.format("0x%016x", startAddress) +
                          ", lastCardAddress: " + String.format("0x%016x", lastCardAddress) + ", card: " + card);
                  }

                  if ((withCardMinus1 && card == -1) || (!withCardMinus1 && card != -1)) {
                      Object[] foundArray = arrays[i];
                      arrays[i] = null;
                      return foundArray;
                  }
              }
              return null;
          }


          // Unsafe is only used for reading object addresses.
          private static final Unsafe UNSAFE;
          static {
              try {
                  Field e = Unsafe.class.getDeclaredField("theUnsafe");
                  e.setAccessible(true);
                  UNSAFE = (Unsafe)e.get((Object)null);
              } catch (IllegalAccessException | NoSuchFieldException var1) {
                  throw new Error(var1);
              }
          }
          static Object[] addrArray = new Object[]{null};
          static final long baseOffset = UNSAFE.arrayBaseOffset(Object[].class);
          static final int pageSize = UNSAFE.pageSize();
          public static long getAddress(Object o) {
              addrArray[0] = o;
              final long address = (UNSAFE.getInt(addrArray, baseOffset) << 3) + heapBase - pageSize + 1024*1024L;
              addrArray[0] = null;
              return address;
          }

          public static int getCard32Bit(final long address) {
              final int card = (int)(address >> 9);
              return card;
          }
      }
      ---------- END SOURCE ----------

            tschatzl Thomas Schatzl
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: