JDK-8330017: ForkJoinPool stops executing tasks due to ctl field Release Count (RC) overflow


      ADDITIONAL SYSTEM INFORMATION :
      Running on Linux x64, JDK 17.0.10

      A DESCRIPTION OF THE PROBLEM :
      The RC part of the ctl field keeps decreasing until it reaches the minimum 16-bit signed value (-32768). On the next decrement it wraps around to +32767 (equal to ForkJoinPool.MAX_CAP), and the pool then stops executing tasks.
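
      The wraparound described above can be sketched in isolation. This is a minimal illustration, not the pool's own code: it assumes RC is the signed top 16 bits of the 64-bit ctl value (as the output in this report suggests), and the class RcWrapDemo and its helper rc are hypothetical names introduced only for this example.

```java
public class RcWrapDemo {
    // Assumption: RC occupies the top 16 bits of ctl, so one RC step is 1L << 48.
    static final int RC_SHIFT = 48;
    static final long RC_UNIT = 1L << RC_SHIFT;

    // Extracts RC as a signed 16-bit value (hypothetical helper).
    static short rc(long ctl) {
        return (short) (ctl >> RC_SHIFT);
    }

    public static void main(String[] args) {
        // Start with RC at the minimum 16-bit value; all other fields are zero.
        long ctl = ((long) Short.MIN_VALUE) << RC_SHIFT;
        System.out.println("before decrement: RC=" + rc(ctl));

        // One more decrement overflows the long and flips RC to the maximum value.
        ctl -= RC_UNIT;
        System.out.println("after decrement:  RC=" + rc(ctl));
    }
}
```

      Because RC sits in the highest bits, the 16-bit wrap coincides with a signed overflow of the whole long, which is why the printed CTL value changes sign at the same moment.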

      I saw this issue in an application that had been running for two to three months.
      When it happens, threads waiting for the result of CompletableFuture.join() block forever, because the future never completes.

      I cannot reproduce the issue with this test on Java >= 19.0.2.
      I think the issue was indirectly fixed by https://bugs.openjdk.org/browse/JDK-8277090
      because the ctl RC field definition changed from:
      RC: Number of released (unqueued) workers minus target parallelism
      to
      RC: Number of released (unqueued) workers

      Since RC is no longer the result of a subtraction, it should not become negative.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile the provided source (javac FJPOverflow.java), then run it with the command:
      java --add-opens java.base/java.util.concurrent=ALL-UNNAMED FJPOverflow

      The key to this test is to force as many pool resizes as possible, so I set a low keep-alive time for the threads.
      Without this trick, it takes a long time to reproduce the issue.

      As long as the program keeps printing the string "If you see this, FJP is executing tasks", the ForkJoinPool is executing submitted tasks correctly. Once RC overflows to 32767, the string is never printed again.
      It takes about 1 hour to reach this condition naturally.

      Example output just before and after the pool stops executing tasks:
      CTL=(-9222527629104513024), RC=(10000000 00000010 , -32766), TC=(11111111 11111100 , -4), SS=(00000000 00000000 , 0), ID=(00000000 00000000 , 0)
      If you see this, FJP is executing tasks
      CTL=(-9222527624809545728), RC=(10000000 00000010 , -32766), TC=(11111111 11111101 , -3), SS=(00000000 00000000 , 0), ID=(00000000 00000000 , 0)
      If you see this, FJP is executing tasks
      CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)
      CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)
      CTL=(9223372019674972185), RC=(01111111 11111111 , 32767), TC=(11111111 11111100 , -4), SS=(00000000 00000001 , 1), ID=(00000000 00011001 , 25)
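
      The fields in these lines can also be read directly with shifts instead of string slicing. The following is a small sketch under the same assumption the reproducer makes, namely that ctl splits into four signed 16-bit fields (RC, TC, SS, ID) from the highest bits down; CtlDecodeDemo and decode are hypothetical names used only here.

```java
public class CtlDecodeDemo {
    // Returns {RC, TC, SS, ID}, each taken as a signed 16-bit slice of ctl,
    // highest bits first. The four-way split follows the report's output format.
    static short[] decode(long ctl) {
        return new short[] {
            (short) (ctl >>> 48),
            (short) (ctl >>> 32),
            (short) (ctl >>> 16),
            (short) ctl
        };
    }

    public static void main(String[] args) {
        long stuck = 9223372019674972185L; // CTL value from the blocked state above
        short[] f = decode(stuck);
        System.out.println("RC=" + f[0] + " TC=" + f[1] + " SS=" + f[2] + " ID=" + f[3]);
    }
}
```

      For the blocked-state value this yields RC=32767, TC=-4, SS=1, ID=25, matching the last three output lines.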

      Not recommended:
      With the program argument -c, the ctl value is set to -9222809108376190976L (RC=-32767) using reflection to speed up reproduction. I added this option to perform some tests. Run the program without the -c argument to reproduce the issue naturally.
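
      For reference, the forced value can be derived from its fields rather than taken as a magic number. This sketch assumes the same four 16-bit fields as the reproducer; the TC value of -5 is read from the bit pattern in the source comment for RC_NEG_32767, and CtlPackDemo and pack are hypothetical names for illustration only.

```java
public class CtlPackDemo {
    // Packs four signed 16-bit fields into one long, RC in the highest bits.
    // Each field is masked to 16 bits so negative values do not spill into
    // neighbouring fields.
    static long pack(int rc, int tc, int ss, int id) {
        return ((long) (rc & 0xFFFF) << 48)
             | ((long) (tc & 0xFFFF) << 32)
             | ((long) (ss & 0xFFFF) << 16)
             | (long) (id & 0xFFFF);
    }

    public static void main(String[] args) {
        // RC=-32767, TC=-5, SS=0, ID=0, as in the comment above RC_NEG_32767.
        System.out.println(pack(-32767, -5, 0, 0));
    }
}
```

      The same helper reproduces the other constants in the source, for example RC=-1 with TC=-5 gives -21474836480L.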

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The RC value should not keep decreasing and should never overflow.
      ACTUAL -
      The RC value keeps decreasing until it reaches -32768, and then overflows to +32767.

      ---------- BEGIN SOURCE ----------
      import java.lang.reflect.Field;
      import java.util.concurrent.ForkJoinPool;
      import java.util.concurrent.TimeUnit;

      public class FJPOverflow {
          public static final int TASKS_PER_ITERATION = 100;
          public static final int TASK_DURATION_MS = 5;
          public static final int ITERATION_DELAY_MS = 100; // Try increasing this value if RC is not decreasing
          public static final int THREAD_KEEP_ALIVE_MS = 10; // Low keep-alive time to trigger frequent pool resizes
          public static final int MAX_CAP = 0x7fff; //(32767) Same as ForkJoinPool.MAX_CAP

          // RC=-32767 | TC | SS | ID
          //10000000 00000001 11111111 11111011 00000000 00000000 00000000 00000000
          public static final long RC_NEG_32767 = -9222809108376190976L;

          // RC=-32000 | TC | SS | ID
          //10000011 00000000 11111111 11111011 00000000 00000000 00000000 00000000
          public static final long RC_NEG_32000 = -9006917801239117824L;

          // RC=-1 | TC | SS | ID
          //11111111 11111111 11111111 11111011 00000000 00000000 00000000 00000000
          public static final long RC_NEG_1 = -21474836480L;

          private static final ForkJoinPool fjp = new ForkJoinPool(
                  Runtime.getRuntime().availableProcessors(),
                  ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                  null,
                  false,
                  0,
                  MAX_CAP,
                  1,
                  null,
                  THREAD_KEEP_ALIVE_MS,
                  TimeUnit.MILLISECONDS
          );

          public static void main(String[] args) throws InterruptedException {
              var options = new Options(args);

              var iterationDelay = ITERATION_DELAY_MS;
              if(options.forceCtl) {
                  iterationDelay = 1000;
                  setCtl(RC_NEG_32767);
              }

              while(true) {
                  runTasks();
                  printCtl();
                  runAsync(() -> System.out.println("If you see this, FJP is executing tasks"));
                  Thread.sleep(iterationDelay);
              }
          }

          private static void setCtl(long value) {
              try {
                  Field field = fjp.getClass().getDeclaredField("ctl");
                  field.setAccessible(true);
                  field.setLong(fjp, value);
              } catch (IllegalAccessException | NoSuchFieldException e) {
                  throw new RuntimeException(e);
              }
          }

          private static void runTasks() {
              for (int i = 0; i < TASKS_PER_ITERATION; i++) {
                  runAsync(FJPOverflow::sleepCallback);
              }
          }

          private static void runAsync(Runnable block) {
              fjp.execute(block);
          }

          private static void sleepCallback() {
              try {
                  Thread.sleep(TASK_DURATION_MS);
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }
          }

          private static void printCtl() {
              try {
                  Field field = fjp.getClass().getDeclaredField("ctl");
                  field.setAccessible(true);
                  long value = (long) field.get(fjp);
                  System.out.println(ctlAsBinary(value));
              } catch (NoSuchFieldException | IllegalAccessException e) {
                  e.printStackTrace();
              }
          }

          private static String ctlAsBinary(long value) {
              String binaryCtl = longToBinary(value);
              String binaryRc = binaryCtl.substring(0, 16);
              String binaryTc = binaryCtl.substring(16, 32);
              String binarySs = binaryCtl.substring(32, 48);
              String binaryId = binaryCtl.substring(48, 64);

              return "CTL=(" + value + "), " +
                      "RC=(" + prettifyBinary(binaryRc) + ", " + binaryToInt(binaryRc) + "), " +
                      "TC=(" + prettifyBinary(binaryTc) + ", " + binaryToInt(binaryTc) + "), " +
                      "SS=(" + prettifyBinary(binarySs) + ", " + binaryToInt(binarySs) + "), " +
                      "ID=(" + prettifyBinary(binaryId) + ", " + binaryToInt(binaryId) + ")";
          }

          private static String longToBinary(long value) {
              // Long.toBinaryString already renders negative values as their
              // 64-bit two's-complement pattern, so only left-padding is needed
              // (it has an effect for non-negative values only).
              return padLeftZeros(Long.toBinaryString(value), 64);
          }

          private static String padLeftZeros(String inputString, int length) {
              if (inputString.length() >= length) {
                  return inputString;
              }
              StringBuilder sb = new StringBuilder();
              while (sb.length() < length - inputString.length()) {
                  sb.append('0');
              }
              sb.append(inputString);

              return sb.toString();
          }

          private static int binaryToInt(String binary) {
              var isNegative = binary.charAt(0) == '1';
              if (!isNegative) {
                  return Integer.parseInt(binary, 2);
              }

              StringBuilder bitsInvertedBinary = new StringBuilder();
              for(int i=0; i < binary.length(); i++) {
                  bitsInvertedBinary.append(binary.charAt(i) == '0' ? '1' : '0');
              }

              return -(Integer.parseInt(bitsInvertedBinary.toString(), 2) + 1);
          }

          private static String prettifyBinary(String binary) {
              StringBuilder result = new StringBuilder();
              for (int i = 0; i < binary.length(); i += 8) {
                  result.append(binary, i, Math.min(i + 8, binary.length())).append(" ");
              }
              return result.toString();
          }

          protected static class Options {
              private boolean forceCtl = false;

              Options(String[] args) {
                  for (String arg : args) {
                      if (arg.equals("-c")) {
                          forceCtl = true;
                      }
                  }
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Restart the application

            jjose Johny Jose
            webbuggrp Webbug Group