JDK-8325754

Dead AbstractQueuedSynchronizer$ConditionNodes survive minor garbage collections


Details

    Description

      A DESCRIPTION OF THE PROBLEM :
      When many asynchronous tasks are executed in a fixed thread pool, survivor spaces grow unexpectedly and minor collection times increase, even though the application does not generate much garbage.

      When this happens, many java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode
      instances can be found on the heap. After a while, ConditionNode instances fill up the heap and rank first in the jmap output.

      G1 seems to be able to collect “dead” ConditionNode instances during minor collections only if no previously live ConditionNode instances were promoted to the old generation and died there. Such promotions often cannot be avoided: on application startup, for example, many objects are promoted to the old generation after a few collections.

      Proposed solution:
      Similar to https://bugs.openjdk.org/browse/JDK-6805775, 'ConditionNode' instances should be unlinked (i.e. their ‘nextWaiter’ reference should be cleared) before they are transferred to the sync queue.

      Example code change for OpenJDK 17 (https://github.com/openjdk/jdk/blob/jdk-17%2B35/src/java.base/share/classes/java/util/concurrent/locks/AbstractQueuedSynchronizer.java):

      @@ -1431,40 +1431,41 @@ public abstract class AbstractQueuedSynchronizer
           public class ConditionObject implements Condition, java.io.Serializable {
               // ...
               private void doSignal(ConditionNode first, boolean all) {
                   while (first != null) {
                       ConditionNode next = first.nextWaiter;
      +                first.nextWaiter = null; // GC-friendly
                       if ((firstWaiter = next) == null)
                           lastWaiter = null;
                       if ((first.getAndUnsetStatus(COND) & COND) != 0) {
                           enqueue(first);
                       // ...

      This applies not only to 'AbstractQueuedSynchronizer', but also to 'AbstractQueuedLongSynchronizer'.
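
      The effect of the proposed one-line change can be illustrated with a standalone sketch. It is not the JDK implementation: the Node and WaitQueue classes and the simulate method are hypothetical stand-ins that model only the 'nextWaiter' link. The sketch assumes that a signalled node which has been promoted to the old generation keeps everything it still references alive during minor collections, so it counts how many later nodes remain reachable from the first signalled node.

```java
/** Hypothetical model of a condition queue; not the JDK implementation. */
public class NextWaiterUnlinkSketch {

    /** Simplified node that carries only the link relevant to this issue. */
    static final class Node {
        Node nextWaiter;
    }

    /** Minimal singly linked wait queue with head and tail pointers. */
    static final class WaitQueue {
        Node firstWaiter;
        Node lastWaiter;

        /** Appends a node to the tail of the queue. */
        void await(Node n) {
            if (lastWaiter == null) {
                firstWaiter = n;
            } else {
                lastWaiter.nextWaiter = n;
            }
            lastWaiter = n;
        }

        /** Removes the head; clears its nextWaiter link only if unlink is true. */
        Node signal(boolean unlink) {
            Node first = firstWaiter;
            if (first == null) {
                return null;
            }
            Node next = first.nextWaiter;
            if (unlink) {
                first.nextWaiter = null; // the proposed GC-friendly step
            }
            if ((firstWaiter = next) == null) {
                lastWaiter = null;
            }
            return first;
        }
    }

    /** Counts the nodes still reachable through nextWaiter from a removed node. */
    static int reachableFrom(Node removed) {
        int count = 0;
        for (Node n = removed.nextWaiter; n != null; n = n.nextWaiter) {
            count++;
        }
        return count;
    }

    /**
     * Keeps one waiter queued at all times while signalling the head repeatedly,
     * then reports how many nodes the first signalled node still references.
     */
    static int simulate(int rounds, boolean unlink) {
        WaitQueue queue = new WaitQueue();
        queue.await(new Node());
        Node oldest = null;
        for (int i = 0; i < rounds; i++) {
            queue.await(new Node());
            Node signalled = queue.signal(unlink);
            if (oldest == null) {
                oldest = signalled;
            }
        }
        return reachableFrom(oldest);
    }

    public static void main(String[] args) {
        System.out.println("without unlink: " + simulate(1000, false)); // prints 1000
        System.out.println("with unlink:    " + simulate(1000, true));  // prints 0
    }
}
```

      Without the unlink step, the first signalled node in this model still references every node that was queued after it, so a single promoted node could retain an arbitrarily long chain. With the unlink step, it references nothing.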

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile and run the attached “G1LoiteringConditionNodes” class, e.g. on Linux with OpenJDK 17.0.10:

      java -Xms2048m -Xmx2048m -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=20 -Xlog:gc*,gc+age*=trace -cp . G1LoiteringConditionNodes

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Survivor spaces and minor collection times should not increase.
      ACTUAL -
      During the first two minutes of the test case, everything is fine. After a promotion to the old generation (which is triggered after 2 minutes for demo purposes), survivor spaces grow and minor pause times increase to more than double.
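
      The increase in pause times can also be observed from inside the JVM rather than from the GC log. The following is a minimal sketch, assuming the standard java.lang.management API; the class name GcPauseSampler, the sampling interval and the number of samples are hypothetical choices and not part of the attached test case. The collection times reported by the MXBeans are approximate, so the -Xlog:gc* output remains the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper that prints the average GC pause per sampling interval. */
public class GcPauseSampler {

    /** Average pause in ms between two cumulative readings; 0 if no collection ran. */
    static double averagePauseMillis(long count0, long time0, long count1, long time1) {
        long collections = count1 - count0;
        return collections <= 0 ? 0.0 : (double) (time1 - time0) / collections;
    }

    /** Samples all collectors of the current JVM and prints per-interval averages. */
    static void sample(int samples, long intervalMillis) throws InterruptedException {
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        long[] lastCount = new long[beans.size()];
        long[] lastTime = new long[beans.size()];
        for (int s = 0; s <= samples; s++) {
            for (int i = 0; i < beans.size(); i++) {
                GarbageCollectorMXBean bean = beans.get(i);
                // Both getters return -1 if the value is undefined for a collector.
                long count = Math.max(0, bean.getCollectionCount());
                long time = Math.max(0, bean.getCollectionTime());
                if (s > 0) {
                    System.out.printf("%s: %d collections, avg %.1f ms%n",
                            bean.getName(),
                            count - lastCount[i],
                            averagePauseMillis(lastCount[i], lastTime[i], count, time));
                }
                lastCount[i] = count;
                lastTime[i] = time;
            }
            if (s < samples) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed values for illustration: two samples, one second apart.
        sample(2, 1_000);
    }
}
```

      To be useful for this issue, sample(...) would have to run inside the reproducer's JVM, for example on a separate daemon thread started from main, with an interval long enough to cover several minor collections.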

      ---------- BEGIN SOURCE ----------
      import java.util.Arrays;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;

      /**
       * Asynchronously execute tasks in a fixed thread pool, in order to demonstrate that
       * {@code java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode} instances are properly
       * collected during minor collections, but only if no instances were promoted to the old generation.
       * <p>
       * If such instances were promoted to old generation (here after 2 minutes),
       * ConditionNode instances are not collected during minor collections by G1 in OpenJDK 17.0.9+9 any more,
       * but promoted to survivor spaces and finally to the old generation, filling it up until a
       * mixed or full collection kicks in.
       * <p>
       * This increases minor collection pauses from 3 ms to 10 ms and leads to earlier mixed collections later on.
       *
       * Recommended VM-options: -Xms2048m -Xmx2048m -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=20 -Xlog:gc*,gc+age*=trace
       */
      public class G1LoiteringConditionNodes {
          
          private static final int NUM_OF_PRODUCERS = 16;
          private static final int NUM_OF_WORKERS = 32;
          
          private static final long TXNS_PER_SECOND = 1600;
          
          private static final long WORK_MILLISECONDS = 10;
          
          private static final long GC_AFTER_SECONDS = 120;

          public static void main(final String[] args) {

              // worker thread pool
              final ExecutorService workers = Executors.newFixedThreadPool(NUM_OF_WORKERS);
              final Callable<String> work = () -> {
                  try {
                      // simulate work (just sleep for a while)
                      TimeUnit.MILLISECONDS.sleep(WORK_MILLISECONDS);
                      
                  } catch (final InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
                  // generate some garbage
                  return Arrays.toString(new byte[4096]) + System.currentTimeMillis();
              };

              // produce tasks to be scheduled in the worker pool
              final ScheduledExecutorService producer = Executors.newScheduledThreadPool(NUM_OF_PRODUCERS);
              producer.scheduleWithFixedDelay(
                      () -> workers.submit(work),
                      0, TimeUnit.SECONDS.toNanos(1L) / TXNS_PER_SECOND,
                      TimeUnit.NANOSECONDS);

              // trigger a full garbage collection in order to promote ConditionNode objects to the old generation for test purposes;
              // in a real-life application, ConditionNode objects are promoted to the old generation on JVM startup
              // if more objects are created than the survivor spaces can hold
              producer.schedule(System::gc, GC_AFTER_SECONDS, TimeUnit.SECONDS);
          }
      }
      ---------- END SOURCE ----------

      FREQUENCY : often


            People

              vklang Viktor Klang
              webbuggrp Webbug Group