Throughput of ForkJoinPool drastically reduced in Java 23

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • Affects Version/s: 23, 24, 25, 26
    • Component/s: performance
    • Environment:

      Linux 5.4.0-216-generic #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

      From Java 1.8 through Java 16, the throughput of the fixed thread pool (Executors.newFixedThreadPool()) for very short tasks (LongAdder::increment) was between 1.30x and 1.42x higher than that of the ForkJoinPool (Executors.newWorkStealingPool()).
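
      For readers comparing the two pools: in the OpenJDK sources, Executors.newFixedThreadPool(n) returns a ThreadPoolExecutor whose n workers share a single LinkedBlockingQueue, while Executors.newWorkStealingPool(n) returns a ForkJoinPool with per-worker work queues, created in FIFO (async) mode. These concrete types are implementation details rather than guarantees of the Executors API, so this small sketch (PoolKinds and describe are hypothetical names) just inspects them at runtime. It avoids pattern matching so that it also compiles on Java 8, like the demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolKinds {
    /** Describes which implementation class backs an ExecutorService (implementation detail, not API). */
    static String describe(ExecutorService service) {
        if (service instanceof ForkJoinPool) {
            ForkJoinPool pool = (ForkJoinPool) service;
            // getAsyncMode() is true when locally forked tasks are processed FIFO
            return "ForkJoinPool(parallelism=" + pool.getParallelism()
                    + ", asyncMode=" + pool.getAsyncMode() + ")";
        }
        if (service instanceof ThreadPoolExecutor) {
            ThreadPoolExecutor pool = (ThreadPoolExecutor) service;
            // all workers take tasks from this one shared queue
            return "ThreadPoolExecutor(core=" + pool.getCorePoolSize()
                    + ", queue=" + pool.getQueue().getClass().getSimpleName() + ")";
        }
        return service.getClass().getName();
    }

    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(12);
        ExecutorService stealing = Executors.newWorkStealingPool(12);
        try {
            System.out.println(describe(fixed));
            System.out.println(describe(stealing));
        } finally {
            fixed.shutdown();
            stealing.shutdown();
        }
    }
}
```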

      In Java 17, this changed: the ForkJoinPool's throughput became 4.5x higher than that of the fixed thread pool. In Java 18, it was 4.9x higher. In Java 21, the ratio went slightly down to 3.9x, and in Java 22, it went down even more to 3.4x. However, the ForkJoinPool was still much faster at processing small jobs than the fixed thread pool.

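      In Java 23, something changed and the ForkJoinPool's throughput reverted to Java 16 levels: 13.0 million tasks versus 61.2 million in Java 22, about 4.7x lower, which makes the fixed thread pool 1.33x faster again.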
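
      The ratios quoted in this report can be recomputed from the summarized output at the end. This sketch (ThroughputRatios and ratio are hypothetical names) copies a few of those best-throughput figures and divides them; it adds no new measurements.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ThroughputRatios {
    /** Plain ratio of two throughput figures; values above 1 mean the numerator was faster. */
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // {FixedThreadPool(12), WorkStealingPool(12)} best throughput, copied from the summarized output
        Map<String, long[]> results = new LinkedHashMap<>();
        results.put("16.0.2", new long[]{19_035_898L, 13_389_173L});
        results.put("17.0.17", new long[]{19_338_253L, 87_815_764L});
        results.put("18.0.2", new long[]{18_111_864L, 89_167_812L});
        results.put("21.0.9", new long[]{17_042_381L, 66_046_206L});
        results.put("22.0.2", new long[]{17_895_714L, 61_158_200L});
        results.put("23.0.2", new long[]{17_279_619L, 13_008_926L});
        for (Map.Entry<String, long[]> e : results.entrySet()) {
            long[] r = e.getValue();
            System.out.printf(Locale.US, "Java %-8s WorkStealing/Fixed = %.2fx%n",
                    e.getKey(), ratio(r[1], r[0]));
        }
        // Drop of WorkStealingPool throughput from Java 22 to Java 23
        System.out.printf(Locale.US, "WorkStealingPool Java 22 / Java 23 = %.2fx%n",
                ratio(results.get("22.0.2")[1], results.get("23.0.2")[1]));
    }
}
```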

      When the ForkJoinPool is used by parallel streams, we might not see a difference in performance between the various versions, since each subtask typically does enough work to hide the per-task scheduling overhead. However, we might notice a difference with very short-lived virtual threads, which by default are scheduled on a ForkJoinPool. For example, the throughput of the virtual-thread-per-task executor in Java 22 (24.7 million) is about 2x that of Java 23 (12.6 million). In Java 24 it recovered somewhat (19.0 million), but it dropped again in Java 25 and 26 (12.9 and 12.2 million).
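
      Since Java 21, a virtual-thread-per-task executor can be created directly with Executors.newVirtualThreadPerTaskExecutor(), without the reflection the demo uses for older versions. Because the default virtual thread scheduler is a FIFO-mode ForkJoinPool, the ForkJoinPool change is presumably also what shows up here. The following is a minimal, hypothetical fixed-count sketch (VirtualThreadBurst and runTasks are illustrative names): it times one million trivial tasks without the demo's throttle, so its numbers are not directly comparable with the time-boxed results in this report.

```java
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

public class VirtualThreadBurst {
    /** Runs {@code tasks} trivial tasks, each on its own virtual thread, and returns how many ran. */
    static long runTasks(int tasks) {
        LongAdder count = new LongAdder();
        // Requires Java 21+; close() at the end of the try block waits for all submitted tasks
        try (ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                service.submit(count::increment);
            }
        }
        return count.sum();
    }

    public static void main(String[] args) {
        int tasks = 1_000_000;
        runTasks(tasks); // warm-up round, not timed
        long start = System.nanoTime();
        long completed = runTasks(tasks);
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf(Locale.US, "%,d tasks in %,d ms (%,.0f tasks/s)%n",
                completed, elapsedNanos / 1_000_000, completed * 1e9 / elapsedNanos);
    }
}
```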

      I am hesitant to even label this a "bug", because I do not have a real-life demo that shows that the reduced throughput is an issue in real systems. However, I wanted to log it in case it helps analyze other issues.

      Demo code:

      package tjsn.ideas2025.juc;

      import java.lang.reflect.*;
      import java.util.*;
      import java.util.concurrent.*;
      import java.util.concurrent.atomic.*;
      import java.util.concurrent.locks.*;
      import java.util.function.*;
      import java.util.stream.*;

      /**
       * The purpose of this demo is to show various ExecutorServices and how quickly
       * they can execute a lot of short tasks. We found that in some versions, the
       * ForkJoinPool is much faster than the Fixed Thread Pool, but in recent Java
       * versions (Java 23 and later), the ForkJoinPool has slowed down dramatically.
       * We are not sure if this has any real world implications, since individual
       * tasks should use a substantial number of clock cycles in order to take
       * advantage of parallel execution anyway. See
       * https://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
       */
      public class ExecutorServicePerformanceDemo {
          public static void main(String... args) {
              System.out.println("Java Version " + System.getProperty("java.version"));
              for (int i = 0; i < 5; i++) {
                  int availableProcessors =
                          Runtime.getRuntime().availableProcessors();
                  exercise("FixedThreadPool(" + availableProcessors + ")",
                          () -> Executors.newFixedThreadPool(availableProcessors));
                  exercise("WorkStealingPool(" + availableProcessors + ")",
                          () -> Executors.newWorkStealingPool(availableProcessors));
                  exercise("CachedThreadPool()",
                          Executors::newCachedThreadPool);

                  // To support older Java versions, we create the ThreadPerTaskExecutor using reflection

                  Optional<Class<?>> optionalBuilderClass = Stream.of(Thread.class.getClasses())
                          .filter(Class::isInterface)
                          .filter(intf -> intf.getSimpleName().equals("Builder"))
                          .findFirst();
                  optionalBuilderClass.ifPresent(builderClass ->
                          exercise("VirtualThreadPerTaskExecutor()",
                                  () -> getExecutorService(builderClass, "ofVirtual")));
                  optionalBuilderClass.ifPresent(builderClass ->
                          exercise("PlatformThreadPerTaskExecutor()",
                                  () -> getExecutorService(builderClass, "ofPlatform")));
                  System.out.println();
              }

              System.out.println();
              System.out.println("Best Throughput for Java " + System.getProperty("java.version"));
              for (Map.Entry<String, Long> entry : bestThroughput.entrySet()) {
                  System.out.printf(Locale.US, "%s:%n\t%,d%n",
                          entry.getKey(), entry.getValue());
              }
          }

          private static ExecutorService getExecutorService(Class<?> builderClass, String ofBuilderMethodName) {
              try {
                  // Used for both ofVirtual and ofPlatform, so the names are not virtual-specific
                  Object threadBuilder =
                          Thread.class.getMethod(ofBuilderMethodName).invoke(null);
                  Object threadFactory =
                          builderClass.getMethod("factory").invoke(threadBuilder);
                  return (ExecutorService) Executors.class.getMethod("newThreadPerTaskExecutor",
                          ThreadFactory.class).invoke(null, threadFactory);
              } catch (ReflectiveOperationException e) {
                  // should not happen
                  throw new AssertionError(e);
              }
          }

          private static Map<String, Long> bestThroughput =
                  new LinkedHashMap<>();

          private static void exercise(
                  String description, Supplier<? extends ExecutorService> supplier) {
              ExecutorService service = supplier.get();
              System.out.print(description + ": ");
              try (ExecutorServiceCloser closer =
                           new ExecutorServiceCloser(service)) {
                  AtomicBoolean running = new AtomicBoolean(true);
                  ScheduledExecutorService timer =
                          Executors.newSingleThreadScheduledExecutor();
                  timer.schedule(() -> {
                      timer.shutdown();
                      running.set(false);
                  }, 2, TimeUnit.SECONDS);
                  LongAdder count = new LongAdder();
                  long submitted = 0;
                  while (running.get()) {
                      service.submit(count::increment);
                      submitted++;
                      // Throttle to allow execution to catch up
                      while (submitted > count.longValue() + 10_000)
                          LockSupport.parkNanos(100_000); // Thread.onSpinWait() is an alternative, but not available in Java 8
                  }
                  System.out.printf("%,d%n", count.longValue());
                  bestThroughput.merge(description, count.longValue(), Long::max);
                  checkDistribution(count);
              }
          }

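          // Reads Striped64's private "cells" array via reflection to show how the
          // increments were spread across cells. On Java 16 and later this needs
          // --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED, otherwise
          // setAccessible throws InaccessibleObjectException.
          private static void checkDistribution(LongAdder adder) {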
              try {
                  Field cellsField = LongAdder.class.getSuperclass().getDeclaredField("cells");
                  cellsField.setAccessible(true);
                  Object[] cells = (Object[]) cellsField.get(adder);
                  if (cells != null) {
                      System.out.println("cells.length = " + cells.length);
                      int num = 0;
                      for (Object cell : cells) {
                          if (cell != null) {
                              Field valueField = cell.getClass().getDeclaredField("value");
                              valueField.setAccessible(true);
                              long value = (long) valueField.get(cell);
                              System.out.printf(Locale.US, "\t% 2d %,d%n", num++, value);
                          }
                      }
                      System.out.println();
                  }
              } catch (ReflectiveOperationException e) {
                  throw new AssertionError(e);
              }
          }

          // for older versions of Java
          private static class ExecutorServiceCloser
                  implements AutoCloseable {
              private final ExecutorService service;

              public ExecutorServiceCloser(ExecutorService service) {
                  this.service = service;
              }

              public void close() {
                  boolean terminated = service.isTerminated();
                  if (!terminated) {
                      service.shutdown();
                      boolean interrupted = false;
                      while (!terminated) {
                          try {
                              terminated = service.awaitTermination(
                                      1L, TimeUnit.DAYS);
                          } catch (InterruptedException e) {
                              if (!interrupted) {
                                  service.shutdownNow();
                                  interrupted = true;
                              }
                          }
                      }
                      if (interrupted) Thread.currentThread().interrupt();
                  }
              }
          }
      }
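
      The demo counts how many tasks complete in a fixed two-second window while throttling submission. A complementary approach, sketched here as an assumption rather than as the reporter's method, is to submit a fixed number of tasks and time how long the pool takes to drain them: shutdown() still runs every previously submitted task, so awaitTermination() marks the end. FixedCountThroughput and timeTasks are hypothetical names, and because there is no throttle, the queue can grow to the full task count.

```java
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class FixedCountThroughput {
    /** Runs {@code tasks} trivial tasks on {@code service}, shuts it down and returns elapsed nanos. */
    static long timeTasks(ExecutorService service, int tasks) {
        LongAdder count = new LongAdder();
        Runnable task = count::increment; // one shared Runnable, so the loop allocates no lambdas
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            service.execute(task);
        }
        // shutdown() still executes every task that was already submitted
        service.shutdown();
        try {
            if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish within one minute");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        long elapsed = System.nanoTime() - start;
        if (count.sum() != tasks) {
            throw new IllegalStateException("expected " + tasks + " tasks, counted " + count.sum());
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        int tasks = 1_000_000;
        for (int round = 1; round <= 3; round++) { // earlier rounds double as JIT warm-up
            long fixed = timeTasks(Executors.newFixedThreadPool(n), tasks);
            long stealing = timeTasks(Executors.newWorkStealingPool(n), tasks);
            System.out.printf(Locale.US, "round %d: FixedThreadPool %,d ms, WorkStealingPool %,d ms%n",
                    round, fixed / 1_000_000, stealing / 1_000_000);
        }
    }
}
```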

      Summarized output (best of five 2-second runs, counting completed tasks) for Java 8, 11-18 and 21-26

      Best Throughput for Java 1.8.0_472
      FixedThreadPool(12): 16,539,415
      WorkStealingPool(12): 12,161,482
      CachedThreadPool(): 5,240,226

      Best Throughput for Java 11.0.29
      FixedThreadPool(12): 17,325,727
      WorkStealingPool(12): 13,289,596
      CachedThreadPool(): 5,689,781

      Best Throughput for Java 12.0.2
      FixedThreadPool(12): 18,839,490
      WorkStealingPool(12): 13,553,481
      CachedThreadPool(): 5,509,699

      Best Throughput for Java 13.0.14
      FixedThreadPool(12): 18,052,340
      WorkStealingPool(12): 12,984,799
      CachedThreadPool(): 5,666,061

      Best Throughput for Java 14.0.2
      FixedThreadPool(12): 18,508,592
      WorkStealingPool(12): 13,444,555
      CachedThreadPool(): 5,629,916

      Best Throughput for Java 15.0.10
      FixedThreadPool(12): 18,687,956
      WorkStealingPool(12): 13,181,730
      CachedThreadPool(): 5,579,264

      Best Throughput for Java 16.0.2
      FixedThreadPool(12): 19,035,898
      WorkStealingPool(12): 13,389,173
      CachedThreadPool(): 5,471,937

      Best Throughput for Java 17.0.17
      FixedThreadPool(12): 19,338,253
      WorkStealingPool(12): 87,815,764
      CachedThreadPool(): 4,564,468

      Best Throughput for Java 18.0.2
      FixedThreadPool(12): 18,111,864
      WorkStealingPool(12): 89,167,812
      CachedThreadPool(): 4,233,959

      Best Throughput for Java 21.0.9
      FixedThreadPool(12): 17,042,381
      WorkStealingPool(12): 66,046,206
      CachedThreadPool(): 4,573,772
      VirtualThreadPerTaskExecutor(): 22,311,961
      PlatformThreadPerTaskExecutor(): 123,191

      Best Throughput for Java 22.0.2
      FixedThreadPool(12): 17,895,714
      WorkStealingPool(12): 61,158,200
      CachedThreadPool(): 4,692,956
      VirtualThreadPerTaskExecutor(): 24,655,311
      PlatformThreadPerTaskExecutor(): 119,036

      Best Throughput for Java 23.0.2
      FixedThreadPool(12): 17,279,619
      WorkStealingPool(12): 13,008,926
      CachedThreadPool(): 4,572,345
      VirtualThreadPerTaskExecutor(): 12,618,624
      PlatformThreadPerTaskExecutor(): 120,829

      Best Throughput for Java 24.0.2
      FixedThreadPool(12): 17,982,688
      WorkStealingPool(12): 13,425,840
      CachedThreadPool(): 4,536,324
      VirtualThreadPerTaskExecutor(): 19,040,775
      PlatformThreadPerTaskExecutor(): 126,088

      Best Throughput for Java 25.0.1
      FixedThreadPool(12): 17,742,859
      WorkStealingPool(12): 13,375,118
      CachedThreadPool(): 4,487,596
      VirtualThreadPerTaskExecutor(): 12,866,378
      PlatformThreadPerTaskExecutor(): 125,123

      Best Throughput for Java 26-ea
      FixedThreadPool(12): 17,679,490
      WorkStealingPool(12): 13,174,342
      CachedThreadPool(): 4,751,515
      VirtualThreadPerTaskExecutor(): 12,216,321
      PlatformThreadPerTaskExecutor(): 126,563

            Assignee:
            Viktor Klang
            Reporter:
            Heinz Kabutz
            Votes:
            0
            Watchers:
            2