- Type: Bug
- Resolution: Unresolved
- Priority: P4
- None
- Affects Version/s: 23, 24, 25, 26
- Component/s: performance
- None
- Environment: Linux 5.4.0-216-generic #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
From Java 1.8 through Java 16, the throughput of the fixed thread pool (Executors.newFixedThreadPool()) for very short tasks (LongAdder::increment) was between 1.3x and 1.42x higher than that of the ForkJoinPool (Executors.newWorkStealingPool()).
In Java 17, this changed: the ForkJoinPool's throughput became 4.5x higher than that of the fixed thread pool. In Java 18, it was 4.9x higher. In Java 21, it dropped to 3.9x, and in Java 22 it dropped further to 3.4x. Even so, the ForkJoinPool was still much faster at processing small jobs than the fixed thread pool.
In Java 23, something changed: the throughput of the ForkJoinPool reverted to Java 16 levels (about 13 million tasks per 2-second run, down from about 61 million in Java 22).
When the ForkJoinPool is used by parallel streams, we might not see a difference in performance between the various versions, since the work per task is fairly coarse-grained. However, we might notice a difference with very short-lived virtual threads. For example, the throughput of virtual thread per task executors in Java 22 is about 2x higher than in Java 23. The throughput of virtual thread per task executors increased somewhat in Java 24, but decreased again in Java 25 and 26.
I am hesitant to even label this a "bug", because I do not have a real-life demo that shows that the reduced throughput is an issue in real systems. However, I wanted to log it in case it helps analyze other issues.
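To illustrate the granularity point above, here is a minimal sketch (not part of the original report) that performs the same total number of LongAdder increments either as one increment per task or as larger batches per task. The class name BatchedSubmitSketch, the method run, and the task counts are hypothetical choices for illustration only; timings will vary by machine and Java version.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class BatchedSubmitSketch {
    // Hypothetical helper: submits `tasks` tasks, each doing `incrementsPerTask`
    // increments, waits for the pool to finish, and returns the final count.
    static long run(ExecutorService service, int tasks, int incrementsPerTask) {
        LongAdder count = new LongAdder();
        for (int i = 0; i < tasks; i++) {
            service.submit(() -> {
                for (int j = 0; j < incrementsPerTask; j++) {
                    count.increment();
                }
            });
        }
        service.shutdown(); // already submitted tasks still run to completion
        try {
            if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count.sum();
    }

    public static void main(String... args) {
        int processors = Runtime.getRuntime().availableProcessors();

        // One increment per task: per-task overhead of the pool dominates.
        long start = System.nanoTime();
        long fine = run(Executors.newWorkStealingPool(processors), 1_000_000, 1);
        long fineMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        // Same total work, but 1,000 increments per task.
        start = System.nanoTime();
        long batched = run(Executors.newWorkStealingPool(processors), 1_000, 1_000);
        long batchedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("fine-grained: " + fine + " increments in " + fineMillis + " ms");
        System.out.println("batched:      " + batched + " increments in " + batchedMillis + " ms");
    }
}
```

If the batched variant shows little difference between Java versions while the fine-grained one does, that would be consistent with the regression mattering mainly for very short tasks. This is an assumption to test, not a measured result.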
Demo code:
package tjsn.ideas2025.juc;

import java.lang.reflect.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.*;
import java.util.function.*;
import java.util.stream.*;

/**
 * The purpose of this demo is to show various ExecutorServices and how quickly
 * they can execute a lot of short tasks. We found that in some versions, the
 * ForkJoinPool is much faster than the Fixed Thread Pool, but in recent Java
 * versions (Java 23 and later), the ForkJoinPool has slowed down dramatically.
 * We are not sure if this has any real world implications, since individual
 * tasks should use a substantial number of clock cycles in order to take
 * advantage of parallel execution anyway. See
 * https://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
 */
public class ExecutorServicePerformanceDemo {
    public static void main(String... args) {
        System.out.println("Java Version " + System.getProperty("java.version"));
        for (int i = 0; i < 5; i++) {
            int availableProcessors =
                Runtime.getRuntime().availableProcessors();
            exercise("FixedThreadPool(" + availableProcessors + ")",
                () -> Executors.newFixedThreadPool(availableProcessors));
            exercise("WorkStealingPool(" + availableProcessors + ")",
                () -> Executors.newWorkStealingPool(availableProcessors));
            exercise("CachedThreadPool()",
                Executors::newCachedThreadPool);
            // To support older Java versions, we create the ThreadPerTaskExecutor using reflection
            Optional<Class<?>> optionalBuilderClass = Stream.of(Thread.class.getClasses())
                .filter(Class::isInterface)
                .filter(intf -> intf.getSimpleName().equals("Builder"))
                .findFirst();
            optionalBuilderClass.ifPresent(builderClass ->
                exercise("VirtualThreadPerTaskExecutor()",
                    () -> getExecutorService(builderClass, "ofVirtual")));
            optionalBuilderClass.ifPresent(builderClass ->
                exercise("PlatformThreadPerTaskExecutor()",
                    () -> getExecutorService(builderClass, "ofPlatform")));
            System.out.println();
        }
        System.out.println();
        System.out.println("Best Throughput for Java " + System.getProperty("java.version"));
        for (Map.Entry<String, Long> entry : bestThroughput.entrySet()) {
            System.out.printf(Locale.US, "%s:%n\t%,d%n",
                entry.getKey(), entry.getValue());
        }
    }

    private static ExecutorService getExecutorService(Class<?> builderClass, String ofBuilderMethodName) {
        try {
            Object virtualThreadBuilder =
                Thread.class.getMethod(ofBuilderMethodName).invoke(null);
            Object virtualThreadFactory =
                builderClass.getMethod("factory").invoke(virtualThreadBuilder);
            return (ExecutorService) Executors.class.getMethod("newThreadPerTaskExecutor",
                ThreadFactory.class).invoke(null, virtualThreadFactory);
        } catch (ReflectiveOperationException e) {
            // should not happen
            throw new AssertionError(e);
        }
    }

    private static Map<String, Long> bestThroughput =
        new LinkedHashMap<>();

    private static void exercise(
        String description, Supplier<? extends ExecutorService> supplier) {
        ExecutorService service = supplier.get();
        System.out.print(description + ": ");
        try (ExecutorServiceCloser closer =
                 new ExecutorServiceCloser(service)) {
            AtomicBoolean running = new AtomicBoolean(true);
            ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
            timer.schedule(() -> {
                timer.shutdown();
                running.set(false);
            }, 2, TimeUnit.SECONDS);
            LongAdder count = new LongAdder();
            long submitted = 0;
            while (running.get()) {
                service.submit(count::increment);
                submitted++;
                // Throttle to allow execution to catch up
                while (submitted > count.longValue() + 10_000)
                    LockSupport.parkNanos(100_000);
                // Thread.onSpinWait(); // not available in Java 8
            }
            System.out.printf("%,d%n", count.longValue());
            bestThroughput.merge(description, count.longValue(), Long::max);
            checkDistribution(count);
        }
    }

    private static void checkDistribution(LongAdder adder) {
        try {
            Field cellsField = LongAdder.class.getSuperclass().getDeclaredField("cells");
            cellsField.setAccessible(true);
            Object[] cells = (Object[]) cellsField.get(adder);
            if (cells != null) {
                System.out.println("cells.length = " + cells.length);
                int num = 0;
                for (Object cell : cells) {
                    if (cell != null) {
                        Field valueField = cell.getClass().getDeclaredField("value");
                        valueField.setAccessible(true);
                        long value = (long) valueField.get(cell);
                        System.out.printf(Locale.US, "\t% 2d %,d%n", num++, value);
                    }
                }
                System.out.println();
            }
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // for older versions of Java
    private static class ExecutorServiceCloser
        implements AutoCloseable {
        private final ExecutorService service;

        public ExecutorServiceCloser(ExecutorService service) {
            this.service = service;
        }

        public void close() {
            boolean terminated = service.isTerminated();
            if (!terminated) {
                service.shutdown();
                boolean interrupted = false;
                while (!terminated) {
                    try {
                        terminated = service.awaitTermination(
                            1L, TimeUnit.DAYS);
                    } catch (InterruptedException e) {
                        if (!interrupted) {
                            service.shutdownNow();
                            interrupted = true;
                        }
                    }
                }
                if (interrupted) Thread.currentThread().interrupt();
            }
        }
    }
}
Summarized output for Java 8, 11-18, 21-26
Best Throughput for Java 1.8.0_472
FixedThreadPool(12): 16,539,415
WorkStealingPool(12): 12,161,482
CachedThreadPool(): 5,240,226
Best Throughput for Java 11.0.29
FixedThreadPool(12): 17,325,727
WorkStealingPool(12): 13,289,596
CachedThreadPool(): 5,689,781
Best Throughput for Java 12.0.2
FixedThreadPool(12): 18,839,490
WorkStealingPool(12): 13,553,481
CachedThreadPool(): 5,509,699
Best Throughput for Java 13.0.14
FixedThreadPool(12): 18,052,340
WorkStealingPool(12): 12,984,799
CachedThreadPool(): 5,666,061
Best Throughput for Java 14.0.2
FixedThreadPool(12): 18,508,592
WorkStealingPool(12): 13,444,555
CachedThreadPool(): 5,629,916
Best Throughput for Java 15.0.10
FixedThreadPool(12): 18,687,956
WorkStealingPool(12): 13,181,730
CachedThreadPool(): 5,579,264
Best Throughput for Java 16.0.2
FixedThreadPool(12): 19,035,898
WorkStealingPool(12): 13,389,173
CachedThreadPool(): 5,471,937
Best Throughput for Java 17.0.17
FixedThreadPool(12): 19,338,253
WorkStealingPool(12): 87,815,764
CachedThreadPool(): 4,564,468
Best Throughput for Java 18.0.2
FixedThreadPool(12): 18,111,864
WorkStealingPool(12): 89,167,812
CachedThreadPool(): 4,233,959
Best Throughput for Java 21.0.9
FixedThreadPool(12): 17,042,381
WorkStealingPool(12): 66,046,206
CachedThreadPool(): 4,573,772
VirtualThreadPerTaskExecutor(): 22,311,961
PlatformThreadPerTaskExecutor(): 123,191
Best Throughput for Java 22.0.2
FixedThreadPool(12): 17,895,714
WorkStealingPool(12): 61,158,200
CachedThreadPool(): 4,692,956
VirtualThreadPerTaskExecutor(): 24,655,311
PlatformThreadPerTaskExecutor(): 119,036
Best Throughput for Java 23.0.2
FixedThreadPool(12): 17,279,619
WorkStealingPool(12): 13,008,926
CachedThreadPool(): 4,572,345
VirtualThreadPerTaskExecutor(): 12,618,624
PlatformThreadPerTaskExecutor(): 120,829
Best Throughput for Java 24.0.2
FixedThreadPool(12): 17,982,688
WorkStealingPool(12): 13,425,840
CachedThreadPool(): 4,536,324
VirtualThreadPerTaskExecutor(): 19,040,775
PlatformThreadPerTaskExecutor(): 126,088
Best Throughput for Java 25.0.1
FixedThreadPool(12): 17,742,859
WorkStealingPool(12): 13,375,118
CachedThreadPool(): 4,487,596
VirtualThreadPerTaskExecutor(): 12,866,378
PlatformThreadPerTaskExecutor(): 125,123
Best Throughput for Java 26-ea
FixedThreadPool(12): 17,679,490
WorkStealingPool(12): 13,174,342
CachedThreadPool(): 4,751,515
VirtualThreadPerTaskExecutor(): 12,216,321
PlatformThreadPerTaskExecutor(): 126,563
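The ratios quoted in the description can be rechecked from the summarized figures. The following is a minimal sketch, not part of the original demo; the class name ThroughputRatios and the helper ratio are hypothetical, and the numbers are copied from the output above.

```java
import java.util.Locale;

public class ThroughputRatios {
    // Hypothetical helper: plain quotient of two throughput figures.
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String... args) {
        // Figures copied from the summarized output above.
        String[] versions = {"16.0.2", "17.0.17", "18.0.2", "21.0.9", "22.0.2", "23.0.2"};
        long[] fixed = {19_035_898L, 19_338_253L, 18_111_864L, 17_042_381L, 17_895_714L, 17_279_619L};
        long[] workStealing = {13_389_173L, 87_815_764L, 89_167_812L, 66_046_206L, 61_158_200L, 13_008_926L};
        for (int i = 0; i < versions.length; i++) {
            System.out.printf(Locale.US, "Java %s: WorkStealingPool / FixedThreadPool = %.2f%n",
                versions[i], ratio(workStealing[i], fixed[i]));
        }
        // Virtual thread per task executor, Java 22 versus Java 23.
        System.out.printf(Locale.US, "Virtual threads, Java 22 / Java 23 = %.2f%n",
            ratio(24_655_311L, 12_618_624L));
    }
}
```

On these figures the Java 17 ratio comes out at roughly 4.5, and the Java 22 versus Java 23 virtual thread ratio at roughly 2, in line with the description.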