- Enhancement
- Resolution: Fixed
- P4
- 17, 21, 24
- b27
In some of our services, we have noticed an oddity in GC times: GC workers spend a large amount of time attempting to terminate. We eventually traced this to a large linked list that GC can only traverse in a single GC worker, while all other workers (futilely) try to terminate.
This particular problem was in our own code that maintained a linked list of phantom references. We have solved it by sharding the linked list and thus allowing GC to parallelize work on it. However, the JDK itself has a few places where a similar trap is set up, notably in Cleaners:
https://github.com/openjdk/jdk/blob/6811a11e278118b8b2781f1eaf45d363a3d2db49/src/java.base/share/classes/jdk/internal/ref/PhantomCleanable.java#L54
Cleaners and their phantom-wrapped referents are always registered on linked-list queues. Those queues can get very large if there are lots of Cleaner-bearing objects. A simple reproducer shows how this leads to non-parallelizable GCs that spend a lot of time in termination attempts:
% shipilev-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -Xlog:gc -Xlog:gc+phases=debug -Xmx1g CleanerGC.java 2>&1 | grep -E "(Termination|Pause)"
...
[16.360s][debug][gc,phases] GC(62) Termination (ms): Min: 66.95, Avg: 68.45, Max: 70.41, Diff: 3.46, Sum: 1642.74, Workers: 24
[16.360s][debug][gc,phases] GC(62) Termination Attempts: Min: 1080, Avg: 1137.4, Max: 1185, Diff: 105, Sum: 27298, Workers: 24
[16.361s][info ][gc ] GC(62) Pause Young (Normal) (G1 Evacuation Pause) 617M->22M(1024M) 88.606ms
...
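The CleanerGC.java source is not included in this report. As a rough sketch only, a reproducer of this kind might register a large number of live objects with a single Cleaner and then churn short-lived garbage to provoke young collections. The class name, method name, and object counts below are hypothetical, and this sketch is not claimed to reproduce the exact numbers shown above.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reproducer; not the original CleanerGC.java.
public class CleanerGCSketch {
    static final Cleaner CLEANER = Cleaner.create();

    // Registers `count` fresh objects with the shared Cleaner and returns them,
    // so they stay reachable and their Cleanables stay on the Cleaner's list.
    static List<Object> registerMany(int count) {
        List<Object> live = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Object o = new Object();
            // No-op cleanup action; it must not capture `o`,
            // otherwise the referent could never become unreachable.
            CLEANER.register(o, () -> { });
            live.add(o);
        }
        return live;
    }

    public static void main(String[] args) {
        // Assumed size: large enough to build a long Cleanable list.
        List<Object> live = registerMany(1_000_000);

        // Churn short-lived garbage so that young GCs happen
        // while the long list is still reachable.
        long sink = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[1024];
            sink += garbage.length;
        }
        System.out.println(live.size() + " objects registered, " + sink + " bytes churned");
    }
}
```

Running it with the same logging flags as in the command above (-Xlog:gc -Xlog:gc+phases=debug) would show whether the Termination phase grows with the number of registered objects.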
There is a similar problem in DirectByteBuffers that use their own, separate "Cleaner". That would be addressed by JDK-8344332 after this enhancement lands.
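The sharding workaround mentioned above for application code can be sketched as follows. This is an illustration under assumptions, not the code from our services and not the JDK change itself: the class name ShardedRefList, the round-robin shard selection, and the per-shard locks are all hypothetical choices. The idea is to keep several short, independent doubly-linked lists instead of one long list, so that the chains are reachable from separate array slots.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: phantom references spread over several independent lists.
public class ShardedRefList {
    // A phantom reference that is also a node in exactly one shard's list.
    static final class Node extends PhantomReference<Object> {
        final int shard;
        Node prev, next;
        boolean linked;

        Node(Object referent, ReferenceQueue<Object> queue, int shard) {
            super(referent, queue);
            this.shard = shard;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Node[] heads;    // first node of each shard, or null if empty
    private final Object[] locks;  // one lock per shard
    private final int[] sizes;     // number of linked nodes per shard
    private final AtomicInteger nextShard = new AtomicInteger();

    public ShardedRefList(int shards) {
        heads = new Node[shards];
        sizes = new int[shards];
        locks = new Object[shards];
        for (int i = 0; i < shards; i++) {
            locks[i] = new Object();
        }
    }

    // Links a new phantom reference at the head of a shard chosen round-robin.
    public Node add(Object referent) {
        int s = Math.floorMod(nextShard.getAndIncrement(), heads.length);
        Node n = new Node(referent, queue, s);
        synchronized (locks[s]) {
            n.next = heads[s];
            if (heads[s] != null) {
                heads[s].prev = n;
            }
            heads[s] = n;
            n.linked = true;
            sizes[s]++;
        }
        return n;
    }

    // Unlinks a node from its shard; returns false if it was already removed.
    public boolean remove(Node n) {
        synchronized (locks[n.shard]) {
            if (!n.linked) {
                return false;
            }
            if (n.prev != null) {
                n.prev.next = n.next;
            } else {
                heads[n.shard] = n.next;
            }
            if (n.next != null) {
                n.next.prev = n.prev;
            }
            n.prev = null;
            n.next = null;
            n.linked = false;
            sizes[n.shard]--;
            return true;
        }
    }

    public int size(int shard) {
        synchronized (locks[shard]) {
            return sizes[shard];
        }
    }

    // Queue to poll for references whose referents have been collected.
    public ReferenceQueue<Object> queue() {
        return queue;
    }
}
```

With N shards, each chain holds roughly 1/N of the references, and per-shard locking also keeps add and remove on different shards from contending with each other. The right shard count is workload-dependent and is left open here.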
- blocks
  - JDK-8344332 (bf) Migrate DirectByteBuffer to use java.lang.ref.Cleaner (Open)
- relates to
  - JDK-8344332 (bf) Migrate DirectByteBuffer to use java.lang.ref.Cleaner (Open)
  - JDK-8304698 (ref) Lock contention in PhantomCleanable (Open)
- links to
  - Commit(master) openjdk/jdk/4000e923
  - Review(master) openjdk/jdk/22043