Loading...

Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 22
Affects Version/s: 17, 21, 22
Component/s: hotspot
Labels:

Subcomponent:
runtime
Resolved In Build:
b26

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8328182	21.0.4-oracle	Ludvig Janiuk	P4	Resolved	Fixed	b01
JDK-8328677	21.0.4	Aleksey Shipilev	P4	Resolved	Fixed	b01
JDK-8328354	17.0.12-oracle	Ludvig Janiuk	P4	Resolved	Fixed	b01
JDK-8330692	17.0.12	Aleksey Shipilev	P4	Resolved	Fixed	b01

We have seen one of the cases in our production, where we have TTSP hiccups. Profiling shows the hiccup is in monitor deflation thread, which is busy unlinking the monitors. It is batched with `MonitorDeflationMax` (default: 1M) and checks safepoint after each batch.

However, the current default for the batch is unfortunately very high. If we spend just a 1ns per unlinking monitor, 1M batch size means we would not respond to safepoint request for about 1ms. In practice, it seems the delays are up to 30 ms. This gives a major latency hiccup.

Here is the simplest reproducer I have:

```
import java.util.ArrayList;
import java.util.List;

public class MonitorChurn {
   static final int THREAD_COUNT = Integer.getInteger("threads", 10);
   static final int MONITORS_PER_THREAD = Integer.getInteger("monitors", 1_000_000);

   static class HolderThread extends Thread {
     Object[] objs = new Object[MONITORS_PER_THREAD];

     public void run() {
       try {
         for (int c = 0; c < objs.length; c++) {
            objs[c] = new Object();
         }
         while (true) {
           for (int c = 0; c < objs.length; c++) {
             synchronized (objs[c]) {}
           }
         }
       } catch (Exception e) {
         e.printStackTrace();
       }
     }
   }

   static Object sink;

   public static void main(String... args) throws Exception {
     for (int c = 0; c < THREAD_COUNT; c++) {
       Thread t = new HolderThread();
       t.start();
     }
     while (true) {
       sink = new byte[1000000];
       Thread.sleep(1);
     }
   }
}
```

I added two logging statements in deflation/unlinking code to see when we are actually hitting the deflation/unlinking paths.

```
diff --git a/src/hotspot/share/runtime/synchronizer.cpp b/src/hotspot/share/runtime/synchronizer.cpp
index c83df922e0d..a5a16c56968 100644
--- a/src/hotspot/share/runtime/synchronizer.cpp
+++ b/src/hotspot/share/runtime/synchronizer.cpp
@@ -132,6 +132,7 @@ size_t MonitorList::unlink_deflated(Thread* current, LogStream* ls,
   ObjectMonitor* prev = nullptr;
   ObjectMonitor* head = Atomic::load_acquire(&_head);
   ObjectMonitor* m = head;
+ log_warning(monitorinflation)("IN UNLINK PATH: " SIZE_FORMAT " MONITORS TO UNLINK", count());
   // The in-use list head can be null during the final audit.
   while (m != nullptr) {
     if (m->is_being_async_deflated()) {
@@ -1688,6 +1689,8 @@ size_t ObjectSynchronizer::deflate_idle_monitors(ObjectMonitorsHashtable* table)
   size_t unlinked_count = 0;
   size_t deleted_count = 0;
   if (deflated_count > 0 || is_final_audit()) {
+ log_warning(monitorinflation)("IN DEFLATION PATH: " SIZE_FORMAT " MONITORS DEFLATED", deflated_count);
+
     // There are ObjectMonitors that have been deflated or this is the
     // final audit and all the remaining ObjectMonitors have been
     // deflated, BUT the MonitorDeflationThread blocked for the final

```

The result is like this:

```
$ /java -XX:+UnlockDiagnosticVMOptions -Xmx256m -XX:GuaranteedAsyncDeflationInterval=1000 -XX:+UseParallelGC -XX:LockingMode=0 -XX:MonitorDeflationMax=1000000 -Xlog:safepoint+stats MonitorChurn.java

[10.589s][info ][safepoint,stats ] VM Operation [ threads: total initial_running ][ time: sync cleanup vmop total ] page_trap_count
[10.589s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 62375 2541 62300334 62365250 ] 0
[10.705s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 42875 1375 62180458 62224708 ] 0
[10.756s][warning][monitorinflation] IN DEFLATION PATH: 1000000 MONITORS DEFLATED
[10.757s][warning][monitorinflation] IN UNLINK PATH: 2399912 MONITORS TO UNLINK
[10.834s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 11 ][ 12407500 7208 63076667 75491375 ] 1
[10.953s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 11 ][ 59208 2042 65406500 65467750 ] 0
[11.067s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 26292 1917 61031375 61059584 ] 0
[11.182s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 40334 1708 62094458 62136500 ] 0
[11.294s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 43083 2167 58604166 58649416 ] 0
[11.404s][info ][safepoint,stats ] ParallelGCFailedAllocation [ 20 10 ][ 43083 2542 56941542 56987167 ] 0
[
```

Near the 2.4M monitor unlinking path we have incurred 12.4 ms TTSP latency.

Naively, one can tune down `MonitorDeflationMax`, but it has impact on the number of monitors we actually deflate per one monitor deflation phase, which unfortunately creeps up monitor population, as deflater thread is less active.

We need to take a deeper look into safepoint checking policy during unlinkage. It is interesting that deflation loop actually checks for pending safepoint on every monitor. It is not immediately clear why unliking requires batching. If it does require batching for good reason, we might want to split the thresholds into `MonitorDeflationMax` (which would continue to drive deflation batch) and `MonitorUnlinkMax` (which would drive the batching for unlinkage).

backported by

JDK-8328182 Monitor deflation unlink phase prolongs time to safepoint

Resolved

JDK-8328354 Monitor deflation unlink phase prolongs time to safepoint

Resolved

JDK-8328677 Monitor deflation unlink phase prolongs time to safepoint

Resolved

JDK-8330692 Monitor deflation unlink phase prolongs time to safepoint

Resolved

is blocked by

JDK-8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls

Closed

relates to

JDK-8325437 Safepoint polling in monitor deflation can cause massive logs

Resolved

links to

Commit openjdk/jdk17u-dev/5b656c86

Commit openjdk/jdk21u-dev/cc918abe

Commit openjdk/jdk/efc39225

Review openjdk/jdk17u-dev/2309

Review openjdk/jdk21u-dev/369

Review openjdk/jdk/16412

(1 relates to, 6 links to)

Details

Backports

Description

Attachments

Issue Links

Activity

People

Dates