JDK-8305994

Guarantee eventual async monitor deflation


Details

    • b20


      Description

        One of our systems reported a steady increase in memory usage after migration from JDK 11 to JDK 17. NMT logs clearly show the growing "Object Monitors" section, where the population of monitors is nearly 11M, taking several GBs of RSS.

        Async monitor deflation is supposed to deal with this by triggering cleanup when `MonitorUsedDeflationThreshold` (`MUDT`) is reached. But the apparent problem with this heuristic is that `MUDT` is a percentage of the "ceiling", which is derived roughly as `max(#threads*AvgMonitorsPerThreadEstimate, #max_monitors_ever_observed)` (with `AvgMonitorsPerThreadEstimate` abbreviated `AMPTE` below), plus additive adjustments when deflation does not make progress. For systems that run thousands of threads, the ceiling can be very high.

        Also, AFAIU, the ceiling can get arbitrarily high if we had a historical spike in the number of monitors, or if some past async deflations made no progress. The ceiling seems to never go down! (Which is a good thing in buildings, but not in this heuristic.) So even if we set `MUDT` to the lowest value, 1, the ceiling might eventually get so large that the heuristic would never fire after some point.

        Back-of-the-envelope calculation: even without involving the historical ceiling adjustments, just the static calculation for a system with 13K threads (real-life number) and the default `AMPTE` = 1024 yields a ceiling of about 13M. Which means the default `MUDT` = 90 would not trigger cleanup until we have about 12M monitors, which at ~200 bytes per monitor translates to >2 GB of native memory.
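The back-of-the-envelope numbers can be reproduced with a small sketch. This is not HotSpot code: the class and method names are made up for illustration, the additive no-progress adjustments are left out, and the exact comparison the VM performs against the threshold is an assumption. With exactly 13,000 threads the static term comes out at roughly 13.3M monitors and the trigger point at just under 12M, the same order of magnitude as the estimate above.

```java
// Hypothetical helper, for illustration only; names are not from HotSpot.
class DeflationThresholdEstimate {
    // Static part of the ceiling: max(#threads * AMPTE, #max_monitors_ever_observed).
    // The additive adjustments for deflations without progress are ignored here.
    static long ceiling(long threads, long avgMonitorsPerThread, long maxObserved) {
        return Math.max(threads * avgMonitorsPerThread, maxObserved);
    }

    // Monitor population that corresponds to MUDT percent of the ceiling.
    static long triggerPoint(long ceiling, int mudtPercent) {
        return ceiling * mudtPercent / 100;
    }

    public static void main(String[] args) {
        long ceiling = ceiling(13_000, 1024, 0);   // 13K threads, default AMPTE
        long trigger = triggerPoint(ceiling, 90);  // default MUDT
        System.out.println("ceiling       = " + ceiling);
        System.out.println("trigger point = " + trigger);
        // ~200 bytes per monitor, as estimated in the description
        System.out.println("native bytes at trigger ~= " + trigger * 200);
    }
}
```

Plugging in a large `maxObserved` value shows the second effect described above: once a historical spike exceeds `threads * AMPTE`, it alone determines the ceiling.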

        This started to be a problem in JDK 17, because the work done in JDK 15..16 (JDK-8153224, JDK-8246476) gradually removed the path that did the monitor deflation on safepoint cleanups. So JDK 11 applications got their monitors cleaned up by the eventual safepoints from e.g. GC, *and* by the special cleanup safepoints triggered by the monitor used threshold, checked every `GuaranteedSafepointInterval`. But in JDK 17, deflation is only triggered by the monitor used threshold, checked every `AsyncDeflationInterval`, and that threshold might not be reached for quite some time. The worst case would be threads spiking to use all these monitors, then never using them again and never using new ones, so the used threshold is never reached and the monitors stay inflated forever.

        I have a minimal example showing this behavior:

        ```
        import java.util.concurrent.CountDownLatch;

        public class Monitors {
           static final int THREAD_COUNT = Integer.getInteger("threads", 2000);
           static final int MONITOR_COUNT = Integer.getInteger("monitorsPerThread", 800);

           static final CountDownLatch STARTED = new CountDownLatch(THREAD_COUNT);
           static final CountDownLatch LOCKED = new CountDownLatch(THREAD_COUNT);
           static final CountDownLatch HANG = new CountDownLatch(1);

           public static void main(String... args) throws Exception {
             System.out.println("Initializing");

             for (int c = 0; c < THREAD_COUNT; c++) {
               Thread t = new Thread(() -> {
                 try {
                   STARTED.countDown();
                   STARTED.await();
                 } catch (InterruptedException e) {}

                 for (int l = 0; l < MONITOR_COUNT; l++) {
                   try {
                     Object o = new Object();
                     synchronized (o) {
                       o.wait(1);
                     }
                   } catch (InterruptedException e) {}
                 }

                 try {
                   LOCKED.countDown();
                   HANG.await();
                 } catch (InterruptedException e) {}
               });
               t.start();
             }

             STARTED.await();
             System.out.println("Started");
             LOCKED.await();
             System.out.println("Locked");
             System.in.read();
             HANG.countDown();
           }

        }
        ```

        Run with:

        ```
        $ java -XX:NativeMemoryTracking=summary -Xss256k Monitors.java
        Initializing
        Started
        Locked

        <in another terminal>
        $ ps x -o pid,rss,command | grep java
        67999 704656 .../java -XX:NativeMemoryTracking=summary -Xss256k Monitors.java

        $ jcmd 67999 VM.native_memory
        ...
        - Object Monitors (reserved=325001KB, committed=325001KB)
                                    (malloc=325001KB #1600007)
        ```

        So, out of 704M of RSS, 325M is taken by inflated object monitors, and there are 1.6M of them (2000 threads, 800 monitors each).
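As a cross-check of the "~200 bytes per monitor" figure, the NMT line above can be divided out. A minimal sketch, assuming NMT's KB means 1024 bytes; the class and method names are hypothetical:

```java
// Hypothetical helper, for illustration only.
class BytesPerMonitor {
    // Average malloc'ed bytes per monitor, rounded to the nearest byte.
    // Assumes the NMT "KB" unit is 1024 bytes.
    static long bytesPerMonitor(long mallocKb, long monitorCount) {
        return Math.round(mallocKb * 1024.0 / monitorCount);
    }

    public static void main(String[] args) {
        // Values from the NMT output: malloc=325001KB #1600007
        System.out.println(bytesPerMonitor(325_001, 1_600_007) + " bytes per monitor");
    }
}
```

The result lands close to 200 bytes, which agrees with the per-monitor cost used in the estimate earlier in the description.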

        I see these ways out of this:

        0. Ask users who know about this problem to drop their `MUDT` to a much lower value, so that deflation is triggered more often. This mitigates the issue but does not change the default behavior, which means other users are still exposed to the problem.

        1. Drop `MUDT` to a much lower default value, so that cleanups are more frequent. I think this is safe from a latency perspective, because the deflation would still be performed asynchronously. The problem with the arbitrarily high ceiling is still present. This might also cause throughput regressions, since the deflater thread would be more active in normal conditions.

        2. Drop `AMPTE` to a much lower default value, so that the monitor population ceiling is not that large. This has the same implications as lowering `MUDT`.

        3. Amend the VM to request async deflation from a safepoint, for example by calling a light-weight version of `ObjectSynchronizer::request_deflate_idle_monitors` from the safepoint cleanup path. This would be similar to the JDK 11 behavior, piggybacking the cleanup requests on safepoint invocations, but with the benefit of being completely asynchronous.

        4. Introduce an additional `GuaranteedAsyncDeflationInterval`, which would normally be larger than `AsyncDeflationInterval`, but which would trigger deflation even when the threshold is not reached. Some large default value, like 60s, should serve long-running systems well without incurring significant work on the deflater thread.

        I like (4) quite a bit better, because it acts like a safety rail should the normal heuristics fail.
        The threshold heuristics fixes can then proceed at a leisurely pace, all the while being covered by this safety net.
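To make option (4) concrete, here is a rough model of the decision the deflater thread would make on each wakeup. It is written in Java only to match the reproducer; the real logic would live in HotSpot's C++ code, and everything below (class name, method signature, treating 0 as "disabled", the 250 ms value used for `AsyncDeflationInterval`) is an assumption for illustration.

```java
// Hypothetical model of option (4); not HotSpot code.
class DeflationTrigger {
    static boolean shouldDeflate(long nowMs, long lastDeflationMs,
                                 boolean thresholdReached,
                                 long asyncIntervalMs,
                                 long guaranteedIntervalMs) {
        long sinceLast = nowMs - lastDeflationMs;

        // Existing path: the used threshold was crossed, and the last
        // deflation is at least AsyncDeflationInterval in the past.
        if (thresholdReached && sinceLast >= asyncIntervalMs) {
            return true;
        }

        // Proposed safety net: deflate regardless of the threshold once the
        // guaranteed interval has elapsed. Treating 0 as "off" is an assumption.
        return guaranteedIntervalMs > 0 && sinceLast >= guaranteedIntervalMs;
    }

    public static void main(String[] args) {
        long async = 250;          // assumed AsyncDeflationInterval, in ms
        long guaranteed = 60_000;  // the 60s suggested in the description

        // Threshold never reached, as in the reproducer: only the
        // guaranteed interval can trigger deflation.
        System.out.println(shouldDeflate(30_000, 0, false, async, guaranteed));
        System.out.println(shouldDeflate(60_000, 0, false, async, guaranteed));
    }
}
```

In this model the guaranteed path does not depend on the ceiling at all, which is why it would keep working even if the ceiling has drifted arbitrarily high.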

              People

                shade Aleksey Shipilev