JDK-8353716

G1: AHS work umbrella


    • gc

      This is an umbrella CR collecting useful information/plans on the work.

      [1] gives a rough summary ("braindump") of the system envisioned for G1. Here is a re-creation of its diagram (use a fixed-width font to see the ASCII art).

      The inputs to heap sizing are on the left, the outputs of the heap sizing policy on the right.


      ```
      Inputs:                                              Outputs:

      Min/Max/InitialHeapSize (1)

      CPU based heap sizing (2)                            Current committed
                                                           heap size
      Min/MaxHeapFreeRatio (3)   ---->  Controller  ---->
                                                           Current target
      CurrentMaxHeapSize (4)                               heap size

      SoftMaxHeapSize (5)

      "AHS" (6)
      ```

      I.e. the goal is to react better to the inputs ("pressure" sources): a controller takes them, generates the current target and current committed heap sizes, and G1 acts on these.
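
      The data flow in the diagram can be sketched in Java as follows. This is an illustration only: the class, record and method names are hypothetical and not HotSpot code, and the combination rule (cap the CPU based demand by the soft limit, then clamp to the hard limits) is an assumption, since defining the controller policy is the subject of this work.

```java
// Hypothetical illustration of the diagram above; not HotSpot code.
final class HeapSizingControllerSketch {

    // Inputs (1), (4) and (5) in bytes, plus the heap size demanded by CPU
    // based sizing (2). Inputs (3) and (6) are left out for brevity.
    record Inputs(long minHeap, long maxHeap, long currentMaxHeap,
                  long softMaxHeap, long cpuBasedDemand) {}

    // Outputs: the size the policy aims for and the size to keep committed.
    record Outputs(long targetHeap, long committedHeap) {}

    static Outputs decide(Inputs in, long liveBytes) {
        // Hard upper bound: the smaller of MaxHeapSize and CurrentMaxHeapSize.
        long hardMax = Math.min(in.maxHeap(), in.currentMaxHeap());
        // Assumed rule: the soft limit caps the target, not the committed size.
        long target = Math.min(in.cpuBasedDemand(), in.softMaxHeap());
        target = Math.max(in.minHeap(), Math.min(target, hardMax));
        // Committed memory has to hold the live data, up to the hard bound.
        long committed = Math.min(hardMax, Math.max(target, liveBytes));
        return new Outputs(target, committed);
    }
}
```

      In this sketch, live data above the soft limit raises the committed size but not the target, which is one possible reading of a "soft" boundary.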


      (1) Existing heap limits like Min/Max/InitialHeapSize. These will be kept and observed.

      (2) G1 uses CPU usage based heap sizing. Roughly it follows GCTimeRatio that sets a goal for how much CPU the garbage collection algorithm may use, changing heap size based on that.

      Currently CPU based heap sizing interprets GCTimeRatio as a maximum CPU usage only, i.e. it only ever expands the heap to stay below that maximum GC CPU usage. There is a need to also size down the heap based on GCTimeRatio. This will be implemented as part of JDK-8238687.
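
      The difference between the two behaviours can be sketched as below. The names and the tolerance band are assumptions made for illustration; they are not taken from G1 or from the JDK-8238687 prototype.

```java
// Hypothetical illustration; names and the tolerance band are assumptions.
final class CpuBasedSizingSketch {
    enum Action { EXPAND, SHRINK, KEEP }

    // Behaviour described as current: the goal is an upper bound only.
    static Action decideExpandOnly(double gcCpuPercent, double targetPercent) {
        return gcCpuPercent > targetPercent ? Action.EXPAND : Action.KEEP;
    }

    // Two-sided variant: also shrink when GC is cheaper than the goal.
    // tolerancePercent is a dead band around the goal to limit oscillation.
    static Action decide(double gcCpuPercent, double targetPercent,
                         double tolerancePercent) {
        if (gcCpuPercent > targetPercent + tolerancePercent) {
            return Action.EXPAND;  // GC too expensive: more heap, fewer GCs
        }
        if (gcCpuPercent < targetPercent - tolerancePercent) {
            return Action.SHRINK;  // GC cheaper than the goal: return memory
        }
        return Action.KEEP;
    }
}
```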

      Open questions:
        * make this value manageable so that CPU usage (and indirectly heap usage) can be controlled externally on the fly. Initial thoughts indicate that it might be a good way to steer heap usage.
        * maybe ultimately translate other input to this value.
        * there have been some thoughts to replace GCTimeRatio (target ratio of GC time to mutator time) with a more direct (GC) CPU usage percentage option (e.g. GCCPUUsagePercent) since it would be easier and more direct to use. The implementation internally also uses this value, and not the ratio.
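
      For reference, the relation between the ratio and a direct percentage is simple: GCTimeRatio = N asks for a GC time to mutator time ratio of 1:N, so GC may use 1 / (1 + N) of the total time. A small sketch of the conversion (the class and method names are hypothetical, and GCCPUUsagePercent is only the example name used above, not an existing flag):

```java
// Hypothetical helper showing the arithmetic only.
final class GcTimeRatioSketch {

    // GCTimeRatio = N means GC may take 1 / (1 + N) of the total time.
    static double toGcCpuPercent(int gcTimeRatio) {
        return 100.0 / (1.0 + gcTimeRatio);
    }

    // Inverse: the ratio that corresponds to a given GC CPU percentage.
    static double toGcTimeRatio(double gcCpuPercent) {
        return 100.0 / gcCpuPercent - 1.0;
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=12 -> " + toGcCpuPercent(12) + "% GC CPU");
        System.out.println("GCTimeRatio=99 -> " + toGcCpuPercent(99) + "% GC CPU");
    }
}
```

      A ratio of 12 corresponds to roughly 7.7% GC CPU usage and a ratio of 99 to 1%, which shows why the percentage is the more direct knob.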

      (3) Min/MaxHeapFreeRatio currently determine a percentage(!) of minimum and maximum free regions after concurrent mark and a full collection. This is currently the only mechanism for G1 to decrease Java heap memory.
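
      To make the percentage semantics concrete, here is a sketch of how such ratios translate into capacity bounds. The formula capacity = used * 100 / (100 - ratio) is an approximation of the idea, not a copy of G1's code, and all names are hypothetical.

```java
// Hypothetical illustration of free-ratio based bounds; not G1 code.
final class HeapFreeRatioSketch {

    // Capacity at which freeRatioPercent of the heap is free (rounded down).
    static long capacityForFreeRatio(long usedBytes, int freeRatioPercent) {
        if (freeRatioPercent < 0 || freeRatioPercent > 100) {
            throw new IllegalArgumentException("ratio must be in [0, 100]");
        }
        if (freeRatioPercent == 100) {
            return Long.MAX_VALUE;  // no finite capacity is 100% free
        }
        return usedBytes * 100 / (100 - freeRatioPercent);
    }

    // Keeps capacity within the bounds implied by the two ratios.
    static long resize(long capacity, long usedBytes,
                       int minFreePercent, int maxFreePercent) {
        long lower = capacityForFreeRatio(usedBytes, minFreePercent);
        long upper = capacityForFreeRatio(usedBytes, maxFreePercent);
        return Math.max(lower, Math.min(capacity, upper));
    }
}
```

      With 600 MB used and ratios of 40 and 70, any capacity between 1000 MB and 2000 MB is left alone, which illustrates how wide the band is with generous values.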

      One problem with these flags is that their function is completely decoupled from other heap sizing. In conjunction with CPU based heap sizing they tend to undo or redo previous heap sizing decisions. The current defaults prevent this from happening too often only because they are very generous.

      Open issues:
        * Due to their propensity to undo other decisions, the current idea is to remove or limit the use of the Min/MaxHeapFreeRatio flags in G1. The current JDK-8238687 prototype removes their use in the Remark pause; only Full GC observes them at this point.
        * One of their uses is CRaC-like applications where the user wants to reduce heap usage to a minimum just before freezing the VM (after issuing a System.gc()). For this use case a dedicated signal to the application (jcmd) may be more appropriate.
        * Otherwise their effect can be approximated using CPU based heap sizing.

      (4) CurrentMaxHeapSize introduces a current hard Java heap limit, i.e. the VM would throw an OutOfMemoryError if Java heap memory usage exceeded it. The JEP draft in JDK-8204088 describes the mechanism and some use cases. If introduced, it might not be part of a JEP. The main use case would be to set the maximum Java heap size from outside the VM based on information not known (or impossible to know) by the VM. Some examples:

        * non-standard container environments where there is no configurable maximum (total) heap available
        * multiple JVMs in a single container environment, implementing some kind of priority control.

      These seem to be somewhat uncommon use cases, but the complexity is smaller. This might be something to investigate in more detail.

      (5) SoftMaxHeapSize introduces a “soft” boundary for the Java heap size (JDK-8236073): when the live set size is close to or beyond it, GC should expend more effort on garbage collection but, unlike exceeding the maximum heap size, exceeding that amount of Java heap memory use will not abort the VM.

      Guidance about the impact can be obtained from the existing implementation for ZGC: it will at that time use up to 25% of available CPU for garbage collection; another idea is to run concurrent marking without pause at that point.
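
      The distinction between the soft and the hard limit can be sketched as follows. The names are hypothetical, and using "at or beyond the soft limit" as the trigger is a simplifying assumption; a real policy would probably already react when getting close to it.

```java
// Hypothetical illustration of soft versus hard heap limits.
final class SoftMaxHeapSketch {
    enum Reaction { NORMAL, INCREASED_GC_EFFORT, OUT_OF_MEMORY }

    static Reaction classify(long liveBytes, long softMaxBytes, long hardMaxBytes) {
        if (liveBytes > hardMaxBytes) {
            return Reaction.OUT_OF_MEMORY;        // hard limit: allocation fails
        }
        if (liveBytes >= softMaxBytes) {
            return Reaction.INCREASED_GC_EFFORT;  // soft limit: work harder, keep running
        }
        return Reaction.NORMAL;
    }
}
```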

      Use cases:
        * performance tuning: in some cases internal heuristics cannot know when to spend maximum effort on garbage collection to avoid full GCs/evacuation failure/allocation stalls. One can delay that “breaking point” somewhat by manual tuning.
        * there is a fair overlap between this and the CPU based heap sizing; these need to be reconciled. The gain of allowing specification of a (soft) heap goal directly is exactly that: CPU based heap sizing may be too indirect compared to specifying a soft (target) heap size directly.

      (6) "AHS": Similar to ZGC's efforts to be a good citizen within a given environment, this term covers all additional behaviour that probes currently available memory and adjusts the current/soft maximum size accordingly (https://openjdk.org/jeps/8329758).

      From that document, the following functionality seems to be missing or needs to be improved for ZGC-style “AHS”.
        * detect and react to neighbouring processes' native memory pressure
        * a `ZGCPressure` or similar knob that changes the extent of the reaction to native memory pressure (or, in general, any “pressure”). Note that, for better or worse, this “Pressure” is determined fairly abstractly, so its impact is hard for the end user to trace, understand, and work with.
        * startup heap size expansion boost. G1 already has this functionality, but may need to be revised given the other changes introduced in this effort.


      There are a few ideas that have been proposed in internal and external discussions that might improve VM behaviour and are somewhat related:
        * concurrent heap (un-)commit: revising heap targets regularly, i.e. on a more periodic schedule than GCs, which can be spaced quite far apart and occur fairly irregularly depending on load, may help with general stability/consistency/timeliness of the results.
        * the above leaves open the question of the timeliness and extent of the reaction to any kind of pressure in the general case (startup boost is covered).
        * regular periodic cleanup of non-Java heap data structures (JDK-8213198)
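
      The "timeliness and extent" question from the list above can be illustrated with a bounded periodic step: the period controls timeliness and the maximum step controls extent. This is a sketch under assumptions; the names and the bounded-step rule are hypothetical and not a proposed G1 design.

```java
// Hypothetical illustration of periodic, bounded heap size adjustment.
final class PeriodicResizeSketch {

    // Moves the committed size toward the target by at most maxStepBytes.
    static long step(long committedBytes, long targetBytes, long maxStepBytes) {
        long delta = targetBytes - committedBytes;
        long bounded = Math.max(-maxStepBytes, Math.min(maxStepBytes, delta));
        return committedBytes + bounded;
    }

    public static void main(String[] args) {
        long committed = 1024;
        // Each loop iteration stands for one period of a background task.
        for (int period = 1; period <= 4; period++) {
            committed = step(committed, 640, 128);
            System.out.println("period " + period + ": committed=" + committed);
        }
    }
}
```

      In the example run the committed size goes from 1024 to 640 in three periods and then stays there; a shorter period or a larger step would react faster at the cost of more commit/uncommit activity.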

      All of these and more issues are collected using the `gc-g1-heap-resizing` label (https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing).

      References:
      [1] https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051079.html
      [2] https://bugs.openjdk.org/secure/attachment/114105/ahs-google-revised-20241022.pdf

            Unassigned Unassigned
            tschatzl Thomas Schatzl