Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8255289

ZGC on 15 reports higher GC time on `GarbageCollectorMXBean.getCollectionTime()`

XMLWordPrintable

    • x86_64
    • generic

      ADDITIONAL SYSTEM INFORMATION :
      CentOS 7, OpenJDK 14.0.x vs OpenJDK 15.0.0 using ZGC

      A DESCRIPTION OF THE PROBLEM :
      Switching from 14.0.2 to 15.0.0 reports significantly (and unexpectedly) different GC times.
      We are using Prometheus to report metrics and Grafana to render them.
      Prometheus calls `GarbageCollectorMXBean.getCollectionTime()`.
      Grafana renders using: `rate(jvm_gc_collection_seconds_sum{job="$app", env="$env"}[2m])`

      Several applications on 14.0.2 with flags: `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:InitialRAMPercentage=70 -XX:MaxRAMPercentage=70 -XX:MinRAMPercentage=70` report GC times less than 100 microseconds.

      Same applications on 15.0.0 with flags: `-XX:+UseContainerSupport -XX:+UseZGC` report GC times around 20 milliseconds.

      Graphs for GC rates, memory usage looks the same, active thread count in a bit higher on 15.0.0 (82..84 on 14.0.2 vs 86..91 on 15.0.0).

      I experimented on one of services with switching just OpenJDK from 14.0.2 to OpenJDK 15.0.0 and I tried different extra flags for 15 - GC timings stay much higher than on 14.

      I cannot prove it, but as other graphs stay the same, I have a feeling, that ZGC might report GC times in microseconds, instead of expected milliseconds OR in 15th it started to report ALL time spend for GC, instead of just pause times. I tried to read detailed gc.log, but I failed to understand, how to account reported timings in gc.log vs time reported by calling `GarbageCollectorMXBean.getCollectionTime()`.

      REGRESSION : Last worked in version 14.0.2

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Use same web application on 14.0.x and on 15.0.0
      Compare output of `GarbageCollectorMXBean.getCollectionTime()`

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      As ZGC become production ready, I expected to see same or lower GC times.
      ACTUAL -
      ZGC reports about 250..500 times higher GC times, while CPU usage stays very similar.

      CUSTOMER SUBMITTED WORKAROUND :
      For now: downgrade back to 14

      FREQUENCY : always


            Unassigned Unassigned
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: