  1. JDK
  2. JDK-8282836

Shenandoah Generational: Improve Adaptive Shenandoah Heuristics: Dynamic number of concurrent GC threads



    • gc


          1. Remember the total amount of CPU time required to complete previous GC passes. Use this historical data to predict the total CPU time required for the next GC pass, and correlate the prediction with trends in live memory growth or shrinkage.
          2. When should_start_gc() is queried, it asks how many threads are required to complete GC before the anticipated exhaustion of the allocation pool. If there is sufficient slack to delay the start of GC even with a single concurrent GC thread, then we delay. Otherwise, we start GC now, dedicating the smallest number of concurrent GC threads that can reliably complete the GC effort within the desired time.
          3. Starting GC sooner with a smaller number of concurrent GC threads is generally preferable because it disrupts the service workload less.
          4. Under duress, it is more likely that should_start_gc() is invoked well after the moment when GC should have been started. In this case, we start concurrent GC with “all available cores” dedicated to GC. This may have the effect of stalling all service threads until GC completes, but it is a simpler control structure than degenerated or full GC. It is essentially a “very high priority” concurrent GC.
          5. Note that the planned generational implementation of ShenandoahPacer is oblivious to the number of cores dedicated to mutation or GC. It simply endeavors to ensure that allocations proceed at a pace that is consistent with the current pace of GC. It does so by stalling mutator threads that need to allocate, while allowing mutator threads that do not need to allocate to continue running unobstructed. If we properly balance the number of cores dedicated to GC and to mutation, pacing will run more efficiently. There will be far fewer OS calls to query times, to put particular threads to sleep, and to awaken threads after the required sleep duration has elapsed. There will also be fewer CPU cycles spent retrying allocations after sleeping, and less contention on the volatile shared variables that coordinate pacing between GC and allocation threads.
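
The decision described in items 1, 2 and 4 can be sketched as follows. This is only an illustration under stated assumptions, not the Shenandoah implementation: every type and function name is hypothetical, the predictor is a plain average of CPU cost per live byte, and GC work is assumed to scale perfectly across threads (wall-clock time = CPU time / thread count). A real heuristic would presumably add a safety margin to complete "reliably".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// All names below are hypothetical; none are taken from the Shenandoah sources.

// One completed GC pass: total CPU time spent and live memory found.
struct GcPassRecord {
    double cpu_seconds;
    double live_bytes;
};

// Item 1: predict the CPU time of the next pass by scaling the historical
// CPU cost per live byte to the live memory expected in the next pass.
double predict_cpu_seconds(const std::vector<GcPassRecord>& history,
                           double expected_live_bytes) {
    double cpu = 0.0;
    double live = 0.0;
    for (const GcPassRecord& r : history) {
        cpu += r.cpu_seconds;
        live += r.live_bytes;
    }
    if (live <= 0.0) {
        return 0.0;  // no usable history; a real heuristic needs a conservative default
    }
    return cpu / live * expected_live_bytes;
}

struct GcForecast {
    double predicted_cpu_seconds;     // output of the predictor above
    double seconds_until_exhaustion;  // anticipated time until the allocation pool is empty
    double polling_interval_seconds;  // how long until the decision is revisited
    int max_gc_threads;               // "all available cores"
};

// Returns 0 to delay the start of GC, otherwise the number of concurrent
// GC threads to start with now.
int threads_to_start_gc(const GcForecast& f) {
    assert(f.max_gc_threads >= 1);
    assert(f.predicted_cpu_seconds >= 0.0);

    // Item 4: the deadline has already passed, so dedicate every core to GC.
    if (f.seconds_until_exhaustion <= 0.0) {
        return f.max_gc_threads;
    }
    // Item 2: if a single thread would still finish in time after waiting
    // for the next poll, there is enough slack to delay.
    if (f.predicted_cpu_seconds <=
        f.seconds_until_exhaustion - f.polling_interval_seconds) {
        return 0;
    }
    // Otherwise pick the smallest thread count whose wall-clock time fits the
    // remaining budget, capped at all available cores (assumes linear scaling).
    double needed = std::ceil(f.predicted_cpu_seconds / f.seconds_until_exhaustion);
    double clamped = std::min(std::max(needed, 1.0),
                              static_cast<double>(f.max_gc_threads));
    return static_cast<int>(clamped);
}
```

With these assumptions, a forecast of 4 CPU-seconds of GC work and 2 seconds until exhaustion yields 2 threads, while the same work with only 0.1 seconds remaining is capped at max_gc_threads.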

      Motivation for these improvements:

      1. Although should_start_gc() is configured to be polled every 10 ms, polling may happen less frequently if the system is heavily loaded. We observed, for example, that the first poll following completion of GC occurred 17 ms after GC finished.
      2. Once should_start_gc() returns true, the GC does not actually start until the requested safepoint is reached. In one case, this was observed to occur 211 ms after the safepoint was requested.
      3. At a high allocation rate of 8 GBytes per second, these two delays together (about 228 ms) represent roughly 1.8 GBytes of allocation. Worse yet, the allocation rate during this brief intermission between GC passes is often much higher than the “average” allocation rate for the service. This is because the many allocation requests that were stalled during the previous GC pass will now run unimpeded.
      4. Whenever the heuristics fail to arrange for GC to complete before the allocation pool is exhausted, we experience long stop-the-world pauses associated with degenerated or full GC.
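
The arithmetic behind item 3 can be checked with a small helper. This is a sketch only: the function name is hypothetical, and it assumes the 17 ms polling delay and the 211 ms safepoint delay simply add up and that the allocation rate stays constant over that interval.

```cpp
#include <cassert>

// Hypothetical helper: memory allocated (in GBytes) while GC start is
// delayed, assuming a constant allocation rate over the whole delay.
double gbytes_allocated_during_delay(double rate_gbytes_per_second,
                                     double delay_milliseconds) {
    assert(rate_gbytes_per_second >= 0.0);
    assert(delay_milliseconds >= 0.0);
    return rate_gbytes_per_second * (delay_milliseconds / 1000.0);
}
```

Calling gbytes_allocated_during_delay(8.0, 17.0 + 211.0) gives about 1.82 GBytes, which is the "roughly 2 GBytes" order of magnitude cited above. A burst rate above the average would make the figure larger.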




            Unassigned Unassigned
            kdnilsen Kelvin Nilsen