JDK-8341787

ParallelGC: Optimize memory allocation to reduce contention on Heap_lock

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version: tbd
    • Affects Version: None
    • Component: hotspot
    • Subcomponent: gc

      Using a simple benchmark (https://github.com/pengxiaolong/benchmarks/blob/main/multi-thread-allocation-latency/src/main/java/personal/xlpeng/benchmarks/allocationlatency/MultiThreadAllocationLatency.java), we noticed that as the number of Java threads increases, the long-tail allocation latency can exceed 10 ms and even reach 100 ms. Further instrumentation of the memory allocation path (https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:allocation_instrumentation) shows that acquiring Heap_lock contributes a large share of that long-tail latency.
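
      As a rough illustration of this kind of measurement (a minimal sketch, not the linked benchmark), the following times individual allocations on several threads and reports tail percentiles. The class name, the 1 KiB allocation size, and the iteration counts are all assumptions chosen for illustration, and System.nanoTime() adds its own overhead to each sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the benchmark referenced in this issue.
public class AllocationLatencySketch {
    // Publishing each array to a volatile field keeps the allocation observable.
    static volatile byte[] sink;

    /** Nearest-rank percentile (0 < p <= 100) of an ascending-sorted array. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Times perThread allocations of size bytes on each thread; returns all latencies (ns), sorted. */
    static long[] run(int threads, int perThread, int size) {
        long[][] samples = new long[threads][perThread];
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final long[] mine = samples[t];
            Thread w = new Thread(() -> {
                for (int i = 0; i < mine.length; i++) {
                    long start = System.nanoTime();
                    sink = new byte[size];
                    mine[i] = System.nanoTime() - start;
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return Arrays.stream(samples).flatMapToLong(Arrays::stream).sorted().toArray();
    }

    public static void main(String[] args) {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 8;
        long[] lat = run(threads, 100_000, 1024);
        System.out.printf("threads=%d p50=%dns p99.9=%dns max=%dns%n",
                threads, percentile(lat, 50), percentile(lat, 99.9), lat[lat.length - 1]);
    }
}
```

      Running it with -XX:+UseParallelGC and a small heap, then repeating with a larger thread count, is one way to see how the tail moves; absolute numbers will vary by machine and JDK build.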

      In the ParallelGC memory allocation path, a mutator always takes Heap_lock and retries the allocation when the fast path fails. When the heap is out of space, most Java threads that are trying to allocate are therefore likely to be blocked acquiring Heap_lock, which causes heavy contention on Heap_lock right after garbage collection finishes.

      Instead of always taking Heap_lock in the slow path, we could introduce a barrier at which Java threads wait while a VM_CollectForAllocation operation is pending, and take Heap_lock only where it is actually needed (e.g. to expand and allocate in the old gen).
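
      The idea can be sketched outside HotSpot with a toy model. This is a Java analogy under stated assumptions, not the proposed C++ change: heapLock stands in for Heap_lock, gcBarrier for the proposed barrier, and "collection" simply resets a bump pointer. Every name here is hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model of the proposal; none of these names exist in HotSpot.
public class GcBarrierSketch {
    private final ReentrantLock heapLock = new ReentrantLock(); // stand-in for Heap_lock
    private final Object gcBarrier = new Object();              // stand-in for the proposed barrier
    private boolean gcInProgress = false;                       // guarded by gcBarrier
    private final AtomicLong top = new AtomicLong();            // bump pointer
    private final long capacity;
    final AtomicLong heapLockAcquisitions = new AtomicLong();   // instrumentation only

    GcBarrierSketch(long capacity) { this.capacity = capacity; }

    /** Lock-free bump allocation; returns false when the toy heap is full. */
    private boolean tryFastPath(long size) {
        while (true) {
            long cur = top.get();
            if (cur + size > capacity) return false;
            if (top.compareAndSet(cur, cur + size)) return true;
        }
    }

    void allocate(long size) {
        if (size < 0 || size > capacity) throw new IllegalArgumentException("size out of range");
        while (!tryFastPath(size)) {
            synchronized (gcBarrier) {
                if (gcInProgress) {
                    awaitCollection(); // wait at the barrier without touching heapLock
                    continue;          // collection finished: retry the fast path
                }
                if (tryFastPath(size)) return; // a collection completed before we got here
                gcInProgress = true;           // this thread triggers the collection
            }
            collect();
        }
    }

    /** Caller must hold the gcBarrier monitor. */
    private void awaitCollection() {
        boolean interrupted = false;
        while (gcInProgress) {
            try { gcBarrier.wait(); } catch (InterruptedException e) { interrupted = true; }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    /** Only the triggering thread takes heapLock; the toy "collection" frees everything. */
    private void collect() {
        heapLock.lock();
        heapLockAcquisitions.incrementAndGet();
        try {
            top.set(0);
        } finally {
            heapLock.unlock();
            synchronized (gcBarrier) {
                gcInProgress = false;
                gcBarrier.notifyAll(); // release every thread waiting at the barrier
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GcBarrierSketch heap = new GcBarrierSketch(1024);
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> { for (int n = 0; n < 10_000; n++) heap.allocate(16); });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("heapLock acquisitions: " + heap.heapLockAcquisitions.get());
    }
}
```

      In this model the lock is taken once per collection rather than once per failed fast-path allocation, so threads released from the barrier go straight back to the lock-free path. A real implementation would also have to handle safepoints, allocation failure after GC, and old-gen expansion, none of which the sketch attempts.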

      A crude version shows a good improvement in long-tail latency: https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:heap_lock_contention. Some refinement is still needed before the change is ready for a PR.

            xpeng Xiaolong Peng