JDK / JDK-8332083

Shenandoah: Reduce contention on Global Heap Lock


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Component: hotspot
    • Subcomponent: gc

      Shenandoah has a global HeapLock that must be held by any thread that allocates from the heap. For most deployed workloads, this global heap lock has not been a performance bottleneck. However, we have recently encountered a service for which this manifests as a significant latency issue.

      In particular, the service has over 1,000 threads running on 32 cores. Consistently, the storm of collisions on this lock that occurs following safepoints that retire all TLABs adds approximately 1 ms of coordination overhead, observed during disarm_safepoint of ShenandoahFinalMarkStartEvac and ShenandoahInitUpdateRefs. Rarely (1 out of 500 GC cycles), the extra time consumed by disarm_safepoint is as high as 5.5 ms.

      We have also observed that this lock contention occasionally (also very rare: about 1 out of 500 GC cycles) causes delays of up to 6.5 ms in time to reach safepoint.

      When we monitor similar metrics on G1 GC, we see that all times to safepoint and all times to disarm_safepoint are below 1 ms.

      Here's a suggested approach for reducing contention on this global heap lock:

      1. When we rebuild the free set, choose N (e.g., 8) regions from the Mutator free set and N regions from the Collector free set (and for GenShen, N regions from the OldCollector free set), and place these regions outside the heap lock.

      2. When a mutator or GC worker thread needs to allocate from within a regular region, proceed as follows:

          a. Randomly select a number R between 0 and N-1.
          b. Use CAS to lock region R (which is outside the heap lock) from the appropriate "free set partition". If the lock fails, try the next region.
          c. If region R's available memory is below the minimum PLAB size, grab the global heap lock (nested lock), retire this region, and install a new region at index R. Then release the global heap lock.
          d. If region R does not have sufficient memory to satisfy the request, "increment modulo N" R and try the next region outside the heap lock: release the lock for the old R and grab the lock for the new R.
          e. Otherwise, take the allocation from region R and unlock the region.
          f. If we've tried all the exported heap regions and failed to satisfy the allocation request, grab the global heap lock and try to allocate globally. While holding the global heap lock, consider "refreshing" region R by giving this region back to the global free set and getting a new region R which presumably has a greater abundance of available memory. (The other N-1 regions might also deserve to be refreshed, but we'll let other threads do that if/when necessary. We refresh region R because we already hold its lock.)
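      The allocation path in step 2 can be sketched as a minimal single-process model, using a per-region CAS spin lock (std::atomic) and a std::mutex as a stand-in for Shenandoah's global HeapLock. All names, sizes, and the flat byte-offset "regions" below are illustrative assumptions, not HotSpot code:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical sketch: N regions exported from the free set.
constexpr size_t N = 8;
constexpr size_t REGION_SIZE = 1024;  // illustrative region size in bytes
constexpr size_t MIN_PLAB = 64;       // stand-in for the minimum PLAB size

struct SharedRegion {
  std::atomic<bool> locked{false};
  size_t top = 0;  // bytes already allocated in this region

  bool try_lock() {
    bool expected = false;
    return locked.compare_exchange_strong(expected, true);
  }
  void unlock() { locked.store(false, std::memory_order_release); }
  size_t available() const { return REGION_SIZE - top; }
};

SharedRegion g_shared[N];
std::mutex g_heap_lock;  // stands in for the global HeapLock

// Steps 2a-2f: try the exported regions first; fall back to the heap lock.
bool allocate(size_t size, size_t& out_offset) {
  size_t r = std::rand() % N;                 // 2a: random start index
  for (size_t tries = 0; tries < N; tries++, r = (r + 1) % N) {
    if (!g_shared[r].try_lock()) continue;    // 2b: CAS lock, else next region
    if (g_shared[r].available() < MIN_PLAB) {
      // 2c: retire and replace under the global heap lock (nested)
      std::lock_guard<std::mutex> g(g_heap_lock);
      g_shared[r].top = 0;  // pretend a fresh region was installed at index r
    }
    if (g_shared[r].available() >= size) {    // 2e: satisfy the allocation
      out_offset = g_shared[r].top;
      g_shared[r].top += size;
      g_shared[r].unlock();
      return true;
    }
    g_shared[r].unlock();                     // 2d: insufficient; try next R
  }
  // 2f: all exported regions failed; allocate under the global heap lock
  std::lock_guard<std::mutex> g(g_heap_lock);
  return false;  // global-path allocation elided in this sketch
}
```

      In the common case a thread touches only one CAS lock and never the global mutex; the global lock is reached only for region refresh (2c) or after all N exported regions fail (2f), which is what bounds the contention.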

      3. Humongous allocations always take the global heap lock.

      4. Some "bookkeeping" is necessary to allow consolidation of allocation details when regions are moved out of and into the global free set (used, allocated, waste, etc.). When a region is removed from the global free set, we might assume the entire region is "consumed" insofar as the global free set is concerned. When the removed region is returned to the global free set, the information accumulated for this region while it resided outside the global free set is merged into the global free set representation.

      5. At various GC safepoints, we will probably want to automatically pull all of the regions that have been exported from the global free set and their relevant bookkeeping details back into the global free set before making memory budgeting decisions, allocation rate computations, etc.
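      The export/return accounting in steps 4-5 can be sketched as follows; the struct fields and function names are hypothetical, chosen only to show the "assume consumed on export, merge actuals on return" discipline:

```cpp
#include <cstddef>

// Hypothetical bookkeeping sketch (illustrative names only).
constexpr size_t REGION_BYTES = 1024;  // assumed region size

struct FreeSetStats {
  size_t used = 0;  // bytes the global free set considers consumed
};

struct ExportedRegion {
  size_t allocated = 0;  // bytes actually handed out while exported
  size_t waste = 0;      // bytes retired as unusable while exported
};

// Step 4: on export, the global free set assumes the region is fully consumed.
void export_region(FreeSetStats& fs) { fs.used += REGION_BYTES; }

// Steps 4-5: on return (e.g., at a GC safepoint), replace that pessimistic
// assumption with the region's actual consumption.
void return_region(FreeSetStats& fs, const ExportedRegion& r) {
  fs.used -= REGION_BYTES;           // undo the "fully consumed" assumption
  fs.used += r.allocated + r.waste;  // merge what was actually consumed
}
```

      Treating exported regions as fully consumed keeps the global accounting conservative between safepoints, so memory-budgeting decisions never overestimate free memory; the merge on return restores precision.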

      Test and tune this extensively before integration.

      (This proposed approach endeavors to have minimal impact on the existing architecture of the ShenandoahFreeSet and the global heap lock. A more radical redesign and rearchitecture may be necessary. If we pursue that, we should consider integrating improved support for NUMA hardware at the same time.)

            Assignee: Unassigned
            Reporter: Kelvin Nilsen (kdnilsen)
            Votes: 0
            Watchers: 3
