JDK-8162929

Enqueuing dirty cards into a single DCQS during GC does not scale


    • gc
    • b07

      While looking at some more demanding, large microbenchmarks (e.g. BigRamtester, 20g heap, 1M regions), enqueuing dirty cards during GC in G1ParScanThreadState::update_rs incurs a significant amount of wait (idle) time.

      The reason is that enqueuing completed buffers takes a global lock: the path basically ends up in PtrQueue::handle_zero_index() and PtrQueueSet::enqueue_complete_buffer(). There is also some strange locking/unlocking going on in PtrQueue::locking_enqueue_completed_buffer().

      That does not scale beyond a few threads.
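
      The contention pattern described above can be sketched with a toy model. This is not HotSpot code: SharedBufferSet, worker and run_workers are hypothetical names, and a std::mutex stands in for the global lock. Each worker fills a private buffer and must take the single shared lock every time a buffer completes, so the more workers there are, the more of them sit waiting at that lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single shared dirty card queue set.
class SharedBufferSet {
  std::mutex lock_;                                  // the one global lock
  std::vector<std::vector<std::size_t>> completed_;  // completed buffers
public:
  void enqueue_completed(std::vector<std::size_t>&& buf) {
    std::lock_guard<std::mutex> guard(lock_);  // all workers serialize here
    completed_.push_back(std::move(buf));
  }
  std::size_t completed_count() {
    std::lock_guard<std::mutex> guard(lock_);
    return completed_.size();
  }
};

// Fill a private buffer with card indices; hand it over whenever it is full.
inline void worker(SharedBufferSet& set, std::size_t cards, std::size_t capacity) {
  assert(capacity > 0);
  std::vector<std::size_t> buf;
  buf.reserve(capacity);
  for (std::size_t card = 0; card < cards; ++card) {
    buf.push_back(card);
    if (buf.size() == capacity) {
      set.enqueue_completed(std::move(buf));
      buf = std::vector<std::size_t>();  // start a fresh buffer
      buf.reserve(capacity);
    }
  }
  if (!buf.empty()) set.enqueue_completed(std::move(buf));  // partial buffer
}

// Run `threads` workers against one shared set; return the buffers enqueued.
inline std::size_t run_workers(std::size_t threads, std::size_t cards,
                               std::size_t capacity) {
  SharedBufferSet set;
  std::vector<std::thread> pool;
  for (std::size_t i = 0; i < threads; ++i)
    pool.emplace_back(worker, std::ref(set), cards, capacity);
  for (auto& t : pool) t.join();
  return set.completed_count();
}
```

      The sketch only models the shape of the problem; it makes no claim about the actual cost of the lock in G1.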

      The problem is harder than it seems: after providing a per-thread DCQS, performance does not improve much, because the stalling simply moves to the malloc() calls made when allocating new DCQ buffers.
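
      A rough illustration of the per-thread variant, again as a toy model rather than HotSpot code (LocalBufferSet, fill_local and run_per_thread are hypothetical names). Each worker owns its queue set, so enqueuing a completed buffer needs no lock. What remains shared is the allocator: every new buffer is still a heap allocation, which the sketch counts to show that the number of allocator calls is unchanged by going per-thread.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<std::size_t>;

// Hypothetical per-thread queue set: touched only by its owning worker.
struct LocalBufferSet {
  std::vector<Buffer> completed;
  std::size_t allocations = 0;  // buffers requested from the allocator
};

struct Totals {
  std::size_t buffers;
  std::size_t allocations;
};

// Fill buffers and enqueue them locally; no lock is taken anywhere here.
inline void fill_local(LocalBufferSet& local, std::size_t cards,
                       std::size_t capacity) {
  if (capacity == 0) return;
  Buffer buf;
  for (std::size_t card = 0; card < cards; ++card) {
    if (buf.empty()) {
      buf.reserve(capacity);  // heap allocation: the remaining shared resource
      ++local.allocations;
    }
    buf.push_back(card);
    if (buf.size() == capacity) {
      local.completed.push_back(std::move(buf));
      buf = Buffer();  // next iteration allocates a new buffer
    }
  }
  if (!buf.empty()) local.completed.push_back(std::move(buf));
}

// One LocalBufferSet per worker; totals are merged after all workers finish.
inline Totals run_per_thread(std::size_t threads, std::size_t cards,
                             std::size_t capacity) {
  std::vector<LocalBufferSet> sets(threads);
  std::vector<std::thread> pool;
  for (std::size_t i = 0; i < threads; ++i)
    pool.emplace_back(fill_local, std::ref(sets[i]), cards, capacity);
  for (auto& t : pool) t.join();
  Totals totals{0, 0};
  for (const auto& s : sets) {
    totals.buffers += s.completed.size();
    totals.allocations += s.allocations;
  }
  return totals;
}
```

      Whether the allocator serializes those calls depends on the malloc() implementation in use; the sketch only shows where the calls happen.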

            kbarrett Kim Barrett
            tschatzl Thomas Schatzl