JDK-8162929

Enqueuing dirty cards into a single DCQS during GC does not scale

Details

    • gc
    • b07

    Description

      On some more demanding large microbenchmarks (e.g. BigRamtester, 20g heap, 1M regions), enqueuing dirty cards during GC in G1ParScanThreadState::update_rs incurs a significant amount of wait (idle) time.

      The reason is that enqueuing completed buffers takes a global lock (basically ending up in PtrQueue::handle_zero_index() and PtrQueueSet::enqueue_complete_buffer(); there is also some strange locking/unlocking going on in PtrQueue::locking_enqueue_completed_buffer()).

      That does not scale beyond a few threads.

      The problem is harder than it seems: after providing a per-thread DCQS, performance does not improve much, because the stalling merely moves to the malloc() calls made when allocating new DCQ buffers.
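
      The contention pattern described above can be illustrated with a minimal, self-contained sketch. This is not HotSpot code: CompletedBufferSet, LocalQueue, kBufferCapacity and enqueue_cards_in_parallel are hypothetical names chosen for illustration, and the buffer size is an assumption. Each worker fills a local buffer and hands it to a set of completed buffers under a lock when it is full. With one shared set, every hand-off serializes on the same mutex; with one set per worker the lock is uncontended, but each new buffer still comes from the heap allocator, which is where the description says the stalls reappear.

```cpp
// Illustrative sketch only; not HotSpot code. All names here are hypothetical.
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

constexpr std::size_t kBufferCapacity = 256;  // assumed buffer size

using Buffer = std::vector<std::size_t>;  // one buffer of dirty card indices

// A set of completed buffers guarded by a single lock (DCQS-like in spirit).
class CompletedBufferSet {
 public:
  void enqueue_completed(std::unique_ptr<Buffer> buf) {
    std::lock_guard<std::mutex> guard(lock_);  // every hand-off serializes here
    completed_.push_back(std::move(buf));
  }
  std::size_t total_cards() const {
    std::lock_guard<std::mutex> guard(lock_);
    std::size_t n = 0;
    for (const auto& b : completed_) n += b->size();
    return n;
  }

 private:
  mutable std::mutex lock_;
  std::vector<std::unique_ptr<Buffer>> completed_;
};

// Worker-local queue: fills a buffer and hands it to `target` when full.
class LocalQueue {
 public:
  explicit LocalQueue(CompletedBufferSet& target) : target_(target) {}
  void enqueue(std::size_t card) {
    if (!current_) {
      current_ = std::make_unique<Buffer>();  // heap allocation per buffer
      current_->reserve(kBufferCapacity);
    }
    current_->push_back(card);
    if (current_->size() == kBufferCapacity) flush();
  }
  void flush() {
    if (current_ && !current_->empty()) {
      target_.enqueue_completed(std::move(current_));  // leaves current_ null
    }
  }

 private:
  CompletedBufferSet& target_;
  std::unique_ptr<Buffer> current_;
};

// Runs `workers` threads, each enqueuing `cards_per_worker` cards.
// per_worker_sets == false: all workers share one set (one contended lock).
// per_worker_sets == true:  worker i uses its own set (no lock contention).
// Returns the total number of cards found in all sets afterwards.
std::size_t enqueue_cards_in_parallel(std::size_t workers,
                                      std::size_t cards_per_worker,
                                      bool per_worker_sets) {
  std::vector<CompletedBufferSet> sets(per_worker_sets ? workers : 1);
  std::vector<std::thread> threads;
  for (std::size_t w = 0; w < workers; ++w) {
    CompletedBufferSet& target = sets[per_worker_sets ? w : 0];
    threads.emplace_back([&target, w, cards_per_worker] {
      LocalQueue q(target);
      for (std::size_t i = 0; i < cards_per_worker; ++i) {
        q.enqueue(w * cards_per_worker + i);
      }
      q.flush();  // hand off the last, partially filled buffer
    });
  }
  for (auto& t : threads) t.join();
  std::size_t total = 0;
  for (const auto& s : sets) total += s.total_cards();
  return total;
}
```

      Both variants collect the same cards; they differ only in where threads wait. Timing the two with many workers would be one way to observe the effect, though the numbers depend heavily on the allocator and the machine.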

            People

              kbarrett Kim Barrett
              tschatzl Thomas Schatzl