Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4996696

PSStealTasks need smarter termination code

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: P4 P4
    • 6
    • 5.0
    • hotspot
    • None
    • gc
    • sparc
    • solaris_8

      During parallel scavenges, PSStealTasks are enqueued as the last items to run
      in order to help with load balancing.

      The way they current work is this; for N worker threads, N-1 steal tasks are enqueued.
      The Nth task is a steal task terminator.

      The normal steal task sits in a while loop, attempting to steal work from another
      thread. If it fails to steal work, it checks the termination condition. If termination
      is allowed, the task exits.

      The steal task terminator simply sets the termination condition to true and exits.

      The basic idea is that all threads keep trying to steal as long as any thread
      is doing something other than stealing. Because the N-1 threads cannot exit until
      the Nth thread picks up the steal task terminator, this also acts as a barrier.

      To date, this has worked well with the parallel scavenge code.

      However, during testing of parallel mark sweep code, the steal tasks were exiting
      too quickly.

      What was happening is this; the parallel mark sweep code had a relatively small
      amount of work tracing roots, the remainder of the work was chasing down the
      pointers found during root traversal. So quite quickly, all the threads were
      running in steal tasks.

      Each thread keeps two queues of work to do. One queue may be stolen from, and
      is roughly ~1000 entries in size. Any work in excess of that is stored on the
      overflow stack. When a thread is working, it always removes work from the
      overflow stack first, this allows other threads to get maximum benefit by
      stealing from the other queue.

      During the early exit case, a single thread would very temporarily have
      no work available on its stealable queue, but it did still have work on its
      overflow stack. Another thread would attempt to steal work, and fail, because
      the stealable queue was empty. The other thread would note that the termination
      condition was true, it had failed a steal attempt, and it would exit.

      This is particularly problematic on small machines, where the early loss of
      a thread has a large impact on overall completion time.

      We need a smarter termination condition. The steal task terminator should not
      simply set a condition and exit, it needs to share the load.

      A quick and dirty fix was to have the steal task terminator set the condition
      and then start stealing. Exiting required failing 10 steal attempts. This
      did fix the short term problem, but is probably still too simple.

            pbk Peter Kessler
            jmcilreesunw James Mcilree (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: