During parallel scavenges, PSStealTasks are enqueued as the last items to run
in order to help with load balancing.
The way they currently work is this: for N worker threads, N-1 steal tasks are enqueued.
The Nth task is a steal task terminator.
The normal steal task sits in a while loop, attempting to steal work from another
thread. If it fails to steal work, it checks the termination condition. If termination
is allowed, the task exits.
The steal task terminator simply sets the termination condition to true and exits.
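To make the scheme concrete, here is a minimal, compilable sketch in C++.
The names (g_terminate, try_steal, do_work, run_steal_task) are invented for
illustration and the hooks are stubbed out; this is not the actual HotSpot
code, just the shape of the loop described above.

    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Shared termination flag; only the steal task terminator sets it.
    static std::atomic<bool> g_terminate{false};

    // Placeholder hooks standing in for the real stealing and tracing code.
    static bool try_steal(std::size_t /*self_id*/) { return false; }
    static void do_work(std::size_t /*self_id*/) { /* trace stolen objects */ }

    // One of the N-1 ordinary steal tasks: keep stealing until a steal
    // attempt fails *and* termination has been allowed.
    static void run_steal_task(std::size_t self_id) {
      while (true) {
        if (try_steal(self_id)) {
          do_work(self_id);  // drain what was stolen
        } else if (g_terminate.load(std::memory_order_acquire)) {
          return;            // failed a steal and termination is allowed
        }
        // Otherwise the steal failed but termination is not yet allowed: retry.
      }
    }

    // The Nth task, the steal task terminator: set the flag and exit.
    static void run_steal_task_terminator() {
      g_terminate.store(true, std::memory_order_release);
    }

    int main() {
      const std::size_t n = 4;  // N worker threads
      std::vector<std::thread> workers;
      for (std::size_t i = 0; i < n - 1; ++i)
        workers.emplace_back(run_steal_task, i);        // N-1 steal tasks
      workers.emplace_back(run_steal_task_terminator);  // the Nth task
      for (auto& w : workers) w.join();
      return 0;
    }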
The basic idea is that all threads keep trying to steal as long as any thread
is doing something other than stealing. Because the N-1 threads cannot exit until
the Nth thread picks up the steal task terminator, this also acts as a barrier.
To date, this has worked well with the parallel scavenge code.
However, during testing of parallel mark sweep code, the steal tasks were exiting
too quickly.
What was happening is this: the parallel mark sweep code had a relatively small
amount of work tracing roots; the remainder of the work was chasing down the
pointers found during root traversal. So quite quickly, all the threads were
running in steal tasks.
Each thread keeps two queues of work to do. One queue may be stolen from, and
is roughly 1000 entries in size. Any work in excess of that is stored on the
overflow stack. When a thread is working, it always removes work from the
overflow stack first; this allows other threads to get the maximum benefit by
stealing from the stealable queue.
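A rough sketch of that per-thread structure, using a mutex in place of the
real lock-free queues and illustrative names (WorkSet, pop_local, steal) that
are not the HotSpot ones; the 1000-entry capacity is the figure mentioned
above.

    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <optional>
    #include <vector>

    using Task = int;  // stand-in for a pointer to an object needing tracing

    class WorkSet {
      static constexpr std::size_t kQueueCapacity = 1000;

      std::deque<Task> stealable_;   // other threads may steal from here
      std::vector<Task> overflow_;   // private overflow stack; never stolen from
      std::mutex lock_;              // sketch-level locking only

    public:
      // New work fills the stealable queue first; the excess overflows.
      void push(Task t) {
        std::lock_guard<std::mutex> g(lock_);
        if (stealable_.size() < kQueueCapacity) stealable_.push_back(t);
        else overflow_.push_back(t);
      }

      // The owning thread drains its overflow stack before touching the
      // stealable queue, so thieves see as much work as possible.
      std::optional<Task> pop_local() {
        std::lock_guard<std::mutex> g(lock_);
        if (!overflow_.empty()) {
          Task t = overflow_.back(); overflow_.pop_back(); return t;
        }
        if (!stealable_.empty()) {
          Task t = stealable_.back(); stealable_.pop_back(); return t;
        }
        return std::nullopt;
      }

      // Thieves only ever see the stealable queue; work parked on the
      // overflow stack is invisible to them.
      std::optional<Task> steal() {
        std::lock_guard<std::mutex> g(lock_);
        if (stealable_.empty()) return std::nullopt;
        Task t = stealable_.front(); stealable_.pop_front(); return t;
      }
    };

    int main() {
      WorkSet ws;
      for (int i = 0; i < 1500; ++i) ws.push(i);  // 1000 stealable + 500 overflow
      auto mine = ws.pop_local();   // owner: comes from the overflow stack
      auto stolen = ws.steal();     // thief: comes from the stealable queue
      return (mine && stolen) ? 0 : 1;
    }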
In the early exit case, a single thread would momentarily have no work
available on its stealable queue while it still had work on its overflow
stack. Another thread would attempt to steal work and fail, because the
stealable queue was empty. Having failed a steal attempt while the termination
condition was true, that thread would then exit.
This is particularly problematic on small machines, where the early loss of
a thread has a large impact on overall completion time.
We need a smarter termination condition. The steal task terminator should not
simply set a condition and exit; it needs to share the load.
A quick-and-dirty fix was to have the steal task terminator set the condition
and then start stealing itself. Exiting then required failing 10 steal attempts.
This did fix the short-term problem, but it is probably still too simple.
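Under the same illustrative assumptions as the earlier sketch, and assuming
the 10 failed attempts are counted since the last successful steal and that
the rule applies to every steal task rather than only the terminator (the
description above does not pin this down), the fix looks roughly like this:

    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    static std::atomic<bool> g_terminate{false};

    // Placeholder hooks, as in the earlier sketch.
    static bool try_steal(std::size_t /*self_id*/) { return false; }
    static void do_work(std::size_t /*self_id*/) { /* trace stolen objects */ }

    // After the fix, a steal task only exits once termination has been
    // requested AND it has failed kFailureLimit steal attempts since its
    // last successful steal.
    static void run_steal_loop(std::size_t self_id) {
      const int kFailureLimit = 10;
      int failures = 0;
      while (true) {
        if (try_steal(self_id)) {
          do_work(self_id);
          failures = 0;  // a successful steal resets the count
        } else {
          ++failures;
          if (failures >= kFailureLimit &&
              g_terminate.load(std::memory_order_acquire)) {
            return;      // enough misses and termination is allowed
          }
        }
      }
    }

    // The terminator now shares the load: set the flag, then steal too.
    static void run_steal_task_terminator(std::size_t self_id) {
      g_terminate.store(true, std::memory_order_release);
      run_steal_loop(self_id);
    }

    int main() {
      const std::size_t n = 4;  // N worker threads
      std::vector<std::thread> workers;
      for (std::size_t i = 0; i < n - 1; ++i)
        workers.emplace_back(run_steal_loop, i);             // N-1 steal tasks
      workers.emplace_back(run_steal_task_terminator, n - 1);
      for (auto& w : workers) w.join();
      return 0;
    }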