The resulting activation protocol is somewhat complicated because it has to serve two quite different clients. One is the dirty card buffer handling in the write barrier, which may need to wake up the "primary" refinement thread. The other is the wakeup of additional refinement threads.
Both cases could be simplified by splitting the refinement thread class into two subclasses, one for the primary thread and one for the other threads. The other threads could go back to using monitor-based activation control. Only the primary thread needs to use atomic state and semaphores, and even there the protocol can be made simpler.
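
The proposed split might look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual HotSpot code: the class names RefinementThread, PrimaryRefinementThread and SecondaryRefinementThread, the member names, and the signals_sent() counter are all hypothetical, and Semaphore is a small portable stand-in so the example compiles as plain C++11 (a real implementation would use a platform semaphore).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Stand-in counting semaphore (assumption: illustration only; a real
// implementation would use a platform semaphore, not a mutex).
class Semaphore {
  std::mutex _m;
  std::condition_variable _cv;
  unsigned _count = 0;
public:
  void signal() {
    { std::lock_guard<std::mutex> g(_m); ++_count; }
    _cv.notify_one();
  }
  void wait() {
    std::unique_lock<std::mutex> l(_m);
    _cv.wait(l, [this] { return _count > 0; });
    --_count;
  }
};

// Hypothetical common interface for both kinds of refinement thread.
class RefinementThread {
public:
  virtual ~RefinementThread() = default;
  virtual void activate() = 0;       // called by a client to request work
  virtual void wait_for_work() = 0;  // called by the thread itself; blocks
};

// Primary thread: atomic flag plus semaphore, so the activating side
// never has to take a monitor.
class PrimaryRefinementThread : public RefinementThread {
  std::atomic<bool> _notified{false};
  std::atomic<int> _signals{0};      // demo-only counter
  Semaphore _sem;
public:
  void activate() override {
    // Only the caller that flips false -> true signals, so repeated
    // requests collapse into a single wakeup.
    if (!_notified.exchange(true)) {
      ++_signals;
      _sem.signal();
    }
  }
  void wait_for_work() override {
    _sem.wait();
    // Clear the flag before looking for work: a request that arrives
    // later signals again, one that arrived earlier is served now.
    _notified.store(false);
  }
  int signals_sent() const { return _signals.load(); }
};

// Other threads: plain monitor-based activation.
class SecondaryRefinementThread : public RefinementThread {
  std::mutex _monitor;
  std::condition_variable _cv;
  bool _active = false;
public:
  void activate() override {
    { std::lock_guard<std::mutex> g(_monitor); _active = true; }
    _cv.notify_one();
  }
  void wait_for_work() override {
    std::unique_lock<std::mutex> l(_monitor);
    _cv.wait(l, [this] { return _active; });
    _active = false;                 // consume the activation request
  }
};

// Single-threaded walk-through of both activation paths.
inline int demo_activation() {
  PrimaryRefinementThread primary;
  primary.activate();
  primary.activate();                // collapsed: flag is already set
  assert(primary.signals_sent() == 1);
  primary.wait_for_work();           // returns at once, a signal is pending
  SecondaryRefinementThread secondary;
  secondary.activate();
  secondary.wait_for_work();         // returns at once, _active was set
  return primary.signals_sent();
}
```

In this sketch the monitor-based subclass needs no atomics at all, and the primary subclass reduces to one flag and one semaphore, which is the kind of simplification the split is meant to allow.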