Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8072439

Further refinement of the fix JDK-8047720 - Xprof hangs on Solaris

XMLWordPrintable

    • b55
    • generic
    • generic

        The fix for the following bug may need more work:

        JDK-8047720 Xprof hangs on Solaris

        I've seen two separate e-mails about the fix for 8047720 in less than a week.

        First was this e-mail sent directly to me:

        On 1/28/15 12:28 PM, Alexander Garthwaite wrote:
        > Subject: question about fix for PeriodicTask_lock/WatcherThread deadlock
        > Dan,
        > Apologies for writing directly. I am backporting your fix to a version
        > 7 copy of the OpenJDK used at Twitter. The fix (which looks good overall)
        > is to try the lock and only wake the WatcherThread if we get it. It appears
        > there is still a small race (which may not matter) if we have the following
        > sequence:
        > * the WatcherThread has the lock
        > * some other thread, T, calls stop() and fails to get the lock
        > * the WatcherThread unlocks the PeriodicTask_lock, still sees
        > _should_stop as false and parks
        > * T doesn't have the lock and so exits stop without doing an unpack
        > This would allow the WatcherThread to park without seeing that it should stop.
        > I was wondering if the following small change would fix that:
        > void WatcherThread::stop() {
        > + // Get the PeriodicTask_lock if we can. If we cannot, then the
        > + // WatcherThread is using it and we don't want to block on that lock
        > + // here because that might cause a safepoint deadlock depending on
        > + // what the current WatcherThread tasks are doing.
        > + bool have_lock = PeriodicTask_lock->try_lock();
        > +
        > + _should_terminate = true;
        > + OrderAccess::fence(); // ensure WatcherThread sees update in main loop
        > // Retry the lock if we don't have it. This handles the case where
        > // the WatcherThread stopped using and parked without seeing
        > // _should_terminate set to true.
        > if (!have_lock) {
        > have_lock = PeriodicTask_lock->try_lock();
        > }
        > + if (have_lock) {
        > WatcherThread* watcher = watcher_thread();
        > + if (watcher != NULL) {
        > + // If we managed to get the lock, then we should unpark the
        > + // WatcherThread so that it can see we want it to stop.
        > watcher->unpark();
        > + }
        > +
        > + PeriodicTask_lock->unlock();
        > }
        > (Ignore the leading '+'s. They're marking your patch.) Thanks!
        > -- Alex


        Second was this e-mail sent on 2015.02.03 to serviceability-dev@openjdk.java.net:

        On 2/3/15 12:00 PM, Carsten Varming wrote:
        > Subject: Patch: Clean up fix for JDK-8047720
        > Greetings all,
        > I was recently introduced to the deadlock described in JDK-8047720 and fixed in JDK9. The fix seems a little messy to me, and it looks like it left some very short races in the code. So I decided to clean up the code. See attached diff.
        > The change takes a step back and acquires PeriodicTask_lock in WatcherThread::stop (like before the fix in JDK9), but this time safepoints are allowed to proceed when acquiring PeriodicTask_lock, preventing the deadlock.
        > Let me know what you think,
        > Carsten

        Carsten's patch attachment:

        diff -r 8ff882030755 src/share/vm/runtime/thread.cpp
        --- a/src/share/vm/runtime/thread.cpp Wed Jan 28 16:45:35 2015 -0800
        +++ b/src/share/vm/runtime/thread.cpp Tue Feb 03 13:52:32 2015 -0500
        @@ -1318,24 +1318,20 @@
         }
         
         void WatcherThread::stop() {
        - // Get the PeriodicTask_lock if we can. If we cannot, then the
        - // WatcherThread is using it and we don't want to block on that lock
        - // here because that might cause a safepoint deadlock depending on
        - // what the current WatcherThread tasks are doing.
        - bool have_lock = PeriodicTask_lock->try_lock();
        -
           _should_terminate = true;
        - OrderAccess::fence(); // ensure WatcherThread sees update in main loop
        -
        - if (have_lock) {
        + {
        + // Let safepoints happen to avoid the deadlock:
        + // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint.
        + // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock.
        + // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock,
        + // and holds up the VM thread by not reaching a safepoint.
        + MutexLocker mu(PeriodicTask_lock);
        +
             WatcherThread* watcher = watcher_thread();
             if (watcher != NULL) {
        - // If we managed to get the lock, then we should unpark the
        - // WatcherThread so that it can see we want it to stop.
        + // Unpark the WatcherThread so that it can see that it should terminate.
               watcher->unpark();
             }
        -
        - PeriodicTask_lock->unlock();
           }
         
           // it is ok to take late safepoints here, if needed

              dcubed Daniel Daugherty
              dcubed Daniel Daugherty
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: