-
Bug
-
Resolution: Fixed
-
P4
-
None
-
b55
-
generic
-
generic
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8082978 | emb-9 | Daniel Daugherty | P4 | Resolved | Fixed | team |
JDK-8285659 | 8u351 | Dukebot | P4 | Resolved | Fixed | b01 |
JDK-8289855 | 8u341 | Dukebot | P4 | Closed | Fixed | b32 |
The fix for the following bug may need more work:
JDK-8047720 Xprof hangs on Solaris
I've seen two separate e-mails about the fix for 8047720 in less than a week.
First was this e-mail sent directly to me:
On 1/28/15 12:28 PM, Alexander Garthwaite wrote:
> Subject: question about fix for PeriodicTask_lock/WatcherThread deadlock
> Dan,
> Apologies for writing directly. I am backporting your fix to a version
> 7 copy of the OpenJDK used at Twitter. The fix (which looks good overall)
> is to try the lock and only wake the WatcherThread if we get it. It appears
> there is still a small race (which may not matter) if we have the following
> sequence:
> * the WatcherThread has the lock
> * some other thread, T, calls stop() and fails to get the lock
> * the WatcherThread unlocks the PeriodicTask_lock, still sees
> _should_stop as false and parks
> * T doesn't have the lock and so exits stop without doing an unpack
> This would allow the WatcherThread to park without seeing that it should stop.
> I was wondering if the following small change would fix that:
> void WatcherThread::stop() {
> + // Get the PeriodicTask_lock if we can. If we cannot, then the
> + // WatcherThread is using it and we don't want to block on that lock
> + // here because that might cause a safepoint deadlock depending on
> + // what the current WatcherThread tasks are doing.
> + bool have_lock = PeriodicTask_lock->try_lock();
> +
> + _should_terminate = true;
> + OrderAccess::fence(); // ensure WatcherThread sees update in main loop
> // Retry the lock if we don't have it. This handles the case where
> // the WatcherThread stopped using and parked without seeing
> // _should_terminate set to true.
> if (!have_lock) {
> have_lock = PeriodicTask_lock->try_lock();
> }
> + if (have_lock) {
> WatcherThread* watcher = watcher_thread();
> + if (watcher != NULL) {
> + // If we managed to get the lock, then we should unpark the
> + // WatcherThread so that it can see we want it to stop.
> watcher->unpark();
> + }
> +
> + PeriodicTask_lock->unlock();
> }
> (Ignore the leading '+'s. They're marking your patch.) Thanks!
> -- Alex
Second was this e-mail sent on 2015.02.03 to serviceability-dev@openjdk.java.net:
On 2/3/15 12:00 PM, Carsten Varming wrote:
> Subject: Patch: Clean up fix forJDK-8047720
> Greetings all,
> I was recently introduced to the deadlock described inJDK-8047720 and fixed in JDK9. The fix seems a little messy to me, and it looks like it left some very short races in the code. So I decided to clean up the code. See attached diff.
> The change takes a step back and acquires PeriodicTask_lock in WatcherThread::stop (like before the fix in JDK9), but this time safepoints are allowed to proceed when acquiring PeriodicTask_lock, preventing the deadlock.
> Let me know what you think,
> Carsten
Carsten's patch attachment:
diff -r 8ff882030755 src/share/vm/runtime/thread.cpp
--- a/src/share/vm/runtime/thread.cpp Wed Jan 28 16:45:35 2015 -0800
+++ b/src/share/vm/runtime/thread.cpp Tue Feb 03 13:52:32 2015 -0500
@@ -1318,24 +1318,20 @@
}
void WatcherThread::stop() {
- // Get the PeriodicTask_lock if we can. If we cannot, then the
- // WatcherThread is using it and we don't want to block on that lock
- // here because that might cause a safepoint deadlock depending on
- // what the current WatcherThread tasks are doing.
- bool have_lock = PeriodicTask_lock->try_lock();
-
_should_terminate = true;
- OrderAccess::fence(); // ensure WatcherThread sees update in main loop
-
- if (have_lock) {
+ {
+ // Let safepoints happen to avoid the deadlock:
+ // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint.
+ // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock.
+ // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock,
+ // and holds up the VM thread by not reaching a safepoint.
+ MutexLocker mu(PeriodicTask_lock);
+
WatcherThread* watcher = watcher_thread();
if (watcher != NULL) {
- // If we managed to get the lock, then we should unpark the
- // WatcherThread so that it can see we want it to stop.
+ // Unpark the WatcherThread so that it can see that it should terminate.
watcher->unpark();
}
-
- PeriodicTask_lock->unlock();
}
// it is ok to take late safepoints here, if needed
I've seen two separate e-mails about the fix for 8047720 in less than a week.
First was this e-mail sent directly to me:
On 1/28/15 12:28 PM, Alexander Garthwaite wrote:
> Subject: question about fix for PeriodicTask_lock/WatcherThread deadlock
> Dan,
> Apologies for writing directly. I am backporting your fix to a version
> 7 copy of the OpenJDK used at Twitter. The fix (which looks good overall)
> is to try the lock and only wake the WatcherThread if we get it. It appears
> there is still a small race (which may not matter) if we have the following
> sequence:
> * the WatcherThread has the lock
> * some other thread, T, calls stop() and fails to get the lock
> * the WatcherThread unlocks the PeriodicTask_lock, still sees
> _should_stop as false and parks
> * T doesn't have the lock and so exits stop without doing an unpack
> This would allow the WatcherThread to park without seeing that it should stop.
> I was wondering if the following small change would fix that:
> void WatcherThread::stop() {
> + // Get the PeriodicTask_lock if we can. If we cannot, then the
> + // WatcherThread is using it and we don't want to block on that lock
> + // here because that might cause a safepoint deadlock depending on
> + // what the current WatcherThread tasks are doing.
> + bool have_lock = PeriodicTask_lock->try_lock();
> +
> + _should_terminate = true;
> + OrderAccess::fence(); // ensure WatcherThread sees update in main loop
> // Retry the lock if we don't have it. This handles the case where
> // the WatcherThread stopped using and parked without seeing
> // _should_terminate set to true.
> if (!have_lock) {
> have_lock = PeriodicTask_lock->try_lock();
> }
> + if (have_lock) {
> WatcherThread* watcher = watcher_thread();
> + if (watcher != NULL) {
> + // If we managed to get the lock, then we should unpark the
> + // WatcherThread so that it can see we want it to stop.
> watcher->unpark();
> + }
> +
> + PeriodicTask_lock->unlock();
> }
> (Ignore the leading '+'s. They're marking your patch.) Thanks!
> -- Alex
Second was this e-mail sent on 2015.02.03 to serviceability-dev@openjdk.java.net:
On 2/3/15 12:00 PM, Carsten Varming wrote:
> Subject: Patch: Clean up fix for
> Greetings all,
> I was recently introduced to the deadlock described in
> The change takes a step back and acquires PeriodicTask_lock in WatcherThread::stop (like before the fix in JDK9), but this time safepoints are allowed to proceed when acquiring PeriodicTask_lock, preventing the deadlock.
> Let me know what you think,
> Carsten
Carsten's patch attachment:
diff -r 8ff882030755 src/share/vm/runtime/thread.cpp
--- a/src/share/vm/runtime/thread.cpp Wed Jan 28 16:45:35 2015 -0800
+++ b/src/share/vm/runtime/thread.cpp Tue Feb 03 13:52:32 2015 -0500
@@ -1318,24 +1318,20 @@
}
void WatcherThread::stop() {
- // Get the PeriodicTask_lock if we can. If we cannot, then the
- // WatcherThread is using it and we don't want to block on that lock
- // here because that might cause a safepoint deadlock depending on
- // what the current WatcherThread tasks are doing.
- bool have_lock = PeriodicTask_lock->try_lock();
-
_should_terminate = true;
- OrderAccess::fence(); // ensure WatcherThread sees update in main loop
-
- if (have_lock) {
+ {
+ // Let safepoints happen to avoid the deadlock:
+ // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint.
+ // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock.
+ // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock,
+ // and holds up the VM thread by not reaching a safepoint.
+ MutexLocker mu(PeriodicTask_lock);
+
WatcherThread* watcher = watcher_thread();
if (watcher != NULL) {
- // If we managed to get the lock, then we should unpark the
- // WatcherThread so that it can see we want it to stop.
+ // Unpark the WatcherThread so that it can see that it should terminate.
watcher->unpark();
}
-
- PeriodicTask_lock->unlock();
}
// it is ok to take late safepoints here, if needed
- backported by
-
JDK-8082978 Further refinement of the fix JDK-8047720 - Xprof hangs on Solaris
- Resolved
-
JDK-8285659 Further refinement of the fix JDK-8047720 - Xprof hangs on Solaris
- Resolved
-
JDK-8289855 Further refinement of the fix JDK-8047720 - Xprof hangs on Solaris
- Closed
- relates to
-
JDK-8074314 PeriodicTask and WatcherThread APIs need to be cleaned up
- Closed
-
JDK-8047720 Xprof hangs on Solaris
- Closed