Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2176891 | 7 | Dave Dice | P3 | Closed | Fixed | b12 |
JDK-2171914 | 6u4 | Dave Dice | P3 | Closed | Fixed | b03 |
JDK-2201075 | 5.0u17 | Vikram Aroskar | P3 | Closed | Fixed | b04 |
JDK-2157918 | 5.0u15-rev | Vikram Aroskar | P3 | Closed | Fixed | b11 |
The non-realtime JVM depends on the host operating system scheduler to *eventually* grant cycles to all runnable LWPs, regardless of the assigned Java priority. (Refer to http://blogs.sun.com/dave/entry/java_thread_priorities_demystified to better understand how the JVM maps Java thread priorities to underlying LWP priorities.) Unfortunately, we've recently discovered that in some exotic and rarely seen circumstances, ready threads at low(er) priority in the Solaris IA and TS scheduling classes can starve indefinitely when competing against higher priority threads that park and unpark frequently. Specifically, the anti-starvation boost -- which Solaris applies to threads languishing on the ready list -- is insufficient to overcome differences in the computed effective priority of threads at varying assigned priorities. (Refer to the Solaris man pages for ts_dptbl or "Solaris Internals", 2E, page 206, or http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/ts.c).
The starvation effect is readily reproducible with a simple "C" test case as well as the simple Java test case attached to the bug report. That is, there's nothing Java-specific about the underlying problem. Ultimately, I'd like to see the issue addressed by Solaris in the kernel but in the interim I'll try to modify the JVM to reduce the odds of encountering the problem.
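For readers without access to the attachment, the shape of such a test can be sketched roughly as follows. This is not the test case attached to the bug report. It is a hypothetical probe (the class name `StarvationProbe`, the pair count, and the 1 ms park timeout are all assumptions) in which one MIN_PRIORITY thread competes with pairs of MAX_PRIORITY threads that park and unpark each other. Whether it reproduces the starvation depends on the Solaris release, the CPU count, and on UseThreadPriorities being enabled. On an affected system the probe itself may hang, since the controlling thread can starve too.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical probe; NOT the test case attached to the bug report.
class StarvationProbe {

    /**
     * Runs for roughly runMillis and returns how many loop iterations
     * the low-priority thread managed to complete in that time.
     */
    static long run(int pairs, long runMillis) {
        final AtomicBoolean running = new AtomicBoolean(true);
        final AtomicLong lowProgress = new AtomicLong();

        // The potential victim: a compute-bound thread at the lowest Java priority.
        Thread low = new Thread(() -> {
            while (running.get()) {
                lowProgress.incrementAndGet();
            }
        }, "low-priority");
        low.setPriority(Thread.MIN_PRIORITY);

        // The competitors: high-priority threads that wake their partner and then park.
        final Thread[] high = new Thread[pairs * 2];
        for (int i = 0; i < high.length; i++) {
            final int partner = i ^ 1; // pairs up 0<->1, 2<->3, ...
            high[i] = new Thread(() -> {
                while (running.get()) {
                    LockSupport.unpark(high[partner]);
                    LockSupport.parkNanos(1_000_000L); // timed park so shutdown cannot hang
                }
            }, "high-priority-" + i);
            high[i].setPriority(Thread.MAX_PRIORITY);
        }

        low.start();
        for (Thread t : high) {
            t.start();
        }
        try {
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.set(false);
        }
        for (Thread t : high) {
            LockSupport.unpark(t);
            join(t);
        }
        join(low);
        return lowProgress.get();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would typically choose at least as many pairs as there are CPUs and compare the returned count with a run where every thread has the same priority. A count at or near zero suggests the low-priority thread was starved.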
To help avoid this problem I've changed the default for UseThreadPriorities to FALSE in the 1.7 source tree for Solaris. This change -- which disables the mapping of Java-level thread priorities to Solaris thread priorities -- should probably be backported into the 6.x update stream. Users on earlier releases can use -XX:-UseThreadPriorities to achieve the same effect. Starvation will not occur when the assigned priorities of all threads competing for CPU cycles are equal. Beware that you can still encounter the starvation problem if you make JNI calls to native code that changes LWP priorities, or if you assign a non-default priority to an LWP and then attach that thread to the JVM.
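Where changing the JVM flag is not an option, an application can approximate the "all priorities equal" condition for the threads it creates itself. The sketch below is one hypothetical way to do that: `EqualPriorityThreadFactory` is an illustrative name, not something from the JVM or the bug report. It relies on the fact that a new Java thread inherits its creator's priority unless one is set explicitly. It does nothing about threads created elsewhere, later `setPriority` calls, or the JNI cases mentioned above.

```java
import java.util.concurrent.ThreadFactory;

// Hypothetical helper: every thread it creates runs at NORM_PRIORITY,
// regardless of the priority of the thread that asked for it.
class EqualPriorityThreadFactory implements ThreadFactory {
    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task);
        // Without this, t would inherit the creating thread's priority.
        t.setPriority(Thread.NORM_PRIORITY);
        return t;
    }
}
```

Such a factory can be passed to the `java.util.concurrent.Executors` factory methods that accept a `ThreadFactory`, so that pool threads do not pick up an elevated or lowered priority from whichever thread happened to create them.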
I believe that both Windows and Linux are immune to similar starvation pathologies. The Windows scheduler seems to provide anti-starvation effective priority boosting with sufficient authority to overcome priorities assigned via the SetThreadPriority() API, and the design of the "new" Linux O(1) scheduler renders it immune to indefinite starvation. That is, regardless of the assigned priority, threads will eventually be granted CPU cycles by those schedulers.
As an aside, when the problem manifests, the process may not be responsive to CTRL-C and may not be pstack-able.
We should be extremely careful about attributing observed hangs to this bug. There are a number of pending hotspot issues that can manifest in a similar fashion, including 6519515 and 6546278. More broadly, any unbounded spinning in the JVM or Java application code could easily trigger the starvation condition.
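On the application side, the remark about unbounded spinning suggests bounding any busy-wait and then blocking, so that a waiting thread stops competing for the CPU. The following is a minimal sketch of that idea, not code from the JVM. The class name `BoundedSpin`, the spin limit, and the 100 microsecond park interval are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: spin a bounded number of times, then block in short timed parks.
class BoundedSpin {

    /**
     * Waits until flag becomes true.
     * Returns how many times the caller gave up spinning and parked.
     */
    static int awaitTrue(AtomicBoolean flag, int maxSpins) {
        int spins = 0;
        int parks = 0;
        while (!flag.get()) {
            if (spins < maxSpins) {
                spins++; // brief busy-wait for the fast path
            } else {
                // Stop consuming CPU so that other runnable threads, including
                // lower-priority ones, get a chance to run.
                LockSupport.parkNanos(100_000L);
                parks++;
            }
        }
        return parks;
    }
}
```

The timed park keeps the sketch simple, because no wake-up signal is required from the thread that sets the flag. A production version would more likely block on a proper synchronizer such as a `CountDownLatch`.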
I also believe that some earlier bugs such as 6463925 are really instances of this bug (6518490).
-Dave
- backported by
  - JDK-2157918 Solaris TS scheduling class anti-starvation facility does not completely avoid starvation (Closed)
  - JDK-2171914 Solaris TS scheduling class anti-starvation facility does not completely avoid starvation (Closed)
  - JDK-2176891 Solaris TS scheduling class anti-starvation facility does not completely avoid starvation (Closed)
  - JDK-2201075 Solaris TS scheduling class anti-starvation facility does not completely avoid starvation (Closed)
- relates to
  - JDK-6519515 Loop-opts incorrectly removed a safepoint poll from a loop with an early exit (Closed)
  - JDK-6473338 ATG application occasionally hangs during safepointing (due to monitor inflation livelock?) (Closed)
  - JDK-6463925 Long delays caused by spinning code (Closed)
  - JDK-6546278 Synchronization problem in the pseudo memory barrier code (Closed)
  - JDK-8169031 [Solaris] JVM is blissfully unaware of the Fair Share Scheduler (Closed)
  - JDK-8157010 [Solaris] Clean out incorrect usage of library-level thread priority functions (Closed)