- Enhancement
- Resolution: Not an Issue
- P3
- 7u40, 8
- solaris_11
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8023996 | 8 | Devika Gollapudi | P3 | Closed | Not an Issue | |
The VM hangs on Solaris/OVM 2.2.x. The hang is caused by an underlying OVM issue and cannot be fixed in the VM.
This issue should be documented.
I've observed a hang in HotSpot when running bigapps/Kitchensink. The hangs occur on different paths that call os_sleep in os_solaris.cpp, such as safepointing, GC, and java.lang.Thread.sleep.
I've narrowed this down to a call to gethrtime, in getTimeNanos, that suddenly returns a value that is twice as large as previous values. When this happens all subsequent calls to getTimeNanos will return the same value, causing the JVM to spin in os_sleep.
See:
    inline hrtime_t getTimeNanos() {
      if (VM_Version::supports_cx8()) {
        const hrtime_t now = gethrtime();
        // Use atomic long load since 32-bit x86 uses 2 registers to keep long.
        const hrtime_t prev = Atomic::load((volatile jlong*)&max_hrtime);
        if (now <= prev) return prev; // same or retrograde time;
        const hrtime_t obsv = Atomic::cmpxchg(now, (volatile jlong*)&max_hrtime, prev);
        assert(obsv >= prev, "invariant"); // Monotonicity
        // If the CAS succeeded then we're done and return "now".
        // If the CAS failed and the observed value "obs" is >= now then
        // we should return "obs". If the CAS failed and now > obs > prv then
        // some other thread raced this thread and installed a new value, in which case
        // we could either (a) retry the entire operation, (b) retry trying to install now
        // or (c) just return obs. We use (c). No loop is required although in some cases
        // we might discard a higher "now" value in deference to a slightly lower but freshly
        // installed obs value. That's entirely benign -- it admits no new orderings compared
        // to (a) or (b) -- and greatly reduces coherence traffic.
        // We might also condition (c) on the magnitude of the delta between obs and now.
        // Avoiding excessive CAS operations to hot RW locations is critical.
        // See http://blogs.sun.com/dave/entry/cas_and_cache_trivia_invalidate
        return (prev == obsv) ? now : obsv ;
      }
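To illustrate why a single bogus reading makes the clamped clock stand still, here is a minimal standalone sketch of the same clamping idea. It is not HotSpot code: `monotonic_nanos`, `clamp_all`, and the sample readings are hypothetical, and `std::atomic` stands in for HotSpot's `Atomic` class. It assumes the raw clock returns to normal values after the bad reading.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Clamp a raw clock reading against the largest value seen so far.
// 'max_seen' plays the role of max_hrtime in the excerpt above (hypothetical name).
int64_t monotonic_nanos(std::atomic<int64_t>& max_seen, int64_t raw_now) {
    const int64_t prev = max_seen.load();
    if (raw_now <= prev) return prev;  // same or retrograde time
    int64_t observed = prev;
    // On success 'observed' still equals prev; on failure it holds the value
    // another thread installed.
    const bool installed = max_seen.compare_exchange_strong(observed, raw_now);
    assert(observed >= prev);          // monotonicity, as in the original assert
    return installed ? raw_now : observed;
}

// Feed a sequence of raw readings through the clamp and collect the results.
std::vector<int64_t> clamp_all(const std::vector<int64_t>& raw_readings) {
    std::atomic<int64_t> max_seen{0};
    std::vector<int64_t> clamped;
    for (int64_t raw : raw_readings) {
        clamped.push_back(monotonic_nanos(max_seen, raw));
    }
    return clamped;
}
```

Feeding it the made-up readings 1000, 2000, 3000, 6000, 4000, 5000 yields 1000, 2000, 3000, 6000, 6000, 6000: once the outlier 6000 is stored, every later reading below it is treated as retrograde, so a caller that waits for time to advance keeps spinning.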
I've added this guarantee after the cmpxchg:
guarantee(now - prev < 100000000000L, err_msg("getTimeNanos to big delta prev: " JLONG_FORMAT " obsv: " JLONG_FORMAT " now: " JLONG_FORMAT " thread: " PTR_FORMAT, prev, obsv, now, Thread::current()));
And I hit the following assert:
guarantee(now - prev < 100000000000L) failed: getTimeNanos to big delta prev: 1312688670002335 obsv: 1312688670002335 now: 2625377340677208 thread: 0x0000000001a1d000
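The numbers in this message can be checked with a little arithmetic. The sketch below uses only the `prev` and `now` values from the failure message and the threshold from the guarantee; the helper names are hypothetical. It assumes gethrtime values are nanoseconds, as the function name getTimeNanos suggests.

```cpp
#include <cstdint>

constexpr int64_t kPrev  = 1312688670002335LL;  // "prev" from the failure message
constexpr int64_t kNow   = 2625377340677208LL;  // "now" from the failure message
constexpr int64_t kLimit = 100000000000LL;      // guarantee threshold: 100 s in ns

// Size of the jump between two consecutive readings.
constexpr int64_t delta_ns(int64_t now, int64_t prev) { return now - prev; }

// How far 'now' is from exactly twice 'prev'.
constexpr int64_t excess_over_double(int64_t now, int64_t prev) { return now - 2 * prev; }

// Whole days contained in a nanosecond interval.
constexpr int64_t whole_days(int64_t ns) { return ns / (86400LL * 1000000000LL); }

static_assert(delta_ns(kNow, kPrev) == 1312688670674873LL, "jump of about 1.31e15 ns");
static_assert(delta_ns(kNow, kPrev) > kLimit, "far beyond the 100 s threshold");
static_assert(excess_over_double(kNow, kPrev) == 672538LL, "now = 2 * prev + ~0.67 ms");
static_assert(whole_days(delta_ns(kNow, kPrev)) == 15, "about 15 days");
```

So `now` is twice `prev` plus roughly 0.67 ms, and the jump is a little over 15 days. If the raw clock goes back to normal values afterwards, the clamped time would presumably stay frozen for about that long before real time catches up with the stored maximum.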
This is probably an OS/Virtual Machine bug, but it affects the JVM.
$ uname -a
SunOS slc05erh 5.11 11.1 i86pc i386 i86pc
- backported by
  - JDK-8023996 On OVM/Solaris 2.2.x gethrtime returns a too large value - causing the JVM to hang (Closed)
- duplicates
  - JDK-8013788 Java process hung in SafePointing (Closed)
- relates to
  - JDK-8013788 Java process hung in SafePointing (Closed)