Bug
Resolution: Duplicate
P4
None
7u45
x86_64
linux
FULL PRODUCT VERSION :
1.7.0 build 51
ADDITIONAL OS VERSION INFORMATION :
Linux cmslivmyu03 3.0.34-0.7-default #1 SMP Tue Jun 19 09:56:30 UTC 2012 (fbfc70c) x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
32 cores, 256 GByte main memory
A DESCRIPTION OF THE PROBLEM :
Our customer ran into the problem described in http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8013366 during a critical setup procedure under high load. The following exception was observed:
Caused by: org.omg.CORBA.COMM_FAILURE:
at com.sun.corba.se.impl.logging.ORBUtilSystemException.bufferReadManagerTimeout(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.logging.ORBUtilSystemException.bufferReadManagerTimeout(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.encoding.BufferManagerReadStream.underflow(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.encoding.CDRInputStream_1_1.grow(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.encoding.CDRInputStream_1_2.alignAndCheck(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.encoding.CDRInputStream_1_0.read_long(Unknown Source) ~[na:1.7.0_51]
at com.sun.corba.se.impl.encoding.CDRInputStream.read_long(Unknown Source) ~[na:1.7.0_51]
at ...
BufferManagerReadStream#underflow appears to still contain the bug explained in the linked report, which was closed as "not an issue". The relevant code is:
boolean interrupted = false;
try {
    fragmentQueue.wait(FRAGMENT_TIMEOUT);
} catch (InterruptedException e) {
    interrupted = true;
}
if (!interrupted && fragmentQueue.size() == 0) {
    throw wrapper.bufferReadManagerTimeout();
}
In normal operation, when the waiting thread is not interrupted, the call to wait() can still return before the specified timeout has elapsed. Section 17.2.1 of the Java Language Specification lists, among the ways a thread can leave a wait set: 'An internal action by the implementation. Implementations are permitted, although not encouraged, to perform "spurious wake-ups", that is, to remove threads from wait sets and thus enable resumption without explicit instructions to do so.' If such a wake-up happens while fragmentQueue is still empty, the code above throws bufferReadManagerTimeout even though FRAGMENT_TIMEOUT has not expired. In my experience, JVMs on machines with many cores are particularly likely to exhibit this behavior.
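To make the failure mode concrete, here is a small, self-contained Java sketch that mimics the single-wait check quoted above. It is not the JDK code: SingleWaitDemo, waitOnce, startStrayNotifier and the 2000 ms timeout are hypothetical. Because a genuine spurious wake-up cannot be triggered on demand, a stray notifyAll() from another thread stands in for one; the waiting thread sees the same thing in both cases (wait() returns and the queue is empty).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch only: reproduces the shape of the quoted underflow() check.
// A stray notifyAll() stands in for a spurious wake-up, which cannot be forced.
public class SingleWaitDemo {
    // Hypothetical value; the real FRAGMENT_TIMEOUT is not reproduced here.
    static final long FRAGMENT_TIMEOUT = 2_000; // milliseconds

    // Single timed wait, after which "queue empty" is treated as "timed out".
    // Returns the milliseconds that had passed when the timeout was reported,
    // or -1 if a fragment was found.
    static long waitOnce(Queue<Object> fragmentQueue) throws InterruptedException {
        long start = System.nanoTime();
        synchronized (fragmentQueue) {
            fragmentQueue.wait(FRAGMENT_TIMEOUT);
            if (fragmentQueue.isEmpty()) {
                // The quoted code throws wrapper.bufferReadManagerTimeout() here.
                return (System.nanoTime() - start) / 1_000_000;
            }
            return -1;
        }
    }

    // Starts a thread that wakes the waiters on lock after delayMillis
    // without adding anything to the queue.
    static Thread startStrayNotifier(Object lock, long delayMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                return;
            }
            synchronized (lock) {
                lock.notifyAll();
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Object> fragmentQueue = new ArrayDeque<>();
        Thread notifier = startStrayNotifier(fragmentQueue, 200);
        long elapsed = waitOnce(fragmentQueue);
        notifier.join();
        // Typically reports a "timeout" after roughly 200 ms, not 2000 ms.
        System.out.println("timeout reported after " + elapsed + " ms");
    }
}
```

The check cannot distinguish an early wake-up from an expired timeout, because it never looks at how much time has actually passed.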
For background, see http://stackoverflow.com/questions/1038007/why-should-wait-always-be-called-inside-a-loop. The correct pattern is to wait in a loop that re-checks both the condition and the remaining time:
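long deadline = System.currentTimeMillis() + THE_MAXIMUM_WAIT_PERIOD;
synchronized (THE_LOCK) {
    long toWait;
    // Re-check the stop condition after every return from wait(), whether it was
    // caused by a notification, the timeout or a spurious wake-up, and wait only
    // for the time left until the deadline.
    while (!THE_STOP_CONDITION && (toWait = deadline - System.currentTimeMillis()) > 0) {
        try {
            THE_LOCK.wait(toWait);
        } catch (InterruptedException ie) {
            HANDLE_THE_INTERRUPTION_MAYBE_BREAK;
        }
    }
}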
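The template can be filled in as a small runnable Java sketch. FragmentQueue, add and poll are hypothetical names chosen for illustration, not CORBA or JDK APIs. Two choices go beyond the template and are assumptions: the deadline uses System.nanoTime(), which is not tied to wall-clock time, and an interruption restores the thread's interrupt status and ends the wait, which is only one possible policy for HANDLE_THE_INTERRUPTION_MAYBE_BREAK.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

// Sketch of the wait-in-a-loop pattern applied to a fragment queue.
// FragmentQueue and its methods are hypothetical, not CORBA APIs.
public class FragmentQueue<T> {
    private final Queue<T> fragments = new ArrayDeque<>();

    public void add(T fragment) {
        synchronized (fragments) {
            fragments.add(fragment);
            fragments.notifyAll();
        }
    }

    // Returns the next fragment, or null only if the whole timeout elapsed
    // (or the thread was interrupted) without a fragment arriving.
    public T poll(long timeoutMillis) {
        // nanoTime is unaffected by wall-clock adjustments.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (fragments) {
            long remainingNanos;
            // Re-check the condition after every wake-up, spurious or not.
            while (fragments.isEmpty()
                    && (remainingNanos = deadline - System.nanoTime()) > 0) {
                try {
                    // Waits at most the remaining time; an early return
                    // simply sends the thread around the loop again.
                    TimeUnit.NANOSECONDS.timedWait(fragments, remainingNanos);
                } catch (InterruptedException ie) {
                    // One possible policy: restore the flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return fragments.poll(); // null if still empty
        }
    }
}
```

Where the surrounding design allows it, java.util.concurrent.LinkedBlockingQueue#poll(long, TimeUnit) provides the same timed wait without hand-written wait/notify code.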
REPRODUCIBILITY :
The bug is reproducible only rarely.
duplicates: JDK-8319727 Harden BufferManagerReadStream underflow logic (Resolved)
relates to: JDK-8013366 corba comm_failure timeout due to spurious wakeups (Closed)