Bug
Resolution: Fixed
P3
6u26
b16
x86
linux_redhat_5.0
Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2221504 | 8 | Coleen Phillimore | P3 | Closed | Fixed | b27 |
JDK-2221545 | 7u4 | Coleen Phillimore | P3 | Closed | Fixed | b13 |
JDK-2224347 | 6u34 | Kevin Walls | P3 | Closed | Fixed | b01 |
JDK-2224414 | 6u33 | Kevin Walls | P2 | Closed | Fixed | b31 |
JDK-2221086 | 6u32 | Kevin Walls | P2 | Closed | Fixed | b32 |
JDK-2224348 | hs20.9 | Kevin Walls | P4 | Closed | Fixed | b01 |
JDK-2224415 | hs20.8 | Kevin Walls | P3 | Closed | Fixed | team |
JDK-2224162 | hs20.7 | Kevin Walls | P3 | Closed | Fixed | b03 |
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
FULL OS VERSION :
Linux 2.6.18-194.17.4.el5 #1 SMP Wed Oct 20 13:03:08 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
Java arguments/parameters: -d64 -server -Xms1024m -Xmx1024m -Xss320k -XX:MaxPermSize=384m
A DESCRIPTION OF THE PROBLEM :
As our clients have migrated to 64-bit JVMs, we have seen a significant increase in JVM crashes due to SIGSEGV on Unix platforms. The crashes are spontaneous and do not produce a HotSpot crash report. Each crash involved an application bug that caused deep recursion and should have resulted in a java.lang.StackOverflowError, for instance an infinite Struts forward or a search continuance/referral loop during LDAP authentication. This appears to affect both Solaris and Linux. We have only investigated further on Linux, but in both cases a 64-bit JVM was the common factor.
We referred to the Java SE Troubleshooting Guide, section '4.1.3 Crash due to Stack Overflow', and found that the StackShadowPages value is likely too small for these platforms. The guide discusses custom JNI libraries, but we are seeing these crashes in 'normal' native calls, usually socket reads or writes. That led us to investigate further, as this should not be the case. According to the OpenJDK source, the default on x86 platforms is 3, doubled to 6 on AMD64. There is also a Solaris-specific value on AMD64, seemingly to accommodate C++ compiler bugs on that platform, and our experience suggests that this value (20) may be more broadly applicable:
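For reference, the behavior expected from a healthy JVM can be sketched as follows. This is a minimal illustration, not the reporter's application and not a reproducer for the crash: the class and method names are hypothetical, and the recursion is pure Java with no native call at the bottom. Two methods call each other without bound, and the overflow is expected to surface as a catchable java.lang.StackOverflowError rather than a SIGSEGV.

```java
public class RecursionProbe {
    // Number of frames entered before the overflow was detected.
    private static int depth;

    // Two methods that call each other without a termination condition,
    // mimicking the kind of application bug described in the report.
    private static void ping() { depth++; pong(); }
    private static void pong() { depth++; ping(); }

    /** Recurses until the stack overflows and returns the depth reached. */
    public static int probeDepth() {
        depth = 0;
        try {
            ping();
        } catch (StackOverflowError expected) {
            // Expected outcome: the JVM unwinds the stack and keeps running.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Recovered from StackOverflowError after "
                + probeDepth() + " frames");
    }
}
```

The depth reached depends on the thread stack size and on whether the frames are interpreted or compiled, so no particular number should be expected.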
http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/9b013e207574/src/cpu/x86/vm/globals_x86.hpp
#ifdef AMD64
// Very large C++ stack frames using solaris-amd64 optimized builds
// due to lack of optimization caused by C++ compiler bugs
define_pd_global(intx, StackShadowPages, SOLARIS_ONLY(20) NOT_SOLARIS(6) DEBUG_ONLY(+2));
#else
define_pd_global(intx, StackShadowPages, 3 DEBUG_ONLY(+5));
#endif // AMD64
Lab testing indicates that 17 is the smallest StackShadowPages value that prevents the JVM from crashing with a segmentation fault. We have not confirmed the value on Solaris (UltraSPARC; we don't support our product on Solaris x86), but we have certainly seen these conditions on both platforms. We can only conclude that either 64-bit native stack frames on AMD64 are generally far larger than their 32-bit equivalents, or there is a problem with the way the shadow zone size is calculated (I believe it's OS page size * StackShadowPages), allowing previously benign stack overflows in Java code to crash the JVM.
Others have also encountered 64-bit-specific SIGSEGVs. Bug 6346701 seems to report exactly this kind of condition, but I could not find any discussion indicating that there could be a problem with the default shipping value, or with the calculation of the number of pages to look ahead before invoking native methods:
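To put the numbers in perspective, the arithmetic can be sketched as below. It takes the formula stated above (page size * StackShadowPages) at face value and assumes a 4 KB page size, which is typical for x86-64 Linux but was not stated in the report. The class and method names are hypothetical.

```java
public class ShadowZone {
    /** Shadow zone size in bytes, using the formula the report assumes. */
    static long shadowBytes(int stackShadowPages, long pageSizeBytes) {
        return stackShadowPages * pageSizeBytes;
    }

    public static void main(String[] args) {
        long pageSize = 4096;          // assumption: 4 KB pages
        long threadStack = 320 * 1024; // -Xss320k, from the reported JVM arguments

        // Defaults from globals_x86.hpp (3, 6), the lab minimum (17)
        // and the Solaris value used as the workaround (20).
        for (int pages : new int[] {3, 6, 17, 20}) {
            long bytes = shadowBytes(pages, pageSize);
            System.out.printf("StackShadowPages=%d -> %d KB (%d%% of a 320 KB stack)%n",
                    pages, bytes / 1024, 100 * bytes / threadStack);
        }
    }
}
```

Under these assumptions the default of 6 reserves 24 KB, the lab minimum of 17 reserves 68 KB, and a value of 20 reserves 80 KB, a quarter of each 320 KB thread stack.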
http://confluence.atlassian.com/display/GHKB/JIRA+with+GreenHopper+Crashes+Java+with+a+SIGSEGV+Fault+on+Linux+64bit+JVMs
http://fusesource.com/forums/thread.jspa?messageID=7830
We isolated the problem using core dumps, identified the offending threads with gdb, and then used jstack to generate thread dumps:
-- Crash 1 - Two application methods that call each other recursively, executing database statements, causing overflow during Oracle thin driver socket read:
Thread 18073: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int) @bci=84, line=129 (Compiled frame)
- oracle.net.ns.Packet.receive() @bci=31, line=240 (Compiled frame)
- oracle.net.ns.DataPacket.receive() @bci=1, line=92 (Compiled frame)
- oracle.net.ns.NetInputStream.getNextPacket() @bci=48, line=172 (Compiled frame)
- oracle.net.ns.NetInputStream.read(byte[], int, int) @bci=33, line=117 (Compiled frame)
- oracle.net.ns.NetInputStream.read(byte[]) @bci=5, line=92 (Compiled frame)
- oracle.jdbc.driver.T4CMAREngine.buffer2Value(byte) @bci=325, line=2320 (Compiled frame)
- oracle.jdbc.driver.T4CMAREngine.unmarshalUB4() @bci=2, line=1200 (Compiled frame)
- oracle.jdbc.driver.T4CTTIoer.unmarshal() @bci=200, line=270 (Compiled frame)
- oracle.jdbc.driver.T4C8Oall.receive() @bci=1507, line=1015 (Compiled frame)
- oracle.jdbc.driver.T4CPreparedStatement.doOall8(boolean, boolean, boolean, boolean) @bci=655, line=194 (Compiled frame)
- oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe() @bci=39, line=791 (Compiled frame)
- oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe() @bci=104, line=866 (Compiled frame)
- oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout() @bci=139, line=1186 (Compiled frame)
- oracle.jdbc.driver.OraclePreparedStatement.executeInternal() @bci=98, line=3387 (Compiled frame)
- oracle.jdbc.driver.OraclePreparedStatement.executeQuery() @bci=13, line=3431 (Compiled frame)
- oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery() @bci=4, line=1491 (Compiled frame)
- org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery() @bci=9, line=93 (Compiled frame)
- org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery() @bci=9, line=93 (Compiled frame)
...
-- Crash 2 - Infinite LDAP search referral/continuance due to incorrectly configured Active Directory server, overflowing during socket write:
Thread 11962: (state = IN_NATIVE)
- java.net.SocketOutputStream.socketWrite0(java.io.FileDescriptor, byte[], int, int) @bci=0 (Interpreted frame)
- java.net.SocketOutputStream.socketWrite(byte[], int, int) @bci=44, line=92 (Interpreted frame)
- java.net.SocketOutputStream.write(byte[], int, int) @bci=4, line=136 (Interpreted frame)
- java.io.BufferedOutputStream.flushBuffer() @bci=20, line=65 (Interpreted frame)
- java.io.BufferedOutputStream.flush() @bci=1, line=123 (Interpreted frame)
- com.sun.jndi.ldap.Connection.writeRequest(com.sun.jndi.ldap.BerEncoder, int, boolean) @bci=73, line=396 (Interpreted frame)
- com.sun.jndi.ldap.LdapClient.ldapBind(java.lang.String, byte[], javax.naming.ldap.Control[], java.lang.String, boolean) @bci=196, line=334 (Interpreted frame)
- com.sun.jndi.ldap.LdapClient.authenticate(boolean, java.lang.String, java.lang.Object, int, java.lang.String, javax.naming.ldap.Control[], java.util.Hashtable) @bci=315, line=192 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.connect(boolean) @bci=316, line=2694 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.<init>(java.lang.String, java.lang.String, int, java.util.Hashtable, boolean) @bci=390, line=293 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(java.lang.String, java.util.Hashtable) @bci=227, line=175 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(java.lang.Object, java.util.Hashtable) @bci=12, line=134 (Interpreted frame)
- com.sun.jndi.url.ldap.ldapURLContextFactory.getObjectInstance(java.lang.Object, javax.naming.Name, javax.naming.Context, java.util.Hashtable) @bci=17, line=35 (Interpreted frame)
- javax.naming.spi.NamingManager.getURLObject(java.lang.String, java.lang.Object, javax.naming.Name, javax.naming.Context, java.util.Hashtable) @bci=62, line=584 (Interpreted frame)
- javax.naming.spi.NamingManager.processURL(java.lang.Object, javax.naming.Name, javax.naming.Context, java.util.Hashtable) @bci=31, line=364 (Interpreted frame)
- javax.naming.spi.NamingManager.processURLAddrs(javax.naming.Reference, javax.naming.Name, javax.naming.Context, java.util.Hashtable) @bci=56, line=344 (Interpreted frame)
- javax.naming.spi.NamingManager.getObjectInstance(java.lang.Object, javax.naming.Name, javax.naming.Context, java.util.Hashtable) @bci=124, line=316 (Interpreted frame)
- com.sun.jndi.ldap.LdapReferralContext.<init>(com.sun.jndi.ldap.LdapReferralException, java.util.Hashtable, javax.naming.ldap.Control[], javax.naming.ldap.Control[], java.lang.String, boolean, int) @bci=212, line=93 (Interpreted frame)
- com.sun.jndi.ldap.LdapReferralException.getReferralContext(java.util.Hashtable, javax.naming.ldap.Control[]) @bci=38, line=132 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.searchAux(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, boolean, boolean, com.sun.jndi.toolkit.ctx.Continuation) @bci=269, line=1838 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.c_search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, com.sun.jndi.toolkit.ctx.Continuation) @bci=14, line=1749 (Interpreted frame)
- com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, com.sun.jndi.toolkit.ctx.Continuation) @bci=72, line=368 (Interpreted frame)
- com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls) @bci=32, line=338 (Interpreted frame)
- com.sun.jndi.ldap.LdapReferralContext.search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls) @bci=44, line=639 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.searchAux(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, boolean, boolean, com.sun.jndi.toolkit.ctx.Continuation) @bci=282, line=1844 (Interpreted frame)
- com.sun.jndi.ldap.LdapCtx.c_search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, com.sun.jndi.toolkit.ctx.Continuation) @bci=14, line=1749 (Interpreted frame)
- com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls, com.sun.jndi.toolkit.ctx.Continuation) @bci=72, line=368 (Interpreted frame)
- com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(javax.naming.Name, java.lang.String, javax.naming.directory.SearchControls) @bci=32, line
...
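The recursion in a dump like the one above can be spotted mechanically by looking for methods that appear more than once. The following is a minimal sketch under assumptions: the class and method names are hypothetical, and the parsing relies only on the frame layout visible in these dumps (a leading "- " and a trailing " @bci=..." part).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatedFrames {
    /** Strips the leading "- " and the trailing " @bci=..." detail from a frame line. */
    static String methodOf(String frameLine) {
        String s = frameLine.trim();
        if (s.startsWith("- ")) {
            s = s.substring(2);
        }
        int at = s.indexOf(" @bci=");
        return at >= 0 ? s.substring(0, at) : s;
    }

    /** Returns the methods that occur more than once, in first-seen order. */
    static List<String> repeated(List<String> frameLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : frameLines) {
            counts.merge(methodOf(line), 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, made-up frames in the same layout as the dumps above.
        List<String> frames = Arrays.asList(
                "- a.Writer.write(byte[]) @bci=0 (Interpreted frame)",
                "- a.Ctx.searchAux(java.lang.String) @bci=269, line=1838 (Interpreted frame)",
                "- a.Ctx.c_search(java.lang.String) @bci=14, line=1749 (Interpreted frame)",
                "- a.Ctx.searchAux(java.lang.String) @bci=282, line=1844 (Interpreted frame)",
                "- a.Ctx.c_search(java.lang.String) @bci=14, line=1749 (Interpreted frame)");
        repeated(frames).forEach(System.out::println);
    }
}
```

Comparing on the method signature rather than the whole line matters here, because the same method can recur at a different bci and line number, as searchAux does in Crash 2. A repeated method is only a hint: wrapper classes that delegate to each other can legitimately appear twice in a row.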
THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try
THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
We do not have an easily reducible use case to reproduce this problem. All issues involve our entire application stack.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
None. JVM process exits due to signal 11 (SIGSEGV).
REPRODUCIBILITY :
This bug can be reproduced always.
CUSTOMER SUBMITTED WORKAROUND :
Set -XX:StackShadowPages=20
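After applying the workaround, it can be useful to confirm the value the running JVM actually uses. The following is a minimal sketch, assuming a HotSpot JVM on Java 7 or later, where ManagementFactory.getPlatformMXBean and com.sun.management.HotSpotDiagnosticMXBean are available. The class and method names are hypothetical, and the probe returns null rather than failing on a VM that does not expose the flag.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShadowPagesProbe {
    /**
     * Returns the effective StackShadowPages value as a string,
     * or null if the running VM does not expose that flag.
     */
    static String stackShadowPages() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return null;
            }
            return bean.getVMOption("StackShadowPages").getValue();
        } catch (IllegalArgumentException unavailable) {
            // Thrown when the bean or the flag does not exist in this VM.
            return null;
        }
    }

    public static void main(String[] args) {
        String value = stackShadowPages();
        System.out.println(value == null
                ? "StackShadowPages is not exposed by this VM"
                : "StackShadowPages=" + value);
    }
}
```

Running this class with -XX:StackShadowPages=20 on the command line should report the overridden value instead of the platform default.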
backported by
- JDK-2221086 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2224414 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2221504 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2221545 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2224162 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2224347 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2224415 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
- JDK-2224348 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (Closed)
relates to
- JDK-6807602 Increase MAX_BUFFER_LEN and MAX_HEAP_BUFFER_LEN on 64-bit Solaris and Linux (Resolved)
- JDK-7079763 Shouldn't socketoutputstream_socketwrite throw a stackoverflowexception? (Closed)
- JDK-8047216 Sudden increase of frame size (Closed)
- JDK-7155779 Add regression test for 7145587 (Closed)
- JDK-2221582 Stack overflows in Java code cause 64-bit JVMs to exit due to SIGSEGV (solaris sparc) (Closed)