Details
Description
The ISV has developed an XML server that consists of a Java 'router', which
accepts RMI connections from clients and routes each call, as another RMI
call, to a back-end process.
With a large number of clients connected, one of the ISV's customers reported
the following failure when using JRE 1.3.1_03:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start(Native Method)
at sun.rmi.transport.tcp.TCPChannel.free(TCPChannel.java:286)
at sun.rmi.server.UnicastRef.free(UnicastRef.java:429)
at sun.rmi.server.UnicastRef.done(UnicastRef.java:449)
at sun.rmi.transport.DGCImpl_Stub.dirty(Unknown Source)
at sun.rmi.transport.DGCClient$EndpointEntry.makeDirtyCall(DGCClient.java:318)
at sun.rmi.transport.DGCClient$EndpointEntry.access$1500(DGCClient.java:136)
at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:529)
at java.lang.Thread.run(Thread.java:479)
Running with -verbose:gc showed no shortage of Java heap. The ISV has
attempted to reproduce the problem with JREs 1.3.1_02, 1.3.1_03 and 1.3.1_05,
and in each case has encountered what looks like a SIGBUS in the RMI code
that handles the sockets.
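Since "unable to create new native thread" reports a failure to create an
operating-system thread rather than a full Java heap, it may help to log the
router's thread count alongside the -verbose:gc output. The following is a
minimal sketch only; the class name ThreadCountProbe and its methods are
hypothetical and not part of the ISV's code. It uses
ThreadGroup.activeCount(), which returns an estimate, so the numbers are
useful for spotting a trend rather than as exact values.

```java
public class ThreadCountProbe {

    /** Walks up from the current thread's group to the root ThreadGroup. */
    static ThreadGroup rootGroup() {
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        while (group.getParent() != null) {
            group = group.getParent();
        }
        return group;
    }

    /** Estimated number of live threads in the whole VM (root group and subgroups). */
    static int liveThreadEstimate() {
        return rootGroup().activeCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of samples to print; hypothetical default of 3, one per second.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            System.out.println("live threads (estimate): " + liveThreadEstimate());
            Thread.sleep(1000);
        }
    }
}
```

In the router itself the same two methods could be called from a background
thread at a longer interval; a count that climbs steadily over the three to
four hours before the failure would point at thread exhaustion.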
The ISV's environment consists of an E4500 with 8 CPUs and 8GB RAM, running
Solaris 8 with the Oct/02/02 recommended patch cluster for Solaris 8 and the
Sep/24/02 J2SE patch cluster for 1.3.1.
They have increased the fd limit to 2048 (from 1024), but the failure still
occurs at the same point (after approximately three to four hours).
Other items of note are that they are running with -Xmx256m and the alternate
libthread.
They connect 400 clients to their router, and the router connects to a
corresponding number of back-end processes. Each client connects through the
same 'home' object in the router, and that 'home' object starts a back-end
process to handle the required computation. The back-end process is a C binary
that does the computation and also instantiates a VM, so with 400 clients you
are looking at 400 VMs. Once all the connections have been established there
are no more connects/disconnects, so there is a "ramp up" phase and then a
prolonged period of RMI calls before the failure occurs.
The 'router', which forwards calls to the back-ends using RMI, is the component
that is failing.
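The forwarding pattern described above can be sketched roughly as follows.
This is an illustration under assumptions, not the ISV's code: the names
RouterSketch, Computation, ForwardingSession and forwardOnce are all
hypothetical, and the back-end is replaced by an in-process stand-in so the
example runs without exporting any RMI objects.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

public class RouterSketch {

    /** Hypothetical remote interface shared by the router session and the back-end. */
    public interface Computation extends Remote {
        String call(String request) throws RemoteException;
    }

    /** Router-side object: forwards every client call to exactly one back-end. */
    public static class ForwardingSession implements Computation {
        private final Computation backend;

        public ForwardingSession(Computation backend) {
            this.backend = backend;
        }

        public String call(String request) throws RemoteException {
            // In the real router this would be a second RMI call to the back-end VM.
            return backend.call(request);
        }
    }

    /** Routes one request through a session backed by an in-process echo stand-in. */
    static String forwardOnce(String request) {
        Computation fakeBackend = new Computation() {
            public String call(String req) {
                return "echo:" + req;
            }
        };
        try {
            return new ForwardingSession(fakeBackend).call(request);
        } catch (RemoteException e) {
            throw new IllegalStateException("forwarding failed: " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forwardOnce("ping"));
    }
}
```

With 400 clients the router would hold 400 such sessions, each with its own
connection to a back-end, which is why the router is the process where thread
and socket usage concentrates.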
Here is the relevant section of the log (running with 1.3.1_03 and -verbose:gc):
[GC 14299K->7678K(21528K), 0.0650366 secs]
[GC 14462K->7812K(21528K), 0.0627776 secs]
[GC 14596K->7968K(21528K), 0.0622701 secs]
[GC 14752K->8122K(21528K), 0.0684470 secs]
An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 10 occurred at PC=0xff2718e4
Function name=memset
Library=/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1
Current Java thread:
at java.lang.Thread.start(Native Method)
at sun.rmi.transport.tcp.TCPTransport.run(TCPTransport.java:346)
at java.lang.Thread.run(Thread.java:479)
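To check the heap figures in a longer log against the -Xmx256m limit, the
[GC ...] lines can be parsed offline. The following is a minimal sketch; the
class name GcLineParser is hypothetical, and it uses java.util.regex, so it is
meant to be run on a later JDK against the saved log, not inside the 1.3.1
router.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineParser {

    // Matches lines such as: [GC 14299K->7678K(21528K), 0.0650366 secs]
    private static final Pattern GC_LINE =
        Pattern.compile("\\[GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    /** Returns {usedBeforeK, usedAfterK, heapSizeK}, or null if the line is not a GC line. */
    static long[] parse(String line) {
        Matcher m = GC_LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[GC 14299K->7678K(21528K), 0.0650366 secs]",
            "[GC 14462K->7812K(21528K), 0.0627776 secs]",
            "[GC 14596K->7968K(21528K), 0.0622701 secs]",
            "[GC 14752K->8122K(21528K), 0.0684470 secs]"
        };
        long maxHeapK = 256L * 1024; // -Xmx256m expressed in kilobytes
        for (String line : lines) {
            long[] v = parse(line);
            if (v != null) {
                System.out.println(v[1] + "K in use after GC, heap size " + v[2]
                    + "K, limit " + maxHeapK + "K");
            }
        }
    }
}
```

For the four lines shown, heap in use after each collection is about 8MB out
of a permitted 256MB, which is consistent with the observation that the Java
heap is not the resource being exhausted.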
Attachments:
fox.xmlserver.15651.log - verbose:gc log running with 1.3.1_03
pstack.core.log - pstack output with 1.3.1_03 (SIGBUS)
core1725.txt - pstack output with 1.3.1_02 (SIGBUS)
jvm_coredump.txt - pstack output with 1.3.1_05 (SIG unknown)