JDK-7116204: DGCAckHandler consumes 80% of the heap


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • Fix Version/s: None
    • Affects Version/s: 6u29
    • Component/s: core-libs
    • OS: generic
    • CPU: generic

      J2SE Version (please include all output from java -version flag):

      6u29

      Does this problem occur on J2SE 1.4.x, 1.5 or 6? Yes / No (pick one)
      Yes

      Bug Description:

      A customer heap dump clearly shows that over 80% of the heap is being
      consumed by a private static map within sun.rmi.transport.DGCAckHandler.
      This map maps from a java.rmi.server.UID to a sun.rmi.transport.DGCAckHandler
      instance. There are 11,300,771 entries in this map consuming 880,764,432
      bytes of heap memory.

      Currently they have specified -Dsun.rmi.dgc.client.gcInterval=3600000 and
      -Dsun.rmi.dgc.server.gcInterval=3600000 to reduce DGC overhead, and they
      are removing these as a troubleshooting measure. That said, I don't see
      any direct link between these settings and the issue at hand from reading
      the JVM source code. Instead, the problem seems to be in the bowels of
      the JVM's RMI code.

      Reading the decompiled Java 6 Update 29 code, it appears that
      DGCAckHandlers are created and inserted into the map by
      sun.rmi.transport.ConnectionOutputStream.saveObject(),
      which in turn appears to be called by sun.rmi.transport.LiveRef
      whenever writing a live Remote object to an RMI response.

      Once an entry is added to the map, it will only be removed by
      DGCAckHandler.received(UID), which is called upon receipt of a DGCAck
      from the client. A DGC ack timeout (5 minutes by default) will null out
      the non-final fields of DGCAckHandler -- but it won't remove the entry
      from the map! So it would appear that if many disparate live references
      are returned to clients which don't respond with a DGC ack, this map
      will grow without bounds. The JVM should do better than that, though.

      It appears that all the DGCAckHandler instances have a null objList,
      which seems to imply that all of these have timed out!
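
      For what it's worth, here is a minimal, self-contained sketch of the
      pattern described above. It is emphatically not the actual
      sun.rmi.transport source -- the class name and anything not already
      named above (idTable, objList, received(UID), startTimer()) is
      invented -- but it shows why an entry that is added for every outgoing
      live reference leaks whenever the client never sends the ack:

        import java.rmi.server.UID;
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.Timer;
        import java.util.TimerTask;

        // Simplified model of the suspected leak (NOT the real DGCAckHandler).
        class AckHandlerSketch {

            // analogous to DGCAckHandler's private static UID -> handler map
            private static final Map<UID, AckHandlerSketch> idTable =
                Collections.synchronizedMap(new HashMap<UID, AckHandlerSketch>());

            private static final Timer timer = new Timer(true);

            private final UID id;
            private List<Object> objList = new ArrayList<Object>();  // pins the marshalled objects

            // one handler is registered for every response that carries a live ref
            AckHandlerSketch(UID id) {
                this.id = id;
                idTable.put(id, this);
                startTimer();
            }

            void startTimer() {
                timer.schedule(new TimerTask() {
                    @Override public void run() {
                        objList = null;  // non-final fields are nulled on timeout...
                        // ...but idTable.remove(id) is never called here, so the
                        // entry itself stays behind if the ack never arrives
                    }
                }, 5 * 60 * 1000);  // 5-minute ack timeout
            }

            // only an actual DGCAck from the client removes the entry
            static void received(UID id) {
                idTable.remove(id);
            }
        }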

      Thought I'd pass this along as I thought it might ring some bells there.
      The anonymous Runnable passed to schedule() in DGCAckHandler.startTimer()
      should also call idTable.remove(); the code feels fragile otherwise.
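
      In terms of the simplified sketch above, the suggested change would look
      something like this (again just an illustration, not the real JDK code):

        // Revised startTimer() for the AckHandlerSketch above: when the ack
        // times out, the entry is evicted from idTable as well, so a client
        // that never acks can no longer grow the map without bound.
        void startTimer() {
            timer.schedule(new TimerTask() {
                @Override public void run() {
                    objList = null;
                    idTable.remove(id);  // the additional call being suggested
                }
            }, 5 * 60 * 1000);
        }
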
      More information from the submitter:

      A customer is encountering enormous memory usage within the bowels of
      DGCAckHandler, as per the details below. The customer in question is
      using Java 6 Update 29 on Solaris 10 and is seeing the issue when doing
      a stress test with OracleATS (Advanced Test Suite).
      More information from the submitter:

      We have finally gotten enough information from the customer to reproduce the issue.

      We have RMI server processes which, when they discover they are overloaded, do the following:
               if ( needToRedirect(...) )
               {
                  MethodServer next_server = nextMethodServer(); // MethodServer is a java.rmi.Remote interface
                  if (next_server != null)
                  {
                     ...
                     throw new ServerLoadException(next_server); // TO_DO: ponder where/how/if we might decide that no one wants this request and return no-next server or some such...
                  }
               }
      where nextMethodServer() internally does:
               nextMethodServer = serverManager.getNextServer( thisMethodServerImpl );
      plus some caching of this result with a time-to-live. Here serverManager is a stub to another RMI server (kind of an overseer), nextMethodServer is a stub to another MethodServer, and thisMethodServerImpl is the MethodServer implementation within this RMI server process.
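
      For illustration only, here is a hypothetical reconstruction of the
      exception described above. The exact field and method shapes are
      assumptions, not the submitter's real code, but they show the detail
      that matters for this bug: the stub sits in an ordinary non-transient
      field, so every marshalled ServerLoadException writes a live remote
      reference into the RMI reply.

        import java.rmi.Remote;

        // Hypothetical stand-in for the submitter's remote interface.
        interface MethodServer extends Remote {
            // remote methods omitted
        }

        // Assumed shape of the exception: the stub is a plain non-transient
        // field and is therefore serialized (as a live remote reference)
        // every time the exception is marshalled back to a caller.
        class ServerLoadException extends Exception {
            private final MethodServer nextServer;

            ServerLoadException(MethodServer nextServer) {
                this.nextServer = nextServer;
            }

            MethodServer getNextServer() {
                return nextServer;
            }
        }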

      It appears that either the generation of the ServerLoadException or the invocation of getNextServer() is triggering DGCAckHandler instances to be created which will never be DGC ack'ed. Given the time-to-live caching of the getNextServer() result, the ServerLoadException is the issue.

      The number of leaked DGCAckHandlers appears to be exactly equal to the number of calls to ServerLoadException(), so I think that points quite convincingly to this being the cause of the issue.

      Modifying ServerLoadException to hide the MethodServer stub in a byte[] (as per attached) rather than leaving it as a simple non-transient field seems to completely work around the issue.
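
      As a guess at what the attached change might look like (this is an
      assumed shape reusing the hypothetical MethodServer interface from the
      sketch above, not the submitter's actual patch):

        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.io.ObjectInputStream;
        import java.io.ObjectOutputStream;

        // Workaround sketch: the stub is serialized into an opaque byte[], so
        // marshalling the exception over RMI no longer writes a live Remote
        // reference and no DGCAckHandler entry is created for it.
        class ServerLoadException extends Exception {
            private final byte[] nextServerBytes;

            ServerLoadException(MethodServer nextServer) throws IOException {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream oos = new ObjectOutputStream(bos);
                oos.writeObject(nextServer);  // stub hidden inside a plain byte[]
                oos.close();
                this.nextServerBytes = bos.toByteArray();
            }

            MethodServer getNextServer() throws IOException, ClassNotFoundException {
                ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(nextServerBytes));
                try {
                    return (MethodServer) ois.readObject();  // reconstituted on demand
                } finally {
                    ois.close();
                }
            }
        }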

      That raises the question as to what should be fixed in the RMI internals, as I don't see anything clearly wrong with the RMI usage pattern here that should trigger a DGCAckHandler leak.

            Assignee: dmocek Darryl Mocek
            Reporter: tyao Ting-Yun Ingrid Yao (Inactive)
            Votes: 0
            Watchers: 3
