Type: Bug
Resolution: Cannot Reproduce
Priority: P2
Fix Version/s: 1.4.2_13
Affects Version/s: solaris_8
Component/s: hotspot
Labels: None
Subcomponent: runtime
CPU: sparc
OS: solaris_8

Operating System: Sparc Solaris
OS version: 5.8
Product Name: Java
Product version: 1.4.2_09
Hardware platform: Both Sun V240 and 480R
Severity: 1

Short description: Periodic unaccountable JVM pauses

Full problem description:
We have been experiencing periodic JVM pauses, each lasting a few
seconds, after a few hours of load testing a crucial customer
application upgrade in their lab environment.
We are currently running on Java 1.4.2_09 and cannot move to
Java 1.5 due to other Java CORBA related issues, so upgrading
the JVM into the 1.5 stream is not an option.

The application is running with the following JVM options:
-server
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-Xms256M
-Xmx256M
-Dsun.rmi.dgc.server.gcInterval=86400000
-Dsun.rmi.dgc.client.gcInterval=86400000
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc://opt/redknee/product/s2100/current/log/gcstats.log

We have analyzed the GC logs and all the logs reported sub-second
collection times.

We have periodically captured pstack/gcore output and thread
dumps of the process in an attempt to deduce what the JVM is doing
during the paused periods.

These dumps were taken by an external application that monitors our
application event record (ER) logs, looking for pauses in excess of
three seconds, which should not occur under our load testing.

We suspect the file readers reading from an NFS mount might be
at fault. If an NFS mount becomes temporarily inaccessible, we
speculate this might lock up the entire JVM and prevent the
application threads from running.
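
One way to check whether the whole JVM stalls, rather than only the threads touching the NFS mount, is a heartbeat thread that does no I/O and reports when it wakes up late. The following is only a minimal sketch and is not part of the reported application: the class name PauseHeartbeat, the interval and the threshold are hypothetical, and it is written against a current Java release rather than 1.4.2.

```java
public class PauseHeartbeat implements Runnable {
    private final long intervalMillis;        // how long each sleep should last (assumed value chosen by caller)
    private final long reportThresholdMillis; // lateness above this is reported (assumed value chosen by caller)

    public PauseHeartbeat(long intervalMillis, long reportThresholdMillis) {
        this.intervalMillis = intervalMillis;
        this.reportThresholdMillis = reportThresholdMillis;
    }

    /** Returns how many milliseconds later than expected the sleep finished (never negative). */
    public static long overshoot(long beforeMillis, long afterMillis, long intervalMillis) {
        return Math.max(0L, (afterMillis - beforeMillis) - intervalMillis);
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long before = System.currentTimeMillis();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return; // stop the heartbeat when interrupted
            }
            long late = overshoot(before, System.currentTimeMillis(), intervalMillis);
            if (late > reportThresholdMillis) {
                // This thread performs no file or network I/O, so a late wake-up
                // suggests the process as a whole was not running.
                System.err.println("heartbeat late by " + late + " ms at " + System.currentTimeMillis());
            }
        }
    }
}
```

If the heartbeat reports lateness at the same times the ER gaps occur, that would be consistent with a JVM-wide pause; if it stays on time, only the reader threads were likely blocked.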

The application is running on a lightly loaded system in which
the JVM is the major process, using around 20-30% of the CPU
for the node. We have tested on two separate lab hardware nodes
and experienced the issue on both.

We would like Sun to analyze the current system patches,
validate that there are no known issues with our software configuration,
and tell us whether patches are available for issues which pause the
JVM unexpectedly.
The same issue was observed on both the 1.4.2_08 and 1.4.2_09 nodes.

The two sets of logs and core details have been captured and are on titan.sfbay.sun.com at:
   /tmp/Redknee/100705_214128 and /tmp/Redknee/100805_130529
   System Information is at : /tmp/Redknee/sysinfo.txt

Here are some more details collected:
1) A system is monitoring the application record logs. Is that monitoring system also a Java-based application?
* Yes, the monitoring tool that we wrote is written in Java.
  It simply polls the Event Record files and reads each line as it is written (on a fixed polling interval).
  When an extended period of time passes in which no ERs are recorded, it runs a UNIX script using the Runtime.getRuntime().exec("scriptfilename") call.
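
The stall check described in this answer can be sketched roughly as follows. This is an illustration only, not the reporter's actual tool: the class name ErWatchdog and its methods are hypothetical, and the array form of exec is used here instead of the single-string form quoted above.

```java
import java.io.IOException;

public class ErWatchdog {
    private final long idleThresholdMillis; // e.g. 3000 for the three-second limit mentioned in the report
    private long lastRecordMillis;          // time the most recent ER line was seen

    public ErWatchdog(long idleThresholdMillis, long startMillis) {
        this.idleThresholdMillis = idleThresholdMillis;
        this.lastRecordMillis = startMillis;
    }

    /** Call whenever the poller reads a new ER line. */
    public void recordSeen(long nowMillis) {
        lastRecordMillis = nowMillis;
    }

    /** True when no ER has been seen for longer than the threshold. */
    public boolean isStalled(long nowMillis) {
        return nowMillis - lastRecordMillis > idleThresholdMillis;
    }

    /** Runs the diagnostic script (hypothetical path supplied by the caller) and returns its exit code. */
    public static int runScript(String scriptFileName) throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(new String[] { scriptFileName });
        return process.waitFor();
    }
}
```

On each polling interval the tool would call isStalled with the current time and, if it returns true, call runScript to capture the pstack/gcore output and thread dumps.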

2) When we say it is monitoring, we need to know what activity the monitoring system performs.
      Is it a read-only or a read-write process with respect to the application system?
      Let us know how the communication happens, that is, whether it involves sockets or an ftp process to access the application system.
* The monitoring tool simply reads the ER files. It does not directly communicate with the application at all.
  It uses read only permissions as we do not write to the ER files.

3) We understand that the core file details and logs sent to us were collected on the application system. Correct us if not.
* Yes, the core files and logs were collected for the actual application system.
The monitoring system uses java.io and java.nio.channels.FileChannel, obtained from a FileInputStream, to read the ER files. The monitoring system polls the ERs generated by the application server, which reside on the local filesystem.

4) Detailed Information :
The application system polls ERs of a remote system. The ER directory of the remote system is mounted using an NFS mount on the local machine and we poll the ERs from the NFS mount. The polling library that the application system uses to poll these remote ERs is identical to that used in our monitoring system. It uses java.io.FileInputStream to get the java.nio.channels.FileChannel, and we use the FileChannel to read the ERs out of the file.

The issue we are experiencing is that when we simulate an NFS timeout by routing the NFS mount's IP to a non-existent IP, the entire JVM is paused rather than just the threads that are accessing the NFS mount. This JVM-wide pause is causing significant issues with our socket connections to external applications and thus needs further investigation. We need to confirm that this NFS mount timeout is the root cause of our JVM pauses, and to understand why the entire JVM pauses under this scenario, which we hope to gather from your analysis.
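
For reference, the read pattern described above (a FileChannel obtained from a FileInputStream) looks roughly like the sketch below. It is not the actual polling library: the class name ErChannelReader, the buffer size and the single-byte character handling are assumptions made for illustration, and the code targets a current Java release.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ErChannelReader {
    /**
     * Reads from the given byte position to the current end of the file
     * and returns the bytes as text (each byte treated as one character).
     */
    public static String readFrom(String path, long position) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            FileChannel channel = in.getChannel();
            channel.position(position);
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            StringBuilder out = new StringBuilder();
            // read() blocks in the OS; on an unreachable NFS mount this is
            // where the calling thread would be expected to wait.
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.append((char) (buffer.get() & 0xFF));
                }
                buffer.clear();
            }
            return out.toString();
        } finally {
            in.close(); // closing the stream also closes its channel
        }
    }
}
```

A poller would remember how many bytes it has consumed and pass that offset as position on the next interval, so only newly appended ERs are read.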

Assignee: Robert Mckenna
Reporter: Sreenatha Dattatri (Inactive)
Votes: 0
Watchers: 1
Created: 2005-10-19 13:55
Updated: 2010-08-18 10:18
Resolved: 2006-05-09 05:19
Imported: 16/Sep/12 6:35 PM
Indexed: 18/Jul/12 1:15 PM
