At our ISV, EDS, there is a serious bug in their Java application that causes it to hang indefinitely. The bug is easy to reproduce in the application itself, but a standalone test case is hard to build because of the application's many distributed components. Since we cannot put together a test case, I have listed below the data points we have gathered so far to help with the diagnosis.
- the bug is only seen on Solaris (runs well on Windows and HP-UX).
- customer is running Java 1.3.1_01
- the application is multithreaded and makes calls to OpenGL libraries and XServer via JNI.
- from the pstack traces, we have found that the hang seems to be occurring in the XServer code. We have verified that the AWT-Motif thread is indeed in a busy loop in _XFlushInt.
- The application runs a refresh loop in the background that makes calls to the XServer to keep the application view constantly updated. The loop is needed because the application may open and close OpenGL windows. The way this loop accesses the XServer code may be what causes _XFlushInt to hang indefinitely.
- The hang occurs when the setVisible() method of a java.awt.Component is called. When the application calls setVisible(), it is also trying to load an OpenGL window in the component. The hang sometimes happens on setVisible(true) and sometimes on setVisible(false): the stack traces we have dumped show the thread stuck in the pShow method in some cases and in the pHide method in others. This leads us to believe that the XServer may have been in an unstable state or in a wait routine right before setVisible() was called. The last call that the customer makes from their application is XPutImage() via JNI.
When the app hangs, we do a kill -3 to see where the hang is and get the following output. In this case it happened in a setVisible(true).
"AWT-Motif" prio=6 tid=0x28a590 nid=0xe runnable
at sun.awt.motif.MToolkit.run(Native Method)
"AWT-EventQueue-0" prio=6 tid=0x27f750 nid=0xc waiting for monitor entry
at sun.awt.motif.MComponentPeer.pShow(Native Method)
- The customer has tried eliminating the code that communicates with the XServer, but the application still hangs. Removing that code does not seem to be a viable solution anyway.
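
Since the hang shows up in pShow in some dumps and in pHide in others, it may help to reduce each kill -3 dump to one line per thread so that several occurrences can be compared side by side. The following is only a rough sketch of such a helper, not part of the customer's application. The class name ThreadDumpSummary and its Entry type are made up for illustration, the header pattern is inferred solely from the two header lines quoted above (other dump formats may need adjusting), and it assumes a recent JDK on the machine doing the analysis rather than the customer's 1.3.1 runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes a kill -3 thread dump as
// "thread name [state] top frame", one line per thread.
public class ThreadDumpSummary {

    /** One thread from the dump: name, reported state, and top frame (null if none seen). */
    public static final class Entry {
        public final String name;
        public final String state;
        public String topFrame;

        Entry(String name, String state) {
            this.name = name;
            this.state = state;
        }

        @Override
        public String toString() {
            return name + " [" + state + "] "
                    + (topFrame == null ? "(no frames)" : topFrame);
        }
    }

    // Header shape assumed from the dump above:
    //   "NAME" prio=N tid=0x... nid=0x... STATE
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]+)\"\\s+(?:daemon\\s+)?prio=\\d+\\s+tid=\\S+\\s+nid=\\S+\\s+(.*)$");

    // Frame shape assumed from the dump above:
    //   at package.Class.method(Source or Native Method)
    private static final Pattern FRAME = Pattern.compile(
            "^at\\s+([^\\s(]+)\\((.*)\\)$");

    /** Parses the dump text and keeps only the first frame listed under each thread header. */
    public static List<Entry> parse(String dump) {
        List<Entry> entries = new ArrayList<>();
        Entry current = null;
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = new Entry(header.group(1), header.group(2).trim());
                entries.add(current);
                continue;
            }
            Matcher frame = FRAME.matcher(line);
            if (frame.matches() && current != null && current.topFrame == null) {
                current.topFrame = frame.group(1) + " (" + frame.group(2) + ")";
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // The four dump lines quoted in this report.
        String dump = String.join("\n",
                "\"AWT-Motif\" prio=6 tid=0x28a590 nid=0xe runnable",
                "at sun.awt.motif.MToolkit.run(Native Method)",
                "\"AWT-EventQueue-0\" prio=6 tid=0x27f750 nid=0xc waiting for monitor entry",
                "at sun.awt.motif.MComponentPeer.pShow(Native Method)");
        for (Entry entry : parse(dump)) {
            System.out.println(entry);
        }
    }
}
```

Run against each saved dump, this gives a quick view of whether AWT-Motif is always runnable in MToolkit.run while the event queue thread waits in pShow or pHide, which is the pattern described above.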