-
Bug
-
Resolution: Cannot Reproduce
-
P4
-
None
-
1.2.0
-
x86
-
windows_98
Name: gsC80088 Date: 02/16/99
0. Problem description: Windows hangs (completely) as a result
of an AWT 1.2 deadlock. The problem has been reproduced on
several Windows 95 / Windows 95B OSR2 / Windows 98 machines,
with several graphics adapters, motherboards, system configurations.
Because the deadlock occurs in either the paint() method
or the Windows event dispatch, Windows FREEZES totally
(only the mouse still moves).
Sometimes, it is possible to call the Windows task-manager,
but in most cases the system needs a REBOOT.
We could not reproduce the problem under Windows NT, but we
cannot guarantee that the bug won't occur under NT.
JavaFIG runs fine with JDK 1.1.5/6/7 on Windows 95/95B/98/NT,
Linux, Solaris, Microsoft VM 3.0/3.1 on Windows 95/95B/98/NT,
and JDK 1.2 on Solaris.
1. Reproduce the problem:
1.1) Download the current version of JavaFIG (1.33) from
http://tech-www.informatik.uni-hamburg.de/applets/javafig/
1.2) Install the JAR archive to your classpath, or unpack the
archive.
1.3) Run the JAR archive with JDK 1.2,
java -jar javafig-classes.zip
or start the editor or the viewer
java javafig.gui.Editor
java javafig.gui.Viewer
1.4) In the editor, load the demos (help->demos->demo xx),
or load some files. Depending on your system config and
speed, the JVM will deadlock during repaints to the status
message panel. Sometimes, this may take several tries,
but sometimes, even the first load may fail.
1.5) The system freezes. Running in the debugger doesn't
help, because the debugger console is dead too, after the
bug hits.
2) You can get the JavaFIG source code, if you really want it.
3) No error messages. The JVM simply deadlocks.
4) No trace info available.
If we disable the status messages (and send them to System.out
instead), everything works ok.
Therefore, the problem is probably either 1) concurrent repaints
originating from multiple threads, or 2) too frequent repaints.
5) The problem occurs only with JDK 1.2 final. 1.2beta4 was
fine, as are 1.1.5/6/7.
There is probably no hardware dependency. The same systems
that instantly deadlock with JDK 1.2 run rock-solid with
JDK 1.1.7, Microsoft SDK 3.1, and all our other Windows
programs and networking.
(Review ID: 52574)
======================================================================
gstone@eng 1999-02-16
I asked the user for some code and more info and he's provided it. I've attached the files (in a winzip file), and here are his remarks:
thanks for your reply to my bug report.
Unfortunately, so far we cannot reproduce the bug in small standalone
apps or applets. Also, we have not found a workaround yet (except to
run JDK 1.1.7).
---
CONTENTS OF THIS MAIL:
As an attachment, I send you all the source files from JavaFIG that
I suspect. Please excuse that the code is not as beautiful as it should
be, but it contains a lot of debug/workaround code assembled during
testing with JDK 1.1.beta up to JDK 1.2 final...
Please tell us, if you should find some obvious bugs in those 'critical'
parts of our code. If not, I will send you the full source code on
request; however, as it consists of more than 100 classes with quite
complex interaction, I did not include all of it this time.
Also, I attach two screenshots (GIF images) that illustrate another
thread interaction bug that may well be related to the deadlocks we
experience.
---
ABOUT JAVAFIG REDRAWING:
In the following, I will try to explain the basic redrawing and thread
interaction (as I see it). If you find an obvious bug in our code,
please tell us quickly.
JavaFIG is intended as a full replacement for the Unix/X11 graphics/diagram
editor FIG. Except for some limitations of the Java 1.1 AWT (like rotated
text or fill patterns), most of the FIG functions are there.
For historical reasons, some classes are called 'GE_xxx' for 'Graphics
Editor class xxx', and we don't (yet) use the standard Java naming convention,
as de.uni-hamburg.informatik.javafig.xxx would be very long.
In order to realize functions like zooming, panning, rubberbanding, and
the magnetic grid, JavaFIG is built from four Java packages:
1) graphical objects, like polylines, rectangles, text, splines, ...
2) a canvas to display the objects
3) an editor to manage objects, GUI and user interaction
4) command objects for one edit-scenario each
Perhaps the most critical (in terms of deadlock) part is the drawing
canvas, class javafig.canvas.GE_canvas.
To improve performance, GE_canvas manages an offscreen buffer image.
Some drawing operations are performed directly on the front buffer Graphics
object, others on the offscreen buffer Graphics.
We have several types of redrawing, see method GE_canvas.handleRedraw().
Internal flags decide whether to perform a full redraw, to redraw the
grid and the objects, to redraw for rubberbanding, or simply to bitblit
the offscreen buffer to the screen.
Mode:              Action:
NoChanges          redraw sliders and cursor
MouseMotionRedraw  redraw sliders and cursor
FullRedraw         initialize (allocate) the offscreen buffer
                   clear the offscreen buffer
                   draw the rulers
                   draw the grid
                   draw all objects
                   blit the offscreen buffer to the screen
                   draw any tmp objects
                   draw sliders and cursor
TmpObjectRedraw    blit offscreen buffer, draw tmp objects, draw slider
SyncRedraw         (for animation)
                   blit offscreen buffer, except for rulers to prohibit flickering,
                   draw tmp objects (the animated/active objects)
                   draw sliders and cursors
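The mode table above can be read as a simple dispatch from redraw mode to an ordered list of drawing steps. The following is only a sketch of that reading, not the actual GE_canvas.handleRedraw() code: the class name RedrawPlanSketch, the enum constants, and the step labels are hypothetical and chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the mode table; not JavaFIG source code.
class RedrawPlanSketch {

    /** Redraw modes named after the table rows (Java-style constants are an assumption). */
    enum RedrawMode { NO_CHANGES, MOUSE_MOTION_REDRAW, FULL_REDRAW, TMP_OBJECT_REDRAW, SYNC_REDRAW }

    /** Returns the ordered steps for one mode; the strings are labels, not real drawing calls. */
    static List<String> stepsFor(RedrawMode mode) {
        switch (mode) {
            case FULL_REDRAW:
                return Arrays.asList(
                        "allocate offscreen buffer",
                        "clear offscreen buffer",
                        "draw rulers",
                        "draw grid",
                        "draw all objects",
                        "blit offscreen buffer to screen",
                        "draw tmp objects",
                        "draw sliders and cursor");
            case TMP_OBJECT_REDRAW:
                return Arrays.asList(
                        "blit offscreen buffer to screen",
                        "draw tmp objects",
                        "draw sliders and cursor");
            case SYNC_REDRAW:
                return Arrays.asList(
                        "blit offscreen buffer to screen (except rulers)",
                        "draw tmp objects",
                        "draw sliders and cursor");
            case NO_CHANGES:
            case MOUSE_MOTION_REDRAW:
            default:
                // Both cheap modes only refresh the overlay elements.
                return Collections.singletonList("draw sliders and cursor");
        }
    }

    public static void main(String[] args) {
        for (RedrawMode mode : RedrawMode.values()) {
            System.out.println(mode + " -> " + stepsFor(mode));
        }
    }
}
```

Only FullRedraw touches the offscreen buffer contents in this reading; the other modes reuse whatever the buffer already holds.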
---
THE RACE CONDITION:
This redraw strategy has evolved during many versions of JDK 1.1.x
and seems to work most of the time. One of the most critical methods
in terms of performance is the drawGrid() method. Instead of drawing
thousands of one-pixel lines, we break the full drawing canvas into
smaller blocks, draw one block, and copy this block over the full area
using the Graphics.copyArea() method.
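As an illustration of the block-copy idea just described, here is a minimal sketch that draws the grid lines for one tile and then replicates that tile with Graphics.copyArea(). It is not the real drawGrid() method: the class name GridTileSketch, the parameters, and the use of a BufferedImage as the offscreen buffer are assumptions made so the example is self-contained.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration of tile-and-copy grid drawing; not JavaFIG source code.
class GridTileSketch {

    /**
     * Draws a grid with the given line spacing into a new offscreen image.
     * Only the top-left tile is drawn line by line; all other tiles are copies.
     * The tile size must be a multiple of the spacing so that tiles join seamlessly.
     */
    static BufferedImage drawGrid(int width, int height, int tile, int spacing) {
        if (spacing <= 0 || tile <= 0 || tile % spacing != 0) {
            throw new IllegalArgumentException("tile must be a positive multiple of spacing");
        }
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            // Clear the whole buffer first, so every copied tile starts from clean pixels.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);

            // Draw the grid lines of the first tile only.
            g.setColor(Color.BLACK);
            for (int x = 0; x < tile; x += spacing) {
                g.drawLine(x, 0, x, tile - 1);
            }
            for (int y = 0; y < tile; y += spacing) {
                g.drawLine(0, y, tile - 1, y);
            }

            // Replicate the first tile over the rest of the buffer.
            for (int ty = 0; ty < height; ty += tile) {
                for (int tx = 0; tx < width; tx += tile) {
                    if (tx != 0 || ty != 0) {
                        g.copyArea(0, 0, tile, tile, tx, ty);
                    }
                }
            }
        } finally {
            g.dispose();
        }
        return image;
    }
}
```

In this sketch all drawing happens on one thread and on one Graphics object, so the source tile cannot change while it is being copied.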
However, there still seems to be an annoying race condition inside the AWT,
which sometimes results in a broken grid display. This bug is hard to
reproduce on Windows, but it occurs more frequently on Solaris and Linux.
See the image 'jdkbug-grid-broken.gif' attached below: Instead of copying
the correct (cleared) part of the offscreen image, the copyArea() method
has copied parts of the offscreen image with data in it...
The second image attached to this mail shows another screenshot with an
apparent AWT race condition with broken colors: for some reason inside
the AWT, the central object is filled with cyan color instead of white.
---
THE STATUS CANVAS:
The second source file attached contains the implementation of the
JavaFIG status line. Because of resize-bugs in AWT 1.0 we could not use
class java.awt.Label, but had to provide our own class,
based on java.awt.Canvas.
The class also manages a 'history stack' for the messages, which is used
for pushing/popping 'bubble help' status messages when the user moves
over one of the command buttons.
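The history stack described above could look roughly like the following sketch. It is not the GE_statusCanvas implementation: the class name StatusHistorySketch and its method names are hypothetical, and the painting code is left out. The sketch guards each compound operation (save the current message, then replace it) with the object's own lock, on top of the per-call synchronization that java.util.Stack already provides.

```java
import java.util.Stack;

// Hypothetical illustration of a status line with a message history; not JavaFIG source code.
class StatusHistorySketch {
    private final Stack<String> history = new Stack<String>();
    private String current = "";

    /** Replaces the current message without touching the history. */
    synchronized void setMessage(String message) {
        current = message;
    }

    /** Shows a temporary message (e.g. bubble help) and remembers the previous one. */
    synchronized void pushMessage(String temporary) {
        history.push(current);
        current = temporary;
    }

    /** Restores the message that was visible before the last pushMessage(), if any. */
    synchronized void popMessage() {
        if (!history.isEmpty()) {
            current = history.pop();
        }
    }

    /** Returns the message a paint() method would draw. */
    synchronized String currentMessage() {
        return current;
    }
}
```

A typical use would be pushMessage() when the mouse enters a command button and popMessage() when it leaves.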
---
THE DEADLOCK:
Usually, the status message is redrawn from time to time only, as a
reaction to user commands or mouse movements (like the bubble help
messages).
However, the status message canvas is also used to display status
messages from the parser, when reading in a FIG file - typically
one message per parsed object. This in turn results in a very high
number of repaint requests, up to several hundred repaint()s per second
on fast computers. Note that the parser is running as a separate thread,
in order to keep repaints and user interaction alive during parsing.
If we disable the parser status messages, Windows does not deadlock.
If we reduce the number of status messages (e.g., one per 100 parsed
objects), the deadlock probability is reduced, but still not zero.
This behaviour led to my original hypothesis, that AWT/Windows does
not 'like' hundreds of repaints per second. However, with a reduced
(but non-zero) frequency of repaints, the deadlock still occurs.
Another reason for the deadlock might be the synchronisation when
accessing the message stack (push() and pop() in class java.util.Stack
are synchronized) from both the parser thread and the redraw thread or
the AWT event queue thread. However, AFAIK, there are no status message
requests from the repaint thread, and we did not find any when trying
to log them (see the printThread() method in GE_statusCanvas.java).
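One way to take both hypotheses (repaint frequency and cross-thread access) out of the picture is to let the parser thread only record the latest message and to hand a single pending update to the event thread. The sketch below shows that pattern under stated assumptions; it is not a confirmed fix for this bug and not JavaFIG code. The class name CoalescingStatusSketch is hypothetical, and the Executor parameter is an assumption that stands in for the AWT event thread so the example runs without a display.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical illustration of coalescing status updates; not JavaFIG source code.
class CoalescingStatusSketch {
    private final AtomicReference<String> latest = new AtomicReference<String>("");
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor uiThread;        // assumed to run tasks on the event thread
    private final Consumer<String> display; // assumed to update the status line and repaint

    CoalescingStatusSketch(Executor uiThread, Consumer<String> display) {
        this.uiThread = uiThread;
        this.display = display;
    }

    /** Safe to call from any thread, e.g. once per parsed object. */
    void post(String message) {
        latest.set(message);
        // Schedule an update only if none is pending; later posts just overwrite 'latest'.
        if (scheduled.compareAndSet(false, true)) {
            uiThread.execute(new Runnable() {
                @Override
                public void run() {
                    // Clear the flag before reading, so a post arriving now schedules a new update.
                    scheduled.set(false);
                    display.accept(latest.get());
                }
            });
        }
    }
}
```

In an AWT application the executor could be `java.awt.EventQueue::invokeLater`, so that the status canvas is only ever touched from the event thread; with this pattern, several hundred post() calls between two event-thread runs result in one displayed message.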
Debugging under Windows is quite difficult, because once the deadlock
occurs, the debugger is dead, too...
A second scenario which triggers the deadlock (on Solaris and Linux, too)
is the 'BreakCompoundCommand' command. However, when running inside
the debugger, the deadlock is very hard to reproduce. Therefore, even
in this case we have no idea as to where the deadlock occurs.
---
SUMMARY:
- The deadlock occurs only in the full-featured application.
- We have two custom-GUI classes, both inheriting from java.awt.Canvas:
javafig.canvas.GE_canvas (and subclasses like EditorCanvas)
javafig.gui.GE_statusCanvas
- the deadlock occurs with repaint()s pending to both the status canvas
and the main canvas
- the deadlock occurs very 'fast', when requesting frequent repaint()s
from the status canvas, especially when calling GE_statusMessage.repaint()
from a separate thread (the parser)
- the deadlock may be due to synchronized accesses to the java.util.Stack
used inside GE_statusCanvas. However, we cannot debug this on Windows
(system hangs completely), and there is no indication for this scenario
on other platforms.
- the grid redraw algorithm used in JavaFIG works most of the time,
but sometimes the grid is not cleared correctly inside the AWT.
We suppose a 'race condition' inside the AWT when accessing and using
offscreen buffers (java.awt.Image) together with Graphics.copyArea()
---
Hope that helps! Please feel free to point out any bugs or abuse of
Java class libraries to us, and don't hesitate to request more info.