JDK-4211948

Windows freezes (complete) due to deadlock in AWT repaint (JDK 1.2 final)


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P4
    • None
    • 1.2.0
    • client-libs
    • x86
    • windows_98

      Name: gsC80088 Date: 02/16/99


      0. Problem description: Windows hangs (completely) as a result
         of an AWT 1.2 deadlock. The problem has been reproduced on
         several Windows 95 / Windows 95B OSR2 / Windows 98 machines,
         with several graphics adapters, motherboards, system configurations.

         Because the deadlock occurs in either the paint() method
         or the Windows event dispatch, Windows FREEZES totally
         (only the mouse still moves).
         Sometimes, it is possible to call the Windows task-manager,
         but in most cases the system needs a REBOOT.
        
         We could not reproduce the problem under Windows NT, but we
         cannot guarantee that the bug won't occur under NT.

         JavaFIG runs fine with JDK 1.1.5/6/7 on Windows 95/95B/98/NT,
         Linux, Solaris, Microsoft VM 3.0/3.1 on Windows 95/95B/98/NT,
         and JDK 1.2 on Solaris.

      1. Reproduce the problem:
         1.1) Download the current version of JavaFIG (1.33) from
         http://tech-www.informatik.uni-hamburg.de/applets/javafig/

         1.2) Install the JAR archive to your classpath, or unpack the
         archive.

         1.3) Run the JAR archive with JDK 1.2,
         java -jar javafig-classes.zip

         or start the editor or the viewer
         java javafig.gui.Editor
         java javafig.gui.Viewer

         1.4) In the editor, load the demos (help->demos->demo xx),
         or load some files. Depending on your system config and
         speed, the JVM will deadlock during repaints to the status
         message panel. Sometimes, this may take several tries,
         but sometimes, even the first load may fail.

         1.5) The system freezes. Running in the debugger doesn't
         help, because the debugger console is dead too, after the
         bug hits.

      2) You can get the JavaFIG source code, if you really want it.
         
      3) No error messages. The JVM simply deadlocks.

      4) No trace info available.
         If we disable the status messages (and send them to System.out
         instead), everything works ok.
         Therefore, the problem is either 1) concurrent repaints originating
         from multiple threads, or 2) too frequent repaints.

      5) The problem occurs only with JDK 1.2 final. 1.2beta4 was
         fine, as are 1.1.5/6/7.

         There is probably no hardware dependency. The same systems
         that instantly deadlock with JDK 1.2 run rock-solid with
         JDK 1.1.7, Microsoft SDK 3.1, and all our other Windows
         programs and networking.
      (Review ID: 52574)
      ======================================================================

      gstone@eng 1999-02-16

      I asked the user for some code and more info and he's provided it. I've attached the files (in a winzip file), and here are his remarks:


      thanks for your reply to my bug report.

      Unfortunately, so far we cannot reproduce the bug in small standalone
      apps or applets. Also, we have not found a workaround yet (except to
      run JDK 1.1.7).


      ---

      CONTENTS OF THIS MAIL:

      As an attachment, I send you all the source files from JavaFIG that
      I suspect. Please excuse that the code is not as beautiful as it should
      be, but it contains a lot of debug/workaround code assembled during
      testing with JDK 1.1.beta up to JDK 1.2 final...

      Please tell us, if you should find some obvious bugs in those 'critical'
      parts of our code. If not, I will send you the full source code on
      request; however, as it consists of more than 100 classes with quite
      complex interaction, I did not include all of it this time.

      Also, I attach two screenshots (GIF images) that illustrate another
      thread interaction bug that may well be related to the deadlocks we
      experience.

      ---

      ABOUT JAVAFIG REDRAWING:

      In the following, I will try to explain the basic redrawing and thread
      interaction (as I see it). If you find an obvious bug in our code,
      please let us know quickly.

      JavaFIG is intended as a full replacement for the Unix/X11
      graphics/diagram editor FIG. Except for some limitations of the
      Java 1.1 AWT (like rotated texts or fill patterns), most of the
      FIG functions are there.

      For historical reasons, some classes are called 'GE_xxx' for 'Graphics
      Editor class xxx', and we don't (yet) use the standard Java naming
      convention, as de.uni-hamburg.informatik.javafig.xxx would be very long.

      In order to realize functions like zooming, panning, rubberbanding, and
      the magnetic grid, FIG is built from four Java packages:

      1) graphical objects, like polylines, rectangles, text, splines, ...
      2) a canvas to display the objects
      3) an editor to manage objects, GUI and user interaction
      4) command objects for one edit-scenario each

      Perhaps the most critical (in terms of deadlock) part is the drawing
      canvas, class javafig.canvas.GE_canvas.
      To improve performance, GE_canvas manages an offscreen buffer image.
      Some drawing operations are performed directly on the front buffer
      Graphics object, others on the offscreen buffer Graphics.

      We have several types of redrawing, see method GE_canvas.handleRedraw().
      Internal flags decide whether to perform a full redraw, to redraw the
      grid and the objects, to redraw for rubberbanding, or simply to bitblit
      the offscreen buffer to the screen.

      Mode               Action
      NoChanges          redraw sliders and cursor
      MouseMotionRedraw  redraw sliders and cursor
      FullRedraw         initialize (allocate) the offscreen buffer
                         clear the offscreen buffer
                         draw the rulers
                         draw the grid
                         draw all objects
                         blit the offscreen buffer to the screen
                         draw any tmp objects
                         draw sliders and cursor
      TmpObjectRedraw    blit offscreen buffer, draw tmp objects, draw slider
      SyncRedraw         (for animation)
                         blit offscreen buffer, except for rulers to prohibit
                         flickering,
                         draw tmp objects (the animated/active objects)
                         draw sliders and cursors
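
      As a rough illustration of the mode table above, the dispatch could be
      sketched as follows. This is not the actual GE_canvas.handleRedraw()
      code: the enum, the RedrawPlan class, and all step names are
      hypothetical and only paraphrase the table.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; the report only names the five modes
// and the method GE_canvas.handleRedraw().
enum RedrawMode {
    NO_CHANGES, MOUSE_MOTION_REDRAW, FULL_REDRAW, TMP_OBJECT_REDRAW, SYNC_REDRAW
}

final class RedrawPlan {
    /** Returns the ordered drawing steps that the table lists for a mode. */
    static List<String> stepsFor(RedrawMode mode) {
        switch (mode) {
            case NO_CHANGES:
            case MOUSE_MOTION_REDRAW:
                return Arrays.asList("drawSlidersAndCursor");
            case FULL_REDRAW:
                return Arrays.asList(
                        "allocateOffscreenBuffer",
                        "clearOffscreenBuffer",
                        "drawRulers",
                        "drawGrid",
                        "drawAllObjects",
                        "blitOffscreenToScreen",
                        "drawTmpObjects",
                        "drawSlidersAndCursor");
            case TMP_OBJECT_REDRAW:
                return Arrays.asList(
                        "blitOffscreenToScreen",
                        "drawTmpObjects",
                        "drawSlidersAndCursor");
            case SYNC_REDRAW:
                // Rulers are skipped during the blit to avoid flicker.
                return Arrays.asList(
                        "blitOffscreenExceptRulers",
                        "drawTmpObjects",
                        "drawSlidersAndCursor");
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```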


      ---

      THE RACE CONDITION:

      This redraw strategy has evolved during many versions of JDK 1.1.x
      and seems to work most of the time. One of the most critical methods
      in terms of performance is the drawGrid() method. Instead of drawing
      thousands of one-pixel lines, we break the full drawing canvas into
      smaller blocks, draw one block, and copy this block over the full area
      using the Graphics.copyArea() method.
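
      The block-copy idea can be sketched as below. This is a minimal
      illustration, not the JavaFIG drawGrid() code: the class GridTiler,
      its parameters, and the colors are assumptions, and it draws into a
      java.awt.image.BufferedImage (which did not exist in the 1.1 AWT)
      so that it runs without a window. It assumes the block size is a
      multiple of the grid spacing; it is simplest when the buffer
      dimensions are multiples of the block size.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

// Hypothetical sketch; names and parameters are not from JavaFIG.
final class GridTiler {
    /** Draws the grid into one block, then replicates it with copyArea(). */
    static BufferedImage drawGrid(int width, int height, int block, int spacing) {
        if (spacing <= 0 || block <= 0 || block % spacing != 0) {
            throw new IllegalArgumentException(
                    "block must be a positive multiple of spacing");
        }
        BufferedImage buffer =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics g = buffer.getGraphics();
        try {
            // Clear the whole offscreen buffer first.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);

            // Draw the grid lines inside the first block only.
            int bw = Math.min(block, width);
            int bh = Math.min(block, height);
            g.setColor(Color.LIGHT_GRAY);
            for (int x = 0; x < bw; x += spacing) g.drawLine(x, 0, x, bh - 1);
            for (int y = 0; y < bh; y += spacing) g.drawLine(0, y, bw - 1, y);

            // Replicate the block along the first row...
            for (int x = bw; x < width; x += bw) g.copyArea(0, 0, bw, bh, x, 0);
            // ...then replicate the finished row down the buffer.
            for (int y = bh; y < height; y += bh) g.copyArea(0, 0, width, bh, 0, y);
        } finally {
            g.dispose();
        }
        return buffer;
    }
}
```

      The copies read from the region that was just cleared and drawn, so
      a copyArea() that picks up stale pixels would produce the kind of
      broken grid described in the next paragraph.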

      However, there still seems to be an annoying race condition inside the
      AWT, which sometimes results in a broken grid display. This bug is hard
      to reproduce on Windows, but it occurs more frequently on Solaris and
      Linux. See the image 'jdkbug-grid-broken.gif' attached below: instead
      of copying the correct (cleared) part of the offscreen image, the
      copyArea() method has copied parts of the offscreen image with data
      in it...

      The second image attached to this mail shows another screenshot with an
      apparent AWT race condition with broken colors: for some reason inside
      the AWT, the central object is filled with cyan color instead of white.

      ---

      THE STATUS CANVAS:

      The second source file attached contains the implementation of the
      JavaFIG status line. Because of resize-bugs in AWT 1.0 we could not use
      class java.awt.Label, but had to provide our own class,
      based on java.awt.Canvas.

      The class also manages a 'history stack' for the messages, which is used
      for pushing/popping 'bubble help' status messages when the user moves
      over one of the command buttons.
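
      A history stack of this kind might look like the sketch below. The
      class StatusHistory and its method names are hypothetical and are not
      taken from GE_statusCanvas; only the use of java.util.Stack for
      pushing and popping 'bubble help' messages follows the description.

```java
import java.util.Stack;

// Hypothetical sketch of a status-message history; not the JavaFIG class.
final class StatusHistory {
    // java.util.Stack synchronizes push(), pop(), and peek() on the stack.
    private final Stack<String> history = new Stack<>();

    StatusHistory(String initialMessage) {
        history.push(initialMessage);
    }

    /** Shows a temporary message, e.g. while the mouse is over a button. */
    void push(String message) {
        history.push(message);
    }

    /** Restores the previous message; the base message is never removed. */
    void pop() {
        synchronized (history) {
            if (history.size() > 1) {
                history.pop();
            }
        }
    }

    /** The message that the status line should currently display. */
    String current() {
        return history.peek();
    }
}
```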


      ---
      THE DEADLOCK:

      Usually, the status message is redrawn from time to time only, as a
      reaction to user commands or mouse movements (like the bubble help
      messages).
      However, the status message canvas is also used to display status
      messages from the parser, when reading in a FIG file - typically
      one message per parsed object. This in turn results in a very high
      number of repaint requests up to several hundred repaint()s per second
      on fast computers. Note that the parser is running as a separate thread,
      in order to keep repaints and user interaction alive during parsing.

      If we disable the parser status messages, Windows does not deadlock.
      If we reduce the number of status messages (e.g., one per 100 parsed
      objects), the deadlock probability is reduced, but still not zero.

      This behaviour led to my original hypothesis, that AWT/Windows does
      not 'like' hundreds of repaints per second. However, with a reduced
      (but non-zero) frequency of repaints, the deadlock still occurs.
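
      One way to cut the repaint frequency without dropping the latest
      message is to coalesce updates, as in the sketch below. This is only
      an illustration: StatusCoalescer and its parameters are hypothetical,
      and nothing in this report shows that it avoids the deadlock, since
      the deadlock persisted at reduced repaint rates. In an AWT program
      the executor could be EventQueue::invokeLater, so that the display
      callback runs on the event dispatch thread.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch; not part of JavaFIG or the AWT.
final class StatusCoalescer {
    // Holds the newest undisplayed message, or null if none is pending.
    private final AtomicReference<String> latest = new AtomicReference<>();
    private final Executor uiExecutor;
    private final Consumer<String> display;

    StatusCoalescer(Executor uiExecutor, Consumer<String> display) {
        this.uiExecutor = Objects.requireNonNull(uiExecutor);
        this.display = Objects.requireNonNull(display);
    }

    /** May be called from any thread, e.g. the parser thread. */
    void post(String message) {
        Objects.requireNonNull(message);
        // Schedule a UI update only if none is pending; otherwise the
        // pending update will simply pick up this newer message.
        if (latest.getAndSet(message) == null) {
            uiExecutor.execute(this::flush);
        }
    }

    // Runs on the UI executor: take the newest message and display it.
    private void flush() {
        String message = latest.getAndSet(null);
        if (message != null) {
            display.accept(message);
        }
    }
}
```

      With this arrangement, several hundred post() calls between two UI
      updates result in a single display call carrying the last message.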

      Another reason for the deadlock might be the synchronisation when
      accessing the message stack (push() and pop() in class java.util.Stack
      are synchronized) from both the parser thread and the redraw thread or
      the AWT event queue thread. However, AFAIK, there are no status message
      requests from the repaint thread, and we did not find any when trying
      to log them (see the printThread() method in GE_statusCanvas.java).
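
      For readers without the attachment, a thread-logging helper of this
      kind could be as small as the sketch below. It is a guess at the
      general idea only: the class ThreadTrace and its output format are
      hypothetical, and the real printThread() in GE_statusCanvas.java may
      work differently.

```java
// Hypothetical sketch; not the printThread() method from JavaFIG.
final class ThreadTrace {
    /** Describes which thread reached the given call site. */
    static String describe(String where) {
        Thread t = Thread.currentThread();
        return where + " called from thread '" + t.getName() + "'";
    }

    /** Writes the description to System.err, which is not buffered by the AWT. */
    static void print(String where) {
        System.err.println(describe(where));
    }
}
```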

      Debugging under Windows is quite difficult, because once the deadlock
      occurs, the debugger is dead, too...


      A second scenario which triggers the deadlock (on Solaris and Linux,
      too) is the 'BreakCompoundCommand' command. However, when running inside
      the debugger, the deadlock is very hard to reproduce. Therefore, even
      in this case we have no idea as to where the deadlock occurs.


      ---
      SUMMARY:

      - The deadlock occurs only in the full-featured application.

      - We have two custom-GUI classes, both inheriting from java.awt.Canvas:
        javafig.canvas.GE_canvas (and subclasses like EditorCanvas)
        javafig.gui.GE_statusCanvas

      - the deadlock occurs with repaints() pending to both the status canvas
        and the main canvas

      - the deadlock occurs very 'fast', when requesting frequent repaints()
        from the status canvas, especially when calling
        GE_statusMessage.repaint() from a separate thread (the parser)

      - the deadlock may be due to synchronized accesses to the
        java.util.Stack used inside GE_statusCanvas. However, we cannot debug
        this on Windows (system hangs completely), and there is no indication
        of this scenario on other platforms.

      - the grid redraw algorithm used in JavaFIG works most of the time,
        but sometimes the grid is not cleared correctly inside the AWT.
        We suppose a 'race condition' inside the AWT when accessing and using
        offscreen buffers (java.awt.Image) together with Graphics.copyArea()



      ---

      Hope that helps! Please feel free to point out any bugs or abuse of
      Java class libraries to us, and don't hesitate to request more info.

            dmendenhsunw David Mendenhall (Inactive)
            gstone Greg Stone