Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-1260305

Crash caused between odd interaction between threads and Garbage collection.

XMLWordPrintable

    • 1.1
    • other, sparc
    • generic, solaris_1, solaris_2.5
    • Not verified

      ocated @ javaserv2:/export/root/JavaDesktop/kona.1.0.kI_1.Z

      Ctrace file:../traces/java-6.ctrace_1.0.kI.Aug12.10:15:31

       ctrace
      PC: ffd4077c
      Last leaf: jmpl f81f5060 from f801e4e0
           0 w %o0-%o5: ( f81f5060 ffeff000 0 71e1000c 80000000 0 )

      call f801e4d0 from f8026294
           1 w %o0-%o5: ( 0 f8602000 f8045440 f801c6a8 73 0 )

      call f802610c from f801f7c8
           2 w %o0-%o5: ( f8045444 f8045440 d f801c6a8 73 0 )

      call f801de1c from f801da38
           3 w %o0-%o5: ( 0 e02e9d58 18 7efefeff 81010100 ff0000 )

      call f804541c from f800c460
           4 w %o0-%o5: ( e03ac930 f81ef9a8 e03ac908 f801c6a8 690066 0 )

      jmpl f800c444 from f8017ee4
           5 w %o0-%o5: ( e03ac920 0 19 7efefeff 81010100 ff00 )

      call f8017ecc from f800c4b4
           6 w %o0-%o5: ( f800c444 0 c 7efefeff 81010100 ff0000 )

      More
       [<space>,<cr>,q] ?
                              
      call f800c470 from f801c278
           7 w %o0-%o5: ( 21 1 f8602000 f8601800 81010100 ff0000 )

      call f801c210 from f80262a0
           8 w %o0-%o5: ( 1 f8602000 f8045cac 0 0 0 )

      call f802610c from f801f7c8
           9 w %o0-%o5: ( 9 f8045cac e02f80a0 f863c800 0 e02ea128 )

      call f80457fc from f8045ca0
           a w %o0-%o5: ( 0 f80ba454 f81f1730 f81f16e0 0 0 )

      call f8045c6c from f8019728
           b w %o0-%o5: ( 2 e0002bd8 f81f1730 f81f16e0 e0002bd8 e0030d10 )

      jmpl f80195e4 from f8045b1c
           c w %o0-%o5: ( f81f1400 f8602400 0 1388 e02f1564 f816ca10 )

      ???? from f8045ad8
           d w %o0-%o5: ( 1 e0002bd8 f80195e4 0 0 0 )

      ok

      Trace file: ../traces/uniq/java-6.trace_1.0.kI.Aug12.10:15:31

      jmpl %o0, %o7
      call kona_enter_mon
      call trap
      call sysWrite
      call sysMonitorDumpInfo
      jmpl %i0, %o7
      call registeredEnumerate_unlocked
      call DumpMonitors_unlocked
      call abort
      call trap
      call _sched_lock
      call sysThreadExit
      jmpl %i2, %o7
      save %sp, -0x70, %sp


      Snmp Dump: Not available

      The description field as copied from bug report 1262836 follows:

      This is not reproducable, but I had two windows, in oneI had the home page, in the
      other I had the thread monitor, when I tried to run the "Show All Pages"
      applet and got the following crash:

      : laoshu; less tip.960808
       
      ...
       
      ProcStackSize = 24576
      malloc_chunk = 147480
      initial_malloc_chunk = 1179840
      Objects: 3414032 bytes, Handles: 426760 bytes, GCbits: 120028 bytes
       
       
      Hullo from sun.kona.Main.main!
       
      Env.buildNamespaces(): Building Namespaces!
      Env.buildNamespaces(): Built Namespaces!
      Kona Version: 1.0.kI
       
      ...
       
      sysUnmapMem 0xe0304570
      sysUnmapMem 0xe0304570
      Fault 0x9 at PC 0xf8045f20
      AFSR 1860000
      AFAR 71d10000
      MFSR 18000b0
      MFAR 24231ac
      SFSR 326
      SFAR 10
      Aborting...
       
      I have the tip log and binary of kona in ~madany/BUGS/*.960808

      -peter madany 8/8/96
      I am entering this bug as something I hope we can solve mostly by insipration
      and not persperation. FTP has encountered one of ugliest bugs I have
      ever seen. They have a multi-threaded application an agent manager. It
      uses at times over a hundred threads. It has run remarkably well. However,
      during stress testing of the application, the Virtual machine will crash.

      The circumstances are brutal. It runs anywhere from 1-10 hours and then
      the crash occurs....sometimes. I have brought their application in house
      and have reproduced the problem. Unfortunately, their application is a
      very complicated to run and it took about a day a half to get things set up.
      Attempts to narrow this down to a small test case have not been fruitful.

      Through some trial and error ftp and I were able to get the time
      between crashes up to 40 hours. This allowed them to ship so the
      urgency is not as great. However, the bug still exists. I have
      included several stack traces from the point of failure. In addition,
      I have included some notes Tim Lindholm sent me after having a hallway
      conversation after a look at the code. After talking with Tim, I have
      some code changes I am going to make and a possible work around for the
      customer. I creating this rambling note and placing it in bugtraq so
      we will have a record of the problem in case anybody stumble on it.

      Tim will have to confirm that these thoughts are still relevant after
      looking at the included stack traces....



      To: ###@###.###, thomas.kincaid@East
      Cc: ###@###.###, Jim.Graham@Eng
      Subject: Notes on FTP crash
      Date: Wed, 17 Jul 1996 19:32:49 -0700

      Hi Tom, here are a few half-baked notes mainly to remind me about a
      possible cause of the bug. If you report the bug in the customer's name
      and put me on as its owner, I'll add these details to the evaluation.

      In the Green threads implementation of sysThreadExit() there is the
      code:

          ...
          removeFromActiveQ(tid);

          /*
           * WARNING: DO NOT ATTEMPT TO ENTER ANY MONITORS AFTER THIS POINT
           * (unless you are reentering something you are *guaranteed* to
           * already own).
           */
          SCHED_LOCK();
          ...
          thread_handle = (TID) tid->cookie;
          ee = (ExecEnv *) THREAD(thread_handle)->eetop; <<< site of crash?
          ...

      The call to removeFromActiveQ removes an exiting thread from the active
      thread queue (whatever that is called). The active thread queue is
      known about specially by the garbage collector, and ensures that a
      thread object will not go away whether or not a user drops the last
      reference to it, at least until the thread is stopped or exits on its
      own.
          
      I think there is a race condition at least between the removeFromActiveQ
      call and the SCHED_LOCK. If this thread's exit is interrupted at that
      point and a garbage collection happens, the exiting thread may be
      collected: the user may have dropped all references, it's not on the
      active thread queue, and the thread_handle variable on the C stack
      won't have been set yet. I think that the same race exists on the
      PC and on native Solaris threads, and is somewhat more likely to be hit
      there given their's time-slicing.

      The customer could work around the bug by keeping references to threads
      in Java until Thread.isAlive fails (meaning sysThreadExit has got past
      the race). To fix the bug we might be able to lock the thread queue
      around removal and setting of thread_handle, or reorder setting of
      thread_handle and removal from the queue.

      -- Tim




      Stack traces
      ------------------------------------

      With -debug set:

      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) where
        [1] kill(0x0, 0x6, 0x0, 0xef691448, 0x18, 0xef638ed4), at 0xef6740d8
        [2] abort(0xf5d22f34, 0xf0279b08, 0x400100, 0x0, 0x0, 0x0), at 0xef63a530
        [3] sysAbort(), line 164 in "system_md.c"
      =>[4] scanThread(t = 0xedba0de0, arg = 0xee308888), line 1048 in "gc.c"
        [5] sysThreadEnumerateOver(func = &`java_g`gc.c`scanThread(struct sys_thread *t, void *arg) at 0x3d760, arg = 0xee308888), line 606 in "threads_md.c"
        [6] scanThreads(), line 1177 in "gc.c"
        [7] gc0_locked(async_call = 0, free_space_goal = 1256U), line 1767 in "gc.c"
        [8] gc0(async_call = 0, free_space_goal = 1256U), line 1852 in "gc.c"
        [9] manageAllocFailure(n = 1256, overflow_type = 1, overflow_act = 1), line 840 in "gc.c"
        [10] realObjAlloc(mptr = 0x4de5, n0 = 1246), line 296 in "gc.c"
        [11] ArrayAlloc(t = 5, l = 623), line 429 in "gc.c"
        [12] ExecuteJava(initial_pc = 0xede70086 "\\x30000057", ee = 0xede70ccc), line 2579 in "interpreter.c"
        [13] do_execute_java_method_vararg(ee = 0xede70ccc, obj = 0xee3120e8, method_name = 0xba7f3 "<init>", method_signature = 0xbaa63 "([C)V", mb = 0xbad88, isStaticCall = FALSE, args = 0xede702b0, otherBits = (nil), shortFloats = FALSE), line 529 in "interpreter.c"
        [14] execute_java_constructor(ee = 0xede70ccc, classname = 0xba8e5 "java/lang/String", cb = 0xb42b0, signature = 0xa8194 "([C)", ...), line 345 in "interpreter.c"
        [15] makeJavaString(str = 0x2e2eb8 " Receiving agent...\\nLauncher is busy\\nmessage:CyberAgent Manager,log saved,###@###.###,Jun 22 18:20:27 EDT 1996, \\n Agent [SystemAgent] sent\\n Agent [SystemAgent] received and accepted\\nSTART SAVING AT 1\\nNumb of outgoing agents = 1\\nsending: sampleAgent null to ###@###.###,\\n Sending agent [sampleAgent]\\nNumber of Received entries = 1\\nnumber of AgentCommInterface = 1\\nAgent Thread: SystemAgent\\nAgent Type: report\\nNumb of running agents = 1\\nNumber of log entries = 1\\nDelivering report" ..., len = 623), line 40 in "string.c"
        [16] sun_awt_motif_MTextAreaPeer_getText(this = 0xee300328), line 347 in "awt_TextArea.c"
        [17] Java_sun_awt_motif_MTextAreaPeer_getText_stub(_P_ = 0x1f7730, _EE_ = 0xede70ccc), line 882 in "libstubs.c"
        [18] invokeNativeMethod(o = 0xee300328, mb = 0x1eb3a8, args_size = 1, ee = 0xede70ccc), line 194 in "classruntime.c"
        [19] ExecuteJava(initial_pc = 0xede70ba6 "\\x3000005a", ee = 0xede70ccc), line 2434 in "interpreter.c"
        [20] do_execute_java_method_vararg(ee = 0xede70ccc, obj = 0xee308888, method_name = 0xa827c "run", method_signature = 0xa8280 "()V", mb = (nil), isStaticCall = FALSE, args = 0xede70cac, otherBits = (nil), shortFloats = FALSE), line 529 in "interpreter.c"
        [21] execute_java_dynamic_method(ee = 0xede70ccc, obj = 0xee308888, method_name = 0xa827c "run", signature = 0xa8280 "()V", ...), line 224 in "interpreter.c"
        [22] ThreadRT0(p = 0xee308888), line 61 in "thread.c"
        [23] start_func(dummy = 1, arg = -298809208, func = &`java_g`thread.c`ThreadRT0(Hjava_lang_Thread *p) at 0x65500), line 195 in "threads_md.c"
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) list
       1058 }
       1059 #endif /* TRACEGC */
       1060 /* Mark thread object */
       1061 MarkPtr(p, HardMark);
       1062
       1063 base = threadStackBase((TID)p);
       1064 ssc = threadStackPointer((TID)p);
       1065
       1066 if (p == me) {
       1067 ssc = (unsigned char **) &me;
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) where
        [1] kill(0x0, 0x6, 0x0, 0xef691448, 0x18, 0xef638ed4), at 0xef6740d8
        [2] abort(0xf5d22f34, 0xf0279b08, 0x400100, 0x0, 0x0, 0x0), at 0xef63a530
        [3] sysAbort(), line 164 in "system_md.c"
      =>[4] scanThread(t = 0xedba0de0, arg = 0xee308888), line 1048 in "gc.c"
        [5] sysThreadEnumerateOver(func = &`java_g`gc.c`scanThread(struct sys_thread *t, void *arg) at 0x3d760, arg = 0xee308888), line 606 in "threads_md.c"
        [6] scanThreads(), line 1177 in "gc.c"
        [7] gc0_locked(async_call = 0, free_space_goal = 1256U), line 1767 in "gc.c"
        [8] gc0(async_call = 0, free_space_goal = 1256U), line 1852 in "gc.c"
        [9] manageAllocFailure(n = 1256, overflow_type = 1, overflow_act = 1), line 840 in "gc.c"
        [10] realObjAlloc(mptr = 0x4de5, n0 = 1246), line 296 in "gc.c"
        [11] ArrayAlloc(t = 5, l = 623), line 429 in "gc.c"
        [12] ExecuteJava(initial_pc = 0xede70086 "\\x30000057", ee = 0xede70ccc), line 2579 in "interpreter.c"
        [13] do_execute_java_method_vararg(ee = 0xede70ccc, obj = 0xee3120e8, method_name = 0xba7f3 "<init>", method_signature = 0xbaa63 "([C)V", mb = 0xbad88, isStaticCall = FALSE, args = 0xede702b0, otherBits = (nil), shortFloats = FALSE), line 529 in "interpreter.c"
        [14] execute_java_constructor(ee = 0xede70ccc, classname = 0xba8e5 "java/lang/String", cb = 0xb42b0, signature = 0xa8194 "([C)", ...), line 345 in "interpreter.c"
        [15] makeJavaString(str = 0x2e2eb8 " Receiving agent...\\nLauncher is busy\\nmessage:CyberAgent Manager,log saved,###@###.###,Jun 22 18:20:27 EDT 1996, \\n Agent [SystemAgent] sent\\n Agent [SystemAgent] received and accepted\\nSTART SAVING AT 1\\nNumb of outgoing agents = 1\\nsending: sampleAgent null to ###@###.###,\\n Sending agent [sampleAgent]\\nNumber of Received entries = 1\\nnumber of AgentCommInterface = 1\\nAgent Thread: SystemAgent\\nAgent Type: report\\nNumb of running agents = 1\\nNumber of log entries = 1\\nDelivering report" ..., len = 623), line 40 in "string.c"
        [16] sun_awt_motif_MTextAreaPeer_getText(this = 0xee300328), line 347 in "awt_TextArea.c"
        [17] Java_sun_awt_motif_MTextAreaPeer_getText_stub(_P_ = 0x1f7730, _EE_ = 0xede70ccc), line 882 in "libstubs.c"
        [18] invokeNativeMethod(o = 0xee300328, mb = 0x1eb3a8, args_size = 1, ee = 0xede70ccc), line 194 in "classruntime.c"
        [19] ExecuteJava(initial_pc = 0xede70ba6 "\\x3000005a", ee = 0xede70ccc), line 2434 in "interpreter.c"
        [20] do_execute_java_method_vararg(ee = 0xede70ccc, obj = 0xee308888, method_name = 0xa827c "run", method_signature = 0xa8280 "()V", mb = (nil), isStaticCall = FALSE, args = 0xede70cac, otherBits = (nil), shortFloats = FALSE), line 529 in "interpreter.c"
        [21] execute_java_dynamic_method(ee = 0xede70ccc, obj = 0xee308888, method_name = 0xa827c "run", signature = 0xa8280 "()V", ...), line 224 in "interpreter.c"
        [22] ThreadRT0(p = 0xee308888), line 61 in "thread.c"
        [23] start_func(dummy = 1, arg = -298809208, func = &`java_g`thread.c`ThreadRT0(Hjava_lang_Thread *p) at 0x65500), line 195 in "threads_md.c"
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) f 4
      f: not found
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) frame 4
      Current function is scanThread
       1048 sysAbort();
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) list 1040,1050
       1040 JHandle *me = (JHandle *) arg;
       1041 JHandle *p = (JHandle *) sysThreadGetBackPtr(t);
       1042 unsigned char **ssc, **limit;
       1043 stackp_t base;
       1044 SetLimits();
       1045
       1046 #ifdef DEBUG
       1047 if (!ValidHandle(p)) {
       1048 sysAbort();
       1049 }
       1050 #endif /* DEBUG */
      (/usr/dist/local/pkgs/sunpro/5.x-sparc/bin/dbx) c/bin/dbx) where
       > =>[1] sysThreadExit(), line 322 in "threads_md.c"
       > [2] ThreadRT0(p = 0xee317ec0), line 91 in "thread.c"
       > [3] start_func(dummy = 1, arg = -298746176, func = &`java_g`thread.c`ThreadRT0(Hjava_lang_Thread *p) at 0x65500), line 195 in "threads_md.c"
       >

      The description field as copied from bug report 1263002 follows:

      Peter was logged in to my 8MB MrCoffee and trying to get JavaOke
      up and running when the following crash happened. The JavaOke
      window had appeared, and we tried to select some music, but nothing
      was responding.

      The TCP RST output indicates that the JavaOke server was not running
      and that our attempts at connecting to the JavaOke server were being
      rejected.

      Here's the dump:
      sysUnmapMem 0xe02865d0
      sysUnmapMem 0xe0209ff8
      sysUnmapMem 0xe02135b0
      TCP: SYN_SENT got RST -- connection refused
      Fault 0x9 at PC 0xf8048264
      AFSR 1860000
      AFAR 71d00004
      MFSR 18000b0
      MFAR 4201bc
      SFSR 326
      SFAR 10
      Aborting...
       
      Full thread dump:
      "../../../../src/kona/java/green_threads/src/schedule.c", line 68: assertion _scheduling_lock == 1 failure
       
      *** panic: "../../../../src/kona/java/green_threads/src/schedule.c", line 68: assertion failure
       
      Double abort!
      Kona Postmortem: Initialized ethernet 8:0:20:7c:ba:df
      JDM1Platform
      sritchie:08/05/96-16:59
      SCCS/s.GNUmakefile: D 1.193 96/07/30 11:30:29 rberlin 297 296 00003/00003/00815
      SCCS/s.Classfiles.gmk: D 1.144 96/07/31 15:12:45 sritchie 250 249 00000/00000/00839
      Type 'go' to resume
      Type help for more information
      ok

      PC: ffd4077c
      Last leaf: jmpl f82541e0 from f801fcec
           0 w %o0-%o5: ( f82541e0 ffeff000 0 71e1000c 80000000 0 )
       
      call f8022ccc from f8023098
           1 w %o0-%o5: ( f802307c 10 240 ffd37a9c ffffffff 3d486 )
       
      call f8022d90 from f8023284
           2 w %o0-%o5: ( 2a f8626380 240 a2 20 7c )
       
      call f80230d8 from f801d968
           3 w %o0-%o5: ( 0 2 f8602000 f8601800 73 0 )
       
      call f801d8e4 from f8025b18
           4 w %o0-%o5: ( 62 e01fc228 62 7efefeff 81010100 ff00 )
       
      call f8025b14 from f801c09c
           5 w %o0-%o5: ( 0 e01fc698 f8252e5a e01fc7f8 81010100 ff00 )
       
      call f801c040 from f801d9d4
           6 w %o0-%o5: ( f8252e38 e01fc698 44 0 0 e01fcbb8 )
       
      call f801d978 from f8047a88
           7 w %o0-%o5: ( f8350448 f8350410 44 b 0 0 )
       
      call f8047a38 from f8046e58
           8 w %o0-%o5: ( 0 f8046e48 0 f81f6200 e000f840 e0010f48 )
       
      call f8046dac from f80470ec
           9 w %o0-%o5: ( e03a4af0 e03a4afc d f801dd7c 800 11 )
       
      call f80470c0 from f804888c
           a w %o0-%o5: ( e03a4af0 e01fc8c8 13 7efefeff 81010100 ff00 )
       
      call f8048878 from f800d210
           b w %o0-%o5: ( 0 0 c 7efefeff 81010100 ff0000 )
       
      call f800d1f0 from f801d944
           c w %o0-%o5: ( 2f 1 f8602000 f8601800 81010100 ff0000 )
       
      call f801d8e4 from f8027c9c
           d w %o0-%o5: ( 1 f8602000 f8048264 0 0 0 )
       
      call f8027b08 from f8021014
           e w %o0-%o5: ( 9 f8048264 e020ff58 f863c400 0 e01fcbb8 )
       
      call f8047d40 from f8048258
           f w %o0-%o5: ( 0 f80a7f38 f81f6250 f81f6200 0 0 )
       
      call f8048224 from f801ad84
          10 w %o0-%o5: ( f81f6000 e000f840 f81f6250 f81f6200 e000f840 e0010f48 )
       
      jmpl f801ac10 from f80480d4
          11 w %o0-%o5: ( f81f6000 f8602400 0 c8 e02ddecc f816fda0 )
       
      ???? from f8048090
          12 w %o0-%o5: ( 1 e000f840 f801ac10 0 0 0 )
       
      XXXXXXX from 11
          13 w %o0-%o5: ( e0042460 0 e0032e70 11 7 264 )
       
      XXXXXXX from a2
          14 w %o0-%o5: ( 0 f8086560 0 f8086560 e00a75d8 f80717a0 )
       

      The full crash log is in ~sritchie/crashes/crash6. The kona binary is
      ~sritchie/crashes/kona.crash6.

      sritchie -- 9 Aug 96

      The description field as copied from bug report 1263236 follows:

      Reproduability: Hard to reproduce as the applet http://physics.www.media.mit.edu/~metois/MyJava/WebMusic/choir.html
      ran for 64 hours and then crashed


      Kona Version: 1.0.kI
      Kona Builder and Date: SMCC:08/09/96-04:05
      Kona GNUMakefile SCCS: 1.196
      Kona Classfiles.gmk SCCS: 1.147
      Kona Platform is : JavaDesktopMachine

      Kona image is l

            tlindholsunw Timothy Lindholm (Inactive)
            tkincaidsunw Tom Kincaid (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: