JDK-4533667

VTest hang where the VM thread is stuck on the malloc lock


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 1.4.1
    • 1.4.0
    • hotspot
    • hopper
    • generic
    • generic


      VTest hung after 25 hours on machine jtg-e450.sfbay with Merlin build 88.

      The VM thread is stuck on the malloc lock, and several threads have
      been interrupted by SIGUSR1 while inside malloc.

      current thread: t@4
      =>[1] __lwp_sema_wait(0xfc781e30, 0x0, 0x0, 0x0, 0x0, 0x2), at 0xff319c64
        [2] _park(0xfc781e30, 0xff38e000, 0x0, 0xfc781d78, 0xfe422324, 0x0), at 0xff3697f4
        [3] _swtch(0xfc781d78, 0xfc781d78, 0xff38e000, 0x5, 0x0, 0x0), at 0xff369204
        [4] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ad80
        [5] _cmutex_lock(0xff33a500, 0xff, 0xff38e000, 0xff2c0f2c, 0x204, 0x0), at 0xff36ab1c
        [6] malloc(0x14, 0x95740, 0x35c7d4, 0x0, 0xfe3e2000, 0xfc781470), at 0xff2c0f2c
        [7] os::malloc(0x14, 0x9ee50, 0x7, 0x0, 0xfc78189c, 0xfc781518), at 0xfe049740
        [8] CHeapObj::operator new(0x14, 0x27c160, 0x1fc890, 0x6, 0x290e, 0xfc781970), at 0xfe0496e0
        [9] CompiledCodeSafepointHandler::setup(0xfa4a44c0, 0x39c, 0xfa4a4388, 0x0, 0x0, 0x0), at 0xfe1e5aac
        [10] ThreadSafepointState::examine_state_of_thread(0x24c150, 0x0, 0xffffffff, 0xfe42bc28, 0xfe422324, 0xfe1a3528), at 0xfe1a3e60
        [11] SafepointSynchronize::begin(0x5000, 0x51c4, 0x27c160, 0x24c150, 0xfe0fc6f4, 0x0), at 0xfe1a3580
        [12] VMThread::loop(0xfe407fec, 0xfe3f83e0, 0xfe3f83dc, 0x0, 0x0, 0x0), at 0xfe0fc74c
        [13] VMThread::run(0x9edc8, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfe0fc25c
        [14] _start(0x9edc8, 0xff38f6a0, 0x1, 0x1, 0xff38e000, 0x0), at 0xfe0fc16c
      current thread: t@460
        [1] __lwp_sema_wait(0xd8881e30, 0x0, 0x0, 0x1, 0x0, 0x2), at 0xff319c64
        [2] _park(0xd8881e30, 0xff38e000, 0x0, 0xd8881d78, 0x0, 0x0), at 0xff3697f4
        [3] _swtch(0xd8881d78, 0xd8881d78, 0xff38e000, 0x5, 0xd8881d78, 0xd8881200), at 0xff369204
        [4] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ad80
        [5] _cmutex_lock(0xff33a500, 0xff, 0x0, 0xff2c1dc0, 0x0, 0x0), at 0xff36ab1c
        [6] free(0x1830d8, 0x4800, 0xfe3e2000, 0xfe3f8194, 0x2a380, 0x0), at 0xff2c1dc0
        [7] Thread_Interrupt_Callback::execute(0x1830d8, 0xd8880da8, 0xff38e000, 0x1, 0x0, 0x0), at 0xfe2e22b8
        [8] OSThread::do_interrupt_callbacks_at_interrupt(0x268ae8, 0xd8880da8, 0x0, 0x240100, 0xfe1e62a4, 0x0), at 0xfe1e6b78
        [9] JVM_handle_solaris_signal(0x10, 0xd8881200, 0xd8880f48, 0x1, 0x0, 0x0), at 0xfe1e6314
        [10] __sighndlr(0x10, 0xd8881200, 0xd8880f48, 0xfe1e6240, 0xd8881e10, 0xd8881e00), at 0xff37bd04
        [11] sigacthandler(0x10, 0xd8881d78, 0xd8880f48, 0xff38e000, 0xd8881d78, 0xd8881200), at 0xff378508
        ---- called from signal handler with signal 16 (SIGUSR1) ------
        [12] _mutex_adaptive_lock(0xff399944, 0x66666400, 0x4c00, 0x1, 0x4d58, 0xfffeffff), at 0xff36ada4
        [13] _cmutex_lock(0xff33a500, 0xff, 0xfe4, 0xff2c0f2c, 0x19a, 0xfa4dcf7c), at 0xff36ab1c
        [14] malloc(0x8c, 0x0, 0x19f, 0xfa4dcf7c, 0xfe3e2000, 0x0), at 0xff2c0f2c
        [15] os::malloc(0x8c, 0xfa4dc848, 0x31ea58, 0xd8881858, 0x2e2a38, 0xfe10ae00), at 0xfe049740
        [16] CHeapObj::operator new(0x8c, 0xfe3e2000, 0x0, 0x2686dc, 0xfe3e2000, 0xfa443370), at 0xfe0496e0
        [17] nmethod::add_handler_for_exception_and_pc(0xfa4dc848, 0xd8881540, 0xfa4dca30, 0xfa4dcb10, 0x268110, 0xee0a3e48), at 0xfe1ae30c
        [18] Runtime1::exception_handler_for_pc(0xfe4009fc, 0x2686a0, 0xfa4dca30, 0xe,
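
The shape of the t@460 trace (SIGUSR1 delivered while the thread is under malloc, then free called from the handler on the same mutex 0xff399944) can be illustrated with a small stand-in. This is only a sketch under stated assumptions, not HotSpot or libc code: `g_alloc_lock`, `on_sigusr1`, and `signal_inside_allocator_would_deadlock` are hypothetical names, the allocator's non-recursive lock is modelled by a `std::atomic_flag`, and a POSIX system that defines SIGUSR1 is assumed. The handler only probes the lock instead of blocking on it, so the program terminates where a real free would wait forever.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical stand-in for the allocator's non-recursive lock.
static std::atomic_flag g_alloc_lock = ATOMIC_FLAG_INIT;
// Set by the handler when it finds the lock already taken.
static volatile std::sig_atomic_t g_would_deadlock = 0;

// Stand-in for a signal handler that wants to free memory.
extern "C" void on_sigusr1(int) {
    if (g_alloc_lock.test_and_set(std::memory_order_acquire)) {
        // The lock was already held, and the only possible holder is the
        // thread this handler interrupted. A blocking acquire (what free
        // would do) could never succeed from here.
        g_would_deadlock = 1;
    } else {
        // The lock was free; release it again.
        g_alloc_lock.clear(std::memory_order_release);
    }
}

// Simulates a signal arriving while the thread is "inside malloc".
inline bool signal_inside_allocator_would_deadlock() {
    g_would_deadlock = 0;
    std::signal(SIGUSR1, on_sigusr1);

    // "Enter malloc": take the allocator lock.
    while (g_alloc_lock.test_and_set(std::memory_order_acquire)) {
    }

    // The handler runs on this thread before raise() returns.
    std::raise(SIGUSR1);

    // "Leave malloc": release the allocator lock.
    g_alloc_lock.clear(std::memory_order_release);
    return g_would_deadlock == 1;
}
```

Calling `signal_inside_allocator_would_deadlock()` reports whether the handler found the lock held by the thread it interrupted, which is the condition suspected in this report.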

      Date: Fri, 30 Nov 2001 14:40:03 -0500 (EST)
      From: Karen Kinnear - Sun PC Distributed Systems <###@###.###>
      Subject: Re: New Volano Hang
      To: ###@###.###, ###@###.###
      Cc: ###@###.###, ###@###.###, ###@###.###, ###@###.###


      Coleen -

      Dice pointed out a potential bug here when he reviewed my
      latest change (not caused by that latest change).

      The thread getting the SIGUSR1 is deadlocking itself I believe,
      i.e. I think it has the malloc lock and is now servicing an interrupt
      callback.

      The Thread_Interrupt_Callback::execute call actually
      tries to free the callback by calling

      "delete this"

      There is a comment saying there is not an issue with deadlock with
      a malloc lock, but I'm not sure why that would be true.

      There are a number of ways to fix this. When this no longer
      longjmps I will make this synchronous and have the caller do
      the freeing.

      Steve would know better - I think there used to be a list
      of cancelled requests - perhaps we could have a list of
      completed requests. Alternatively we could add a bit or
      something to mark the request as completed. Steve would know
      what code would delete it later (perhaps the next synchronous
      call that is deleting a request for this thread and already
      removing its own completed request).

      hope this helps,
      Karen


      ###@###.### 2001-11-30
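
One of the options floated in the mail above (mark the request as completed in the handler and let a later synchronous call delete it) might look roughly like the following. This is a minimal sketch and not the actual fix: `InterruptRequest`, `execute_in_handler`, `reap_completed`, and `demo_deferred_free` are hypothetical names, the request list is assumed to be touched only by its owning thread outside the handler, and `std::atomic<bool>` is assumed to be lock-free on the target so that storing to it from a signal handler is safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an interrupt callback request.
struct InterruptRequest {
    std::atomic<bool> completed{false};  // set by the handler, read by the owner
    InterruptRequest* next = nullptr;    // link in the owner-managed list
    int payload = 0;
};

// Runs in signal-handler context. It must not call malloc/free/new/delete,
// because the interrupted thread may already hold the allocator lock.
inline void execute_in_handler(InterruptRequest* req) {
    // ... the async-signal-safe part of the callback would go here ...
    req->completed.store(true, std::memory_order_release);  // mark, do not delete
}

// Runs in ordinary synchronous context, where taking the malloc lock is safe.
// Unlinks and frees every completed request; returns how many were freed.
inline std::size_t reap_completed(InterruptRequest*& head) {
    std::size_t freed = 0;
    InterruptRequest** link = &head;
    while (*link != nullptr) {
        InterruptRequest* cur = *link;
        if (cur->completed.load(std::memory_order_acquire)) {
            *link = cur->next;  // unlink before freeing
            delete cur;
            ++freed;
        } else {
            link = &cur->next;
        }
    }
    return freed;
}

// Small driver: three requests, two of them completed "in the handler".
inline std::size_t demo_deferred_free() {
    InterruptRequest* head = nullptr;
    for (int i = 0; i < 3; ++i) {
        InterruptRequest* r = new InterruptRequest;
        r->payload = i;
        r->next = head;
        head = r;  // list order becomes 2 -> 1 -> 0
    }
    execute_in_handler(head);              // payload 2
    execute_in_handler(head->next->next);  // payload 0

    std::size_t freed = reap_completed(head);

    // Only the request that was never completed (payload 1) remains.
    assert(head != nullptr && head->payload == 1 && head->next == nullptr);
    delete head;
    return freed;
}
```

The point of the split is that the handler does nothing beyond a flag store, while every call into the allocator happens on a path that cannot have interrupted malloc or free.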

            acorn Karen Kinnear (Inactive)
            jzhongsunw June Zhong (Inactive)