Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4831461

JNI GetEnv() race with ThreadLocalStorage::set_thread() guarantee

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • P4
    • 1.4.2_02
    • 1.3.1_13, 1.4.1, 1.4.2
    • hotspot
    • None
    • 02
    • sparc
    • solaris_8

    Backports

      Description

        ###@###.### 2003-03-12

        Stress testing of the AsyncGetCallTrace() interface has exposed an
        intermittent guarantee failure:

        # guarantee(get_thread() == thread, "must be the same thread, quickly")
        # Error ID : src/share/vm/runtime/threadLocalStorage.cp, 66

        The AsyncGetCallTrace() specific portion of the failure is tracked by
        the following bug id:

        4822846 4/4 AsyncGetCallTrace: guarantee(get_thread() == thread, "must be the
                    same thread...

        This same guarantee failure can be triggered by calling JNI GetEnv()
        and that failure mode is tracked by this bug.

        When the WatcherThread exits, the ThreadLocalStorage reference for
        exiting WatcherThread is set to NULL. However, a JNI GetEnv() call
        causes the cached thread info to be refreshed and we fail the guarantee.
        I don't have a java_g cmd stack trace for this crash.

        The failure reproduces with a less informative message with the java cmd:

        # Error ID: 5448524541442C4F43414C33544F524147450E4350500042

        The ErrorID maps to: threadLocalStorage.cpp, 66

        Here is a snippet of the stack trace for the java cmd crash:

          [6] report_fatal(0xfefdbc65, 0x42, 0xfefdbcb4, 0xfeff4000, 0xfa17fef0, 0x0), a
        t 0xfee34a7c
          [7] ThreadLocalStorage::set_thread(0x0, 0xffffffc4, 0xff013178, 0xffff8000, 0x
        7f, 0x0), at 0xfec522a4
          [8] _start(0xe3010, 0xfebf0c00, 0x0, 0x0, 0x0, 0x0), at 0xfecc5e4c

        The complete stack trace for the java crash is attached as threads.log.486.


        ###@###.### 2003-03-12 (update 1)

        I've attached a complete stack trace for a java crash with JDK1.4.1
        and the attached test program (PepTest run) as threads.log.18. Here
        is part of the stack trace for the main thread which was exiting:

          [6] report_fatal(0xfefffd13, 0x3a, 0xfefffd5c, 0x80, 0x2bbc0, 0x0), at 0xfee5a
        f64
          [7] ThreadLocalStorage::set_thread(0x0, 0x1, 0x285614, 0x5800, 0xff010000, 0x0
        ), at 0xfec52340
          [8] Thread::delete_thread_and_TLS_current(0x2c280, 0x0, 0x0, 0x0, 0xbb178, 0xf
        fbee954), at 0xfed8aa74
          [9] jni_DetachCurrentThread(0xff023598, 0x2ccbc, 0xbb178, 0x2ccac, 0x2c30c, 0x
        f2c450b8), at 0xfed89eb4


        ###@###.### 2003-04-04

        Some additional variations on the race have shown up during Java2Demo
        stress testing. complete stack traces are attached

        JNI GetEnv() calls src/os/solaris/vm/thread_solaris.inline.hpp:
        ThreadLocalStorage::thread(). If the thread is not in the cache, it
        calls ThreadLocalStorage::get_thread_via_cache_slowly() which calls
        ThreadLocalStorage::get_thread_slow(). The thread value we get back
        fails the guarantee() call.

        Here is a snippet of the stack trace:

          [7] report_fatal(0xfefde057, 0x2a, 0xfefde0a6, 0xfe990000, 0x1bffb8, 0xfef73d0
        8), at 0xfee35d4c
          [8] ThreadLocalStorage::get_thread_via_cache_slowly(0xfebf1800, 0x67, 0xfebf18
        00, 0x1, 0x0, 0x0), at 0xfef73d34
          [9] jni_GetEnv(0xff009694, 0xf13ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
        d7e4dc
          [10] profhandler(0x1d, 0xf13ffde0, 0xf13ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
          [11] __sighndlr(0x1d, 0xf13ffde0, 0xf13ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff37
        4c8c
          [12] call_user_handler(0xfebf1800, 0x31, 0xff387b20, 0xf13ffb28, 0xf13ffde0, 0
        x1d), at 0xff36fadc
          [13] sigacthandler(0xfebf1800, 0xf13ffde0, 0xf13ffb28, 0xff386000, 0xf13ffde0,
         0x1d), at 0xff36fca8
          ---- called from signal handler with signal -21030912 (SIG-21030912) ------
          [14] ThreadLocalStorage::set_thread(0x0, 0x1, 0x546ba8, 0x5359c0, 0x31fc68, 0x
        fecd6294), at 0xfec52418
          [15] JavaThread::run(0xad26e0, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
        at 0xfecd6358

        The complete stack trace is attached as threads.log.65. For the above
        guarantee failure, the message usually fails to print. I have seen
        SIGBUS, SIGILL and SIGSEGV during the print attempt.

        In this example, JNI GetEnv() gets SIGILL. Here is a snippet of the
        stack trace:

          ---- called from signal handler with signal -21029888 (SIG-21029888) ------
          [6] 0x443430(0x443438, 0x1, 0xfebf1c00, 0x1, 0x0, 0x0), at 0x44342f
          [7] jni_GetEnv(0xff009694, 0xf11ff960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
        d7e4fc
          [8] profhandler(0x1d, 0xf11ffde0, 0xf11ffb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
          [9] __sighndlr(0x1d, 0xf11ffde0, 0xf11ffb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
        c8c
          [10] call_user_handler(0xfebf1c00, 0x1a, 0xff387b20, 0xf11ffb28, 0xf11ffde0, 0
        x1d), at 0xff36fadc
          [11] sigacthandler(0xfebf1c00, 0xf11ffde0, 0xf11ffb28, 0xff386000, 0xf11ffde0,
         0x1d), at 0xff36fca8
          ---- called from signal handler with signal -21029888 (SIG-21029888) ------
          [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x3f6730, 0xa6498, 0x31fc68, 0xf
        ecd6294), at 0xfec5242c
          [13] JavaThread::run(0x443438, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
        at 0xfecd6358

        The complete stack trace is attached as threads.log.66.

        In this example, JNI GetEnv() gets SIGSEGV. Here is a snippet of the
        stack trace:

          ---- called from signal handler with signal -21028864 (SIG-21028864) ------
          [6] 0x0(0x731618, 0x1, 0xfebf2000, 0x1, 0x0, 0x0), at 0xffffffffffffffff
          [7] jni_GetEnv(0xff009694, 0xf0d7f960, 0x10002, 0x0, 0x0, 0xfcfb1988), at 0xfe
        d7e4fc
          [8] profhandler(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0x0, 0x0, 0x0), at 0xfcfa0c3c
          [9] __sighndlr(0x1d, 0xf0d7fde0, 0xf0d7fb28, 0xfcfa0b30, 0x0, 0x0), at 0xff374
        c8c
          [10] call_user_handler(0xfebf2000, 0x25, 0xff387b20, 0xf0d7fb28, 0xf0d7fde0, 0
        x1d), at 0xff36fadc
          [11] sigacthandler(0xfebf2000, 0xf0d7fde0, 0xf0d7fb28, 0xff386000, 0xf0d7fde0,
         0x1d), at 0xff36fca8
          ---- called from signal handler with signal -21028864 (SIG-21028864) ------
          [12] ThreadLocalStorage::set_thread(0x0, 0x1, 0x5bb908, 0x2dfbc8, 0x31fc68, 0x
        fecd6294), at 0xfec52418
          [13] JavaThread::run(0x731618, 0xffffffc4, 0xff015218, 0xffff8000, 0x0, 0x0),
        at 0xfecd6358

        The complete stack trace is attached as threads.log.98.

        In this example, we receive the SIGPROF signal while we are executing
        the "delete thread" code. Here is a snippet of the stack trace:

          [5] sigacthandler(), at 0xdfb8e1e0
          ---- called from signal handler with signal 11 (SIGSEGV) ------
          [6] jni_GetEnv(0xdfa7f4dc, 0xd245fbb4, 0x10002), at 0xdf862752
          [7] profhandler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdd790b5c
          [8] __sighndlr(0x1d, 0xd245fc54, 0xd245fcd4, 0xdd790ab0), at 0xdfb92963
          [9] call_user_handler(0x1d, 0xd245fc54, 0xd245fcd4), at 0xdfb8e0b6
          [10] take_deferred_signal(0x1d), at 0xdfb8e181
          [11] exit_critical(), at 0xdfb89f14
          [12] preempt(), at 0xdfb8f391
          [13] _ti_mutex_unlock(0xdfb5a338), at 0xdfb900b6
          [14] free(0x84b87a0), at 0xdfaf6c80
          [15] os::free(0x84b87a0), at 0xdf7e33c8
          [16] CHeapObj::operator delete(0x84b87a0), at 0xdf80ea19
          [17] __SLIP.DELETER__D(0x84b87a0, 0x1), at 0xdf84e47d
          [18] Thread::delete_thread_and_TLS_current(0x84b87a0), at 0xdf84e42d
          [19] JavaThread::thread_main_inner(0x84b87a0), at 0xdf7f9d3a

        The complete stack trace is attached as thread.log.137.

        The crashes have shown up in VolanoTest running on 12*900MHZ machines.
        Vtest server crashed after about 8 hours. complete stack trace is
        attached as vtest.log.18602 .
        stack trace :
        >>>>> [13] t_delete(0x1, 0x30, 0x28, 0x270390, 0x352ff8, 0x0), at
        >>>>> 0xff2c388c
        >>>>> [14] realfree(0x352fd0, 0xff3426f8, 0xff3426f8, 0x3, 0x3, 0x0), at
        >>>>> 0xff2c3520
        >>>>> [15] _free_unlocked(0x3394e8, 0x0, 0x0, 0xff33c000, 0x0, 0x0), at
        >>>>> 0xff2c3d0c
        >>>>> [16] free(0x3394e8, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff2c3c64
        >>>>> [17] ThreadSafepointState::destroy(0x39db70, 0x45, 0xff17a000,
        >>>>> 0xfee75f58,
        >>>>> 0x321c38, 0x1), at 0xfee773c0
        >>>>> [18] __SLIP.DELETER__D(0x39db70, 0x1, 0x1106e0, 0x34d820, 0x30f928,
        >>>>> 0xfee6a5c8), at 0xfee77448
        >>>>> [19] JavaThread::run(0x39db70, 0x942c2, 0x40, 0x0, 0x40, 0x0), at
        >>>>> 0xfee6a684
        >>>>> [20] _start(0x39db70, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfee66b04


        ###@###.### 2003-05-09

        Attachments

          Issue Links

            Activity

              People

                dcubed Daniel Daugherty
                dcubed Daniel Daugherty
                Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved:
                  Imported:
                  Indexed: