Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6419370

new jmethodID code has tiny holes in synchronization

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: P4 P4
    • hs17
    • 6
    • hotspot
    • None
    • b83
    • 6
    • b02
    • generic
    • generic

        We (Robert, Dan and Serguei) after a email discussion agreed that there are
        some holes in syncronization of the following new jmethodID code:

        src/share/vm/oops/instanceKlass.cpp:

        913 // Lookup or create a jmethodID
        914 jmethodID instanceKlass::jmethod_id_for_impl(instanceKlassHandle ik_h, methodHandle method_h) {
        915 size_t idnum = (size_t)method_h->method_idnum();
        916 jmethodID* jmeths = ik_h->methods_jmethod_ids_acquire();
        917 size_t length = 0; // first assignment of length is debugging crumb
        918 jmethodID id = NULL;
        919 // array length stored in first element, other elements offset by one
        920 if (jmeths == NULL || // If there is no jmethodID array,
        921 (length = (size_t)jmeths[0]) <= idnum || // or if it is too short,
        922 (id = jmeths[idnum+1]) == NULL) { // or if this jmethodID isn't allocated
        923 MutexLocker ml(JNIIdentifier_lock); // then do double check locking
        924 // Retry lookup after we got the lock
        925 jmeths = ik_h->methods_jmethod_ids_acquire();
        926 if (jmeths == NULL || (length = (size_t)jmeths[0]) <= idnum) {
        927 size_t size = MAX2(idnum+1, (size_t)ik_h->idnum_allocated_count());
        928 jmethodID* new_jmeths = NEW_C_HEAP_ARRAY(jmethodID, size+1);
        929 memset(new_jmeths, 0, (size+1)*sizeof(jmethodID));
        930 if (jmeths != NULL) {
        931 // We have grown the array: copy the existing entries, and delete the old array
        932 for (size_t index = 0; index < length; index++) {
        933 new_jmeths[index+1] = jmeths[index+1];
        934 }
        935 FreeHeap(jmeths);
        936 }
        937 new_jmeths[0] =(jmethodID)size;
        938 ik_h->release_set_methods_jmethod_ids(jmeths = new_jmeths);
        939 } else {
        940 id = jmeths[idnum+1];
        941 }
        942 if (id == NULL) {
        943 if (method_h->is_old() && !method_h->is_obsolete()) {
        944 // The method passed in is old (but not obsolete), we need to use the current version
        945 methodOop current_method = ik_h->method_with_idnum(idnum);
        946 assert(current_method != NULL, "old and but not obsolete, so should exist");
        947 methodHandle current_method_h(current_method == NULL? method_h() : current_method);
        948 id = JNIHandles::make_jmethod_id(current_method_h);
        949 } else {
        950 // It is the current version of the method or an obsolete method,
        951 // use the version passed in
        952 id = JNIHandles::make_jmethod_id(method_h);
        953 }
        954 jmeths[idnum+1] = id;
        955 }
        956 }
        957 return id;
        958 }

        961 // Lookup a jmethodID, NULL if not found. Do no blocking, no allocations, no handles
        962 jmethodID instanceKlass::jmethod_id_or_null(methodOop method) {
        963 size_t idnum = (size_t)method->method_idnum();
        964 jmethodID* jmeths = methods_jmethod_ids_acquire();
        965 size_t length; // length assigned as debugging crumb
        966 jmethodID id = NULL;
        967 if (jmeths != NULL && // If there is a jmethodID array,
        968 (length = (size_t)jmeths[0]) > idnum) { // and if it is long enough,
        969 id = jmeths[idnum+1]; // Look up the id (may be NULL)
        970 }
        971 return id;
        972 }

        975 // Cache an itable index
        976 void instanceKlass::set_cached_itable_index(size_t idnum, int index) {
        977 int* indices = methods_cached_itable_indices_acquire();
        978 if (indices == NULL || // If there is no index array,
        979 ((size_t)indices[0]) <= idnum) { // or if it is too short
        980 // Lock before we allocate the array so we don't leak
        981 MutexLocker ml(JNIIdentifier_lock);
        982 // Retry lookup after we got the lock
        983 indices = methods_cached_itable_indices_acquire();
        984 size_t length = 0;
        985 // array length stored in first element, other elements offset by one
        986 if (indices == NULL || (length = (size_t)indices[0]) <= idnum) {
        987 size_t size = MAX2(idnum+1, (size_t)idnum_allocated_count());
        988 int* new_indices = NEW_C_HEAP_ARRAY(int, size+1);
        989 // Copy the existing entries, if any
        990 size_t i;
        991 for (i = 0; i < length; i++) {
        992 new_indices[i+1] = indices[i+1];
        993 }
        994 // Set all the rest to -1
        995 for (i = length; i < size; i++) {
        996 new_indices[i+1] = -1;
        997 }
        998 if (indices != NULL) {
        999 FreeHeap(indices); // delete any old indices
        1000 }
        1001 release_set_methods_cached_itable_indices(indices = new_indices);
        1002 }
        1003 }
        1004 // This is a cache, if there is a race to set it, it doesn't matter
        1005 indices[idnum+1] = index;
        1006 }

        1009 // Retrieve a cached itable index
        1010 int instanceKlass::cached_itable_index(size_t idnum) {
        1011 int* indices = methods_cached_itable_indices_acquire();
        1012 if (indices != NULL && ((size_t)indices[0]) > idnum) {
        1013 // indices exist and are long enough, retrieve possible cached
        1014 return indices[idnum+1];
        1015 }
        1016 return -1;
        1017 }

        Please, see some fragments of our email discussion below.

        -------- Original Message --------
        Subject: Re: [Fwd: Re: OK FOLKS. I really need reviewers. TODAY!
        PLEASE! [Method add/delete on redefine]]
        Date: Wed, 26 Apr 2006 17:00:01 -0700
        Wrom: MNNSKVFVWRKJVZCMHVIBGDADRZFSQHYUCDDJBLVLMH
        To: Serguei Spitsyn <###@###.###>, Serviceability Group
        <###@###.###>
        References: <###@###.###>


        Serguei Spitsyn wrote:

        > Hi Robert,
        >
        > You have not replied on this email.
        > I just want to be sure you remember about this.

        Rats! I had let this fall through the cracks, thanks for re-sending.

        > Integrating your fixes is urgent and more important,
        > so I'm Ok if you reply or resolve the sync issues later.

        Good. That is the approach we have decided to take. See below.

        > Let me know if you disagree with my observation.

        I don't.

        >
        > Thanks,
        > Serguei
        >
        > -------- Original Message --------
        > Subject: Re: OK FOLKS. I really need reviewers. TODAY! PLEASE! [Method add/delete on redefine]
        > Date: Tue, 25 Apr 2006 23:35:33 -0700
        > Wrom: AALPTCXLYRWTQTIPWIGYOKSTTZRCLBDXRQBGJSNBOHMKHJY
        > To: Robert Field <###@###.###>
        >
        > Robert Field wrote:
        >> Serguei Spitsyn wrote:
        >>
        >>> Hi Robert,
        > ...
        >>> 5. Setting the index must be protected by locking the JNIIdentifier_lock.
        >>
        >> It doesn't need to be protected, I have added this comment:
        >> // This is a cache, if there is a race to set it, it doesn't matter
        >> The whole point of having it is the speed of JNI static invokes, so
        >> we don't want to lock unless it is critical to do so.
        >
        > Sure, performance is important.
        > But let's consder this:
        > Thread T1 executes the line (deallocates current indeces):
        > 987 FreeHeap(indices); // delete any old indices
        > After that thread T2 allocates new heap array with NEW_C_HEAP_ARRAY(),
        > gets the same range of addresses and fill in new values.
        > Thread T3 concurrently with T1 and T2 executes the line:
        > 992 indices[idnum+1] = index;
        >
        > There is still a small probability that T3 sets the new index into
        > newly allocated heap array. There can be a crash or even something
        > worse.
        > Such errors are very hard to debug.
        > Please, let me know if you think such a scenario is not possible.
        >
        >>> src/share/vm/oops/instanceKlass.cpp:
        >>> 992 indices[idnum+1] = index;
        > ...
        >
        >>> 7. Depending on the usage we may want to protect this function with the
        >>> JNIIdentifier_lock as well. (Need to check if it is really needed).
        >>>
        >> See above
        >
        >
        > I'm afraid a similar scenario is possible as considered above.
        > The T1 and T2 threads do the same as above:
        > - T1 deallocates the 'indeces' array
        > - T2 allocates new heap array, gets the same range of addresses
        > and set new values into it
        > - T3 (cached_itable_index) returns wrong value from the
        > indices[idnum+1].

        There is indeed a possible crash here and same in the jmethodID code.
        Fortunately, it would be very very rare: it would require (I'll use
        jmethodID)--

            * A jmethodID is created in class C [array is created]
            * Then C is redefined with an added or obsoleted method
              [idnum_allocated_count increased]
            * T1 wants a jmethodID, and between the time it retrieves the
              jmethodID array and it dereferences it (a few simple instructions) ...
            * T2 comes along and wants a jmethodID for one of the new or
              obsolete methods (idnum larger than before) [jmethodID grows]
            * Somebody scribbles on the released array
            * Then T1 resumes and uses the scribbled on array - boom

        If you can write a test that makes that happen, I'll buy you a ticket to Hawaii!

        The right way to fix this would be to have redefine always grow the
        array (if it exists) while at a safepoint. Doing this would require a
        full run of tests.

        We decided that compared to the known and reported jmethodID crash: get
        a jmethodID, unload its class, use the jmethodID -- boom
        that this fixes (as a side-effect) that it was OK to submit now. And
        fix this later.

        ------------------------------------------------------------------------------------
        This is an observation from Dan:

        Greetings,

        Okay, we have really "smart" double check locking going on and
        that has opened a potential race or has it? Before we go tweaking
        the code, I want to make sure we understand the problem from all
        the different angles.

        The core problem pointed out by Serguei's diligent code review is:

        - _methods_jmethod_ids refers to an array of jmethodIDs or NULL
        - _methods_cached_itable_indices refers to an array of ints or NULL
        - because of our double check locking algorithm, Thread1 can be
          accessing one of the above arrays while Thread2 has just freed
          it with FreeHeap

        Each of the two arrays goes through two types of transitions:

        - from NULL to non-NULL
        - from non-NULL to bigger non-NULL

        The transition from NULL to non-NULL is safe for multi-threaded use
        when the idnum for indexing the array is within the initial expected
        size of the array (initial method count):

        - the initial thread down the path sees the NULL, grabs the
          JNIIdentifier_lock and starts the array allocation procedure
        - if a competing thread sees the NULL value, it heads down the
          JNIIdentifier_lock path also and blocks
        - the newly allocated arrays are not advertised via the
          release_set_*() calls until after the array length element @0
          and the array values have been initialized.
        - if a competing thread sees a non-NULL value:
          - for _methods_jmethod_ids array
            - if the requested idnum is out of range (it won't be for this
              case) or if the jmethodID value for requested idnum is NULL,
              then the competing thread heads down the JNIIdentifier_lock path
            - otherwise, the fetched jmethodID value is returned
          - for_methods_cached_itable_indices
            - if the requested idnum is out of range (it won't be for this
              case), then the competing thread heads down the
              JNIIdentifier_lock path
            - otherwise, the specified index value is cached

        The above transition is safe because competing threads either see
        stable data that they can use or they head down the locking path
        when there is more work to do.

        The transition from NULL to non-NULL when one of the competing
        threads is looking for an idnum slot outside the initial expected
        size of the array (initial method count) is the same as a non-NULL
        to bigger non-NULL transition.

        Finally let's look at non-NULL to bigger non-NULL transitions.
        Thread1 in this case is looking for an idnum slot that is past the
        current size of the array so it heads down the JNIIdentifier_lock
        path. Thread2 in this case is looking for an idnum slot that is
        within the current size of the array. Here is a transaction diagram:

          Thread1 Thread2
          =========================================== ===================
          allocates new array sees non-NULL array
          copies old array contents to new array :
          frees old array :
          uses release_set_*() to advertise new array fetches element 0

        Just the fetch of the length element (array element 0) is a problem.
        After the FreeHeap() call, the memory could be:

        - still there with all the old element 0 value
        - reused for something else so a bogus element 0 value

        For _methods_jmethod_ids array, we are going to fetch a potentially
        bogus jmethodID value.
        For _methods_cached_itable_indices array, we are going to store
        in memory that isn't ours anymore.

        Looks bad and it is bad. There is only one possible saving grace.
        If the call that causes the array to grow comes from code that
        is called only at a safepoint, then I don't think there can be
        other competing threads.

        jmethodID_for_impl() is only called by methodOopDesc::jmethod_id().
        set_cached_itable_index() is only called by
        methodOopDesc::set_cached_itable_index(). I don't think we can
        guarantee that jmethod_id() and set_cached_itable_index() are only
        called from a safepoint. So the existing code indeed has this
        wonderful race. For those of you that were already convinced, sorry
        I took you on a long hike...

        We can't guarantee all the calls happen safely, but what if the first
        jmethod_id() call and set_cached_itable_index() call happens in
        RedefineClasses() by the doit() code path, then things are safe.

        On further thought, these "initializing" calls only need to be made if
        the instanceKlass already has a non-NULL _methods_jmethod_ids array
        and/or _methods_cached_itable_indices array. If they don't exist, then
        we are in the NULL to non-NULL transition case and things are safe.

        This is all kind of stream of consciousness so I have no idea whether
        there are hole here also or not.

        Comments, questions, corrections?

        Dan

              dcubed Daniel Daugherty
              sspitsyn Serguei Spitsyn
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

                Created:
                Updated:
                Resolved:
                Imported:
                Indexed: