Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4049821

Synchronization problem in code which converts bytecode to quick variant

XMLWordPrintable

    • 1.2alpha2
    • sparc
    • solaris_2.5.1
    • Not verified

      Enclosed find a description of a bug that we have found in the code that
      changes a bytecode to its quick variant. This is a subtle bug that is
      hard to find and may cause unexpected behavior.

      Through reading the implementation of the JVM, I and Alain Azagury, a
      colleague here at Haifa Research Lab, believe that we have
      found a bug in the way that the JVM changes the getfield and putfield
      bytecodes to their quick variants. (We also suspect a similar bug in
      the conversion of invocation bytecodes to their quick variants, but we
      have not yet had time to check it.) The bug is a subtle bug in
      synchronization. In particular, the code synchronizes all threads
      that try to update the bytecode to the quick variant, but it does not
      synchronize the threads that execute (e.g., read) the bytecode.

      The bug is due to hidden assumptions made by the programmers of the
      JVM that the order in which a thread performs assignment statements
      and the order in which those assignments are viewed as occurring by
      another thread are the same. Given today's optimizing compilers and
      weak consistency models used in multi-processor architectures, this
      assumption may not be true.

      The nature of this bug makes it very hard to find, hard to repeat, and
      potentially hard to know that it occurred; yet, it could allow a
      program to read or write the wrong place in memory.

      Detailed Description
      --------------------
      (For the sake of simplicity we will describe the bug for getfield and
      getfield-quick; the same bug occurs for putfield and putfield-quick.)

      The original getfield bytecode consists of an opcode byte for getfield
      followed by two index bytes, which together serve as an index into the
      constant table. The getfield_quick bytecode consists of an opcode
      byte for getfield_quick followed by a single offset byte; the third
      byte (e.g., the former second index byte) is ignored when
      getfield_quick is executed.

      To convert a getfield bytecode to its quick variant, the JVM uses the
      index to access the constant table and resolve the constant it finds
      there to an offset. This is the offset that is put back into the code
      when the getfield opcode is replaced by the getfield_quick.

      When the JVM encounters a getfield bytecode in the Java instruction
      stream, it always converts it to its quick variant and then it
      executes the quick variant. Subsequent executions of the instruction
      will also execute the quick variant.

      Execution flow for the getfield bytecode including conversion process:
      ---------------------------------------------------------------------

      1. The JVM reads the getfield opcode and dispatches to the
      appropriate code.

      2. The JVM obtains the CODE_LOCK lock.

      3. The JVM rereads the opcode from the instruction stream and also
      reads the index bytes.

      4. The JVM releases CODE_LOCK.

      5. The JVM checks that the reread opcode is the same as the original
      opcode. If not, it goes back and re-executes the instruction from the
      beginning.

      6. The JVM uses the index bytes into the constant pool in order to
      find the class and the offset (In Java terminology this is called
      constant resolution).

      7. The JVM re-obtains the CODE_LOCK lock.

      8. It rechecks the opcode in the instruction stream again. If the
      opcode has changed, it releases the lock and goes back and re-executes
      the instruction from the beginning.

      9. The JVM replaces the first index byte in the instruction stream by
      the offset.

      10. The JVM replaces the getfield opcode by the getfield_quick opcode.

      11. The JVM releases the lock.

      12. The JVM executes the quick opcode.

      Execution flow for the getfield_quick opcode:
      ---------------------------------------------

      1. The JVM reads the getfield_quick opcode and dispatches to the
      appropriate code.

      2. The JVM reads the offset parameter from the instruction stream.

      3. The JVM uses the offset to read the appropriate field from the
      object.

      Notice that execution of the getfield_quick opcode does not obtain any
      lock.


      Possible interleavings of threads:
      ----------------------------------

      If one thread is in the process of converting a getfield bytecode to a
      getfield_quick and another thread executes the same bytecode
      instruction simultaneously, there can be a race condition that is not
      totally closed by the CODE_LOCK mechanism. There are three scenarios:

      1. The second thread sees the old getfield. In this case, it also
      enters the flow for getfield and attempts to obtain CODE_LOCK. When
      the CODE_LOCK is granted, it will discover that the bytecode has
      changed; thus, it will release the lock and re-execute the instruction
      as a getfield_quick. Thus, there is no problem.

      2. The second thread sees both the new getfield_quick and the new
      offset byte. In this case, there is no problem.

      3. The second thread sees the new getfield_quick and the old index
      byte. This can happen if a compiler reorders steps 9 and 10 of the
      getfield execution flow; it can also happen in a multi-processor,
      which employs weakly consistent storage access ordering, such as the
      PowerPC and the PentiumPro.


      A compiler can reorder code
      ---------------------------
      We checked with the compiler experts in Haifa and verified that modern
      optimizing compilers can change the order of store instructions. They
      can do this when they do not detect any dependency between the two
      instructions. The code for the getfield execution flow is written in
      a way such that the compiler cannot detect a dependency between step
      9, which stores the offset into the bytecode stream, and step 10, which
      stores the getfield_quick.

      In this case there is no good reason for a compiler to change the
      order of the store instructions. However, there is also no indication
      from the programmer to indicate that the order should not be changed.

      Appropriate use of the C "volatile" modifier in the variable
      declarations could force the compiler not to change the ordering.

      A multi-processor can execute load and stores out of order
      ----------------------------------------------------------
      We checked with the hardware and VLSI experts in Haifa and verified
      that the PowerPC uses weakly consistent storage access ordering. Here
      is an example to explain this concept:

         processor 1 processor 2
         ----------- -----------
         std r1,A ld r3,B
         std r2,B ld r4,A

      Assume that the old value of A is a0 and the new value is a1;
      similarly, for B, b0 and b1. According to weak consistency, processor
      2 might see any of the following <r3, r4> pairs: <b0, a0>, <b0, a1>,
      <b1, a0>, <b1, a1>.

      If we compare the above scenario to the getfield execution, the two
      stores of process 1 correspond to steps 9 and 10. The two loads of
      process 2 correspond to steps 1 and 2 of the getfield_quick execution.
      Thus, a thread executing a getfield_quick opcode can see the new
      getfield_quick opcode and the old index belonging to the original
      getfield instruction. This can occur even if the compiler does not
      reorder the store instructions.

      This race condition can be closed by inserting PowerPC "execution
      synchronizing" instructions, e.g., sync, in appropriate places in the
      code.


      The bug occurs on other architectures including Sun's Sparc architecture.

      1) The bug exists on two levels: (1) the compiler level for both
      uniprocessors and multiprocessors and (2) the architecture level for
      multiprocessors.

      Compiler level:

      If an optimizing compiler can tell that two store statements always
      write to two distinct locations of memory, then the compiler can reorder
      those statements. This is true for any computing platform, i.e., any
      combination of hardware and operating system. So the bug exists at the
      compiler level for all platforms on which the JVM runs.

      Architecture level:

      An architecture for a multiprocessor specifies a memory coherence model.
      For performance reasons, most modern multiprocessor architectures
      specify weak coherence models. In a weak model, if one processor makes
      consecutive stores to two locations, other processors are not guaranteed
      to see those stores in the same order they were made by the first
      processor.

      The Sparc, PowerPC, and Alpha all specify weak memory coherence models. So
      the synchronization bug could occur on all three of these architectures and
      probably on others.

            never Tom Rodriguez
            rschiavisunw Richard Schiavi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: