JDK-8198543

C2: Wrong type of return value from Unsafe.getAndSetObject() call

      Information from JDK-8198531:

      Sometimes C2 eliminates the non-null branch in the following code, as if it had proved that ThreadCont.getAndSet() cannot return a non-null value:

          val cont = ThreadCont.getAndSet(threads[i], null)
          if (cont != null) { /* do something */ }

      Detailed problem description & reproducer (by Roman Elizarov):
        https://github.com/ktorio/ktor/blob/resumeAnyThread_HotSpotBug/BUG_README_FIRST.md



      FULL PRODUCT VERSION :
      java version "1.8.0_161"
      Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
      Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

      FULL OS VERSION :
      Darwin unit-940.labs.intellij.net 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov 9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Not relevant. The problem also reproduces on various Linux versions.

      A DESCRIPTION OF THE PROBLEM :
      We have code that uses AtomicReferenceFieldUpdater.getAndSet. The way it is used can be summarized like this (in Kotlin/JVM code):

      val cont = ThreadCont.getAndSet(threads[i], null)
      if (cont != null) { /* do something */ }

      Now, under some circumstances HotSpot produces the following assembly for the above 'getAndSet':

        0x00000001110d42b2: xchg QWORD PTR [rcx+0x1b8],r8 ;*invokevirtual getAndSetObject
                                                      ; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
                                                      ; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)

        0x00000001110d42b9: mov r8,QWORD PTR [rsp+0x28]

      As you can see, the result of getAndSet (the xchg instruction leaves the previous field value in r8) is immediately overwritten by the following mov into r8 instead of being checked for null.
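
For readers less familiar with the API, the intended semantics of the pattern above can be sketched in plain Java. This is an illustration only, not the reproducer: the Worker class, its cont field, and resumeIfParked are hypothetical names chosen for the sketch, and the actual ktor code is Kotlin. getAndSet atomically stores the new value and returns the previous one, so the non-null branch must run whenever a continuation was stored in the field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

final class GetAndSetSketch {
    // Hypothetical stand-in for the dispatcher's worker thread type.
    static final class Worker {
        volatile Runnable cont; // the updater requires a volatile field
    }

    // Plays the role of ThreadCont in the report (name is an assumption).
    static final AtomicReferenceFieldUpdater<Worker, Runnable> THREAD_CONT =
            AtomicReferenceFieldUpdater.newUpdater(Worker.class, Runnable.class, "cont");

    // Atomically clears the field and runs the previous value if there was one.
    static boolean resumeIfParked(List<Worker> threads, int i) {
        Runnable cont = THREAD_CONT.getAndSet(threads.get(i), null);
        if (cont != null) { // this is the branch reported as eliminated
            cont.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Worker> threads = new ArrayList<>();
        Worker w = new Worker();
        w.cont = () -> System.out.println("resumed");
        threads.add(w);
        System.out.println(resumeIfParked(threads, 0)); // continuation present: runs it, prints true
        System.out.println(resumeIfParked(threads, 0)); // field already cleared: prints false
    }
}
```

A correctly compiled version of this method must test the value returned by getAndSet; the assembly quoted above instead discards it.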



      THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

      THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Clone this code branch from github: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug

      git clone https://github.com/ktorio/ktor.git -b resumeAnyThread_HotSpotBug

      2. Build the corresponding test classes:

      ./gradlew :ktor-client:ktor-client-cio:compileTestKotlin

      3. Run the script that runs the test with all the appropriate JVM options (dumps assembly, etc.):

      ./run_test.sh

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected behavior: Test should pass (it takes up to 45 seconds)
      Actual behavior: Test fails (hangs for 1 minute and more)

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      HotSpot does not crash, but miscompiles method 'resumeAnyThread' '(Lkotlinx/coroutines/experimental/internal/LockFreeLinkedListNode;)V' in 'io/ktor/network/util/IOCoroutineDispatcher'

      This is the relevant part of the run_test.txt file (that is also committed to the branch). The miscompiled version can be found on line 444115 of run_test.txt file:

        0x00000001110d42b2: xchg QWORD PTR [rcx+0x1b8],r8 ;*invokevirtual getAndSetObject
                                                      ; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
                                                      ; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)

        0x00000001110d42b9: mov r8,QWORD PTR [rsp+0x28]


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      We have failed to minimize the problem because it is fleeting. However, it reproduces reliably on this particular test of this particular application: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      We've found that many changes in the code make the bug go away. The simplest workaround is to introduce a variable in the source code, e.g. replace this code:

      val cont = ThreadCont.getAndSet(threads[i], null) // BAD

      with this one:

      val t = threads[i] ; val cont = ThreadCont.getAndSet(t, null) // GOOD

      See here: https://github.com/ktorio/ktor/blob/41aa9a71c33b9c6fb1de7f50d63df5d3f029f4d1/ktor-network/src/io/ktor/network/util/IOCoroutineDispatcher.kt#L76
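
The two source shapes from the workaround can be put side by side in a Java sketch. The class, method, and field names here are hypothetical, and the sketch makes no claim that the Java form triggers the miscompilation; it only shows that the two shapes are semantically equivalent, which is why a behavioral difference between them points at the compiler rather than the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

final class LocalVariableWorkaround {
    // Hypothetical stand-in for the dispatcher's worker thread type.
    static final class Worker {
        volatile Runnable cont;
    }

    // Plays the role of ThreadCont in the report (name is an assumption).
    static final AtomicReferenceFieldUpdater<Worker, Runnable> THREAD_CONT =
            AtomicReferenceFieldUpdater.newUpdater(Worker.class, Runnable.class, "cont");

    // Shape marked BAD in the report: the list element is passed directly.
    static Runnable takeInline(List<Worker> threads, int i) {
        return THREAD_CONT.getAndSet(threads.get(i), null);
    }

    // Shape marked GOOD in the report: the element is first stored in a local.
    static Runnable takeViaLocal(List<Worker> threads, int i) {
        Worker t = threads.get(i);
        return THREAD_CONT.getAndSet(t, null);
    }

    public static void main(String[] args) {
        List<Worker> threads = new ArrayList<>();
        Worker w = new Worker();
        threads.add(w);
        Runnable r = () -> { };

        w.cont = r;
        System.out.println(takeInline(threads, 0) == r);   // both shapes return the stored value
        w.cont = r;
        System.out.println(takeViaLocal(threads, 0) == r);
    }
}
```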

      Using an array for threads (instead of an ArrayList) also fixes the problem, as does extracting a method, etc. Other simplifications to the code fix it too. For example, removing logging from this method fixes it, as long as the method itself is kept from being inlined with the compiler oracle: -XX:CompileCommand=dontinline,*.resumeAnyThread

      However, even the version w

            vlivanov Vladimir Ivanov
            webbuggrp Webbug Group