Bug
Resolution: Duplicate
P2
None
8u161
generic
generic
Information from JDK-8198531:
Sometimes C2 eliminates the non-null branch in the following code, as if it had proved that ThreadCont.getAndSet() cannot return a non-null value:
val cont = ThreadCont.getAndSet(threads[i], null)
if (cont != null) { /* do something */ }
Detailed problem description & reproducer (by Roman Elizarov):
https://github.com/ktorio/ktor/blob/resumeAnyThread_HotSpotBug/BUG_README_FIRST.md
FULL PRODUCT VERSION :
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
FULL OS VERSION :
Darwin unit-940.labs.intellij.net 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov 9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64
EXTRA RELEVANT SYSTEM CONFIGURATION :
Not relevant. Reproduces on various Linux versions, too
A DESCRIPTION OF THE PROBLEM :
We have code that uses AtomicReferenceFieldUpdater.getAndSet. The way it is used can be summarized like this (in Kotlin/JVM code):
val cont = ThreadCont.getAndSet(threads[i], null)
if (cont != null) { /* do something */ }
Now, under some circumstances HotSpot produces the following assembly for the above 'getAndSet':
0x00000001110d42b2: xchg QWORD PTR [rcx+0x1b8],r8 ;*invokevirtual getAndSetObject
; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)
0x00000001110d42b9: mov r8,QWORD PTR [rsp+0x28]
As you can see, the result of getAndSet (the xchg instruction leaves the previous field value in r8) is immediately lost: r8 is overwritten by the next mov instead of being checked for null.
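For reference, the behavior that correct compiled code has to preserve can be sketched as follows. This is a minimal illustration, not the ktor source: the Worker class, its cont field, the Runnable payload, and the resumeAll function are hypothetical stand-ins, and the sketch is not a reproducer for the miscompilation. It only shows that getAndSet returns the previous field value and that the non-null branch must run whenever that value was non-null.

```kotlin
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater

// Hypothetical stand-in for the dispatcher's worker thread; only the field matters here.
class Worker {
    // The updater requires a volatile field that is accessible to the caller,
    // hence @Volatile plus @JvmField (which exposes the field itself).
    @Volatile
    @JvmField
    var cont: Runnable? = null
}

// Same shape as the ThreadCont updater in the report (field name and types are assumptions).
val ThreadCont = AtomicReferenceFieldUpdater.newUpdater(
    Worker::class.java, Runnable::class.java, "cont"
)

// Takes the pending continuation from every worker, runs it, and
// returns how many continuations were taken.
fun resumeAll(threads: List<Worker>): Int {
    var resumed = 0
    for (i in threads.indices) {
        // Atomically swaps in null and returns the old value.
        val cont = ThreadCont.getAndSet(threads[i], null)
        if (cont != null) {
            // This is the branch that the miscompiled code drops.
            cont.run()
            resumed++
        }
    }
    return resumed
}
```

With one worker holding a continuation, resumeAll should run it exactly once and leave the field null; a second call should find nothing to resume.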
THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No
THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Clone this code branch from github: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug
git clone https://github.com/ktorio/ktor.git -b resumeAnyThread_HotSpotBug
2. Build the corresponding test classes:
./gradlew :ktor-client:ktor-client-cio:compileTestKotlin
3. Run the script that runs the test with all the appropriate JVM options (dumps assembly, etc.):
./run_test.sh
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected behavior: Test should pass (it takes up to 45 seconds)
Actual behavior: Test fails (hangs for 1 minute or more)
ERROR MESSAGES/STACK TRACES THAT OCCUR :
HotSpot does not crash, but miscompiles method 'resumeAnyThread' '(Lkotlinx/coroutines/experimental/internal/LockFreeLinkedListNode;)V' in 'io/ktor/network/util/IOCoroutineDispatcher'
This is the relevant part of the run_test.txt file (which is also committed to the branch). The miscompiled version can be found on line 444115 of the run_test.txt file:
0x00000001110d42b2: xchg QWORD PTR [rcx+0x1b8],r8 ;*invokevirtual getAndSetObject
; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)
0x00000001110d42b9: mov r8,QWORD PTR [rsp+0x28]
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
We've failed to minimize the problem, as it is fleeting. However, it reproduces reliably on this particular test of this particular application: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
We've found that many changes in the code make the bug go away. The simplest workaround is to introduce a variable in the source code, e.g. replace this code:
val cont = ThreadCont.getAndSet(threads[i], null) // BAD
with this one:
val t = threads[i] ; val cont = ThreadCont.getAndSet(t, null) // GOOD
See here: https://github.com/ktorio/ktor/blob/41aa9a71c33b9c6fb1de7f50d63df5d3f029f4d1/ktor-network/src/io/ktor/network/util/IOCoroutineDispatcher.kt#L76
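The workaround above can be sketched in self-contained form as follows. The names IOThreadStub, ContUpdater, and resumeFirst are hypothetical and the loop body is simplified; only the two-step read (element into a local, then getAndSet on the local) mirrors the GOOD line from the report. On a correctly working JVM both forms behave identically, so this shows the shape of the change rather than demonstrating the bug.

```kotlin
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater

// Hypothetical stand-in for the thread class that owns the continuation field.
class IOThreadStub {
    @Volatile
    @JvmField
    var cont: Runnable? = null
}

// Updater over the volatile field (field name and value type are assumptions).
val ContUpdater = AtomicReferenceFieldUpdater.newUpdater(
    IOThreadStub::class.java, Runnable::class.java, "cont"
)

// Resumes the first thread that has a pending continuation.
// Returns true if one was found and run.
fun resumeFirst(threads: ArrayList<IOThreadStub>): Boolean {
    for (i in threads.indices) {
        // Workaround shape: read the list element into a local variable first...
        val t = threads[i]
        // ...and only then pass the local to getAndSet.
        val cont = ContUpdater.getAndSet(t, null)
        if (cont != null) {
            cont.run()
            return true
        }
    }
    return false
}
```

The ArrayList parameter is kept on purpose, since the report notes that switching to a plain array also makes the problem disappear.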
Using an array for threads (instead of an ArrayList) also fixes the problem, as does extracting a method. Other simplifications to the code fix it too. For example, removing logging from this method fixes it, as long as the method itself is kept from being inlined via the compiler oracle: -XX:CompileCommand=dontinline,*.resumeAnyThread
However, even the version w
duplicates
JDK-8162540 Crash in C2 escape analysis with assert: "node should be registered" (Closed)
JDK-8198531 C2: Wrong type of return value of Unsafe.getAndSetObject() call (Closed)