-
Enhancement
-
Resolution: Fixed
-
P3
-
16, 17
-
b26
Disclaimer: while this investigation was motivated by Project Panama, I can reproduce the findings on a vanilla JDK.
There seems to be an issue when calling a close() method in a try/catch or try/finally block. In other words, this idiom:
Resource r = createResource()
doSomething(r);
r.close();
Is significantly faster than this other idiom:
Resource r = createResource()
try {
doSomething(r);
r.close();
} finally {
r.close();
}
or this idiom:
Resource r = createResource()
try {
doSomething(r);
r.close();
} catch (Throwable t) {
r.close();
throw t;
}
The attached benchmark can be used to reproduce these findings:
Benchmark Mode Cnt Score Error Units
ResourceScopeCloseMin.confined_close avgt 30 14.398 ? 0.175 ns/op
ResourceScopeCloseMin.confined_close_notry avgt 30 6.375 ? 0.053 ns/op
I've tried to remove all factors which could lead to potential confusion in JIT, such as interfaces, or javac-generated try-with-resources blocks (which contain extra calls to add suppressed exceptions which might have skewed the benchmark).
If I run with -XX:+PrintInlining I always see something like this:
@ 25 org.openjdk.bench.jdk.incubator.foreign.ResourceScopeCloseMin$ConfinedScope::close (48 bytes) already compiled into a medium method
And, inspecting the bytecode reveals that @25 is indeed the BCI of the close() call inside the catch block. Commenting out that call makes everything fast again.
---
The issue is essentially that the second call to close() in the finally/catch block is never reached, so C2 doesn't inline the call, which makes the Resource object escape, and not scalarizable. This leads then to the difference in performance. C2 normally replaces infrequently taken branches with an uncommon trap, but doesn't use an uncommon trap for the finally/catch blocks, since there is currently no information available to indicate whether finally/catch blocks are taken or not. They are entered through exception dispatch rather than a regular branch through the 'goto' bytecode (which does have profiling implemented for it).
To proposed solution is to implement profiling for exception handlers (which is the underlying bytecode implementation of the finally/catch blocks), and then use that profiling information in C2 to emit uncommon traps in place of exception handlers that are never entered. In the examples above, this would eliminate the second call to close() altogether, meaning the resource object also does not escape and can be scalar replaced. This brings the performance on par with the base case.
There seems to be an issue when calling a close() method in a try/catch or try/finally block. In other words, this idiom:
Resource r = createResource()
doSomething(r);
r.close();
Is significantly faster than this other idiom:
Resource r = createResource()
try {
doSomething(r);
r.close();
} finally {
r.close();
}
or this idiom:
Resource r = createResource()
try {
doSomething(r);
r.close();
} catch (Throwable t) {
r.close();
throw t;
}
The attached benchmark can be used to reproduce these findings:
Benchmark Mode Cnt Score Error Units
ResourceScopeCloseMin.confined_close avgt 30 14.398 ? 0.175 ns/op
ResourceScopeCloseMin.confined_close_notry avgt 30 6.375 ? 0.053 ns/op
I've tried to remove all factors which could lead to potential confusion in JIT, such as interfaces, or javac-generated try-with-resources blocks (which contain extra calls to add suppressed exceptions which might have skewed the benchmark).
If I run with -XX:+PrintInlining I always see something like this:
@ 25 org.openjdk.bench.jdk.incubator.foreign.ResourceScopeCloseMin$ConfinedScope::close (48 bytes) already compiled into a medium method
And, inspecting the bytecode reveals that @25 is indeed the BCI of the close() call inside the catch block. Commenting out that call makes everything fast again.
---
The issue is essentially that the second call to close() in the finally/catch block is never reached, so C2 doesn't inline the call, which makes the Resource object escape, and not scalarizable. This leads then to the difference in performance. C2 normally replaces infrequently taken branches with an uncommon trap, but doesn't use an uncommon trap for the finally/catch blocks, since there is currently no information available to indicate whether finally/catch blocks are taken or not. They are entered through exception dispatch rather than a regular branch through the 'goto' bytecode (which does have profiling implemented for it).
To proposed solution is to implement profiling for exception handlers (which is the underlying bytecode implementation of the finally/catch blocks), and then use that profiling information in C2 to emit uncommon traps in place of exception handlers that are never entered. In the examples above, this would eliminate the second call to close() altogether, meaning the resource object also does not escape and can be scalar replaced. This brings the performance on par with the base case.
- relates to
-
JDK-8321141 VM build issue on MacOS after JDK-8267532
-
- Resolved
-
-
JDK-8323101 C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if
-
- Resolved
-
-
JDK-8320310 CompiledMethod::has_monitors flag can be incorrect
-
- Resolved
-
-
JDK-8323651 compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot
-
- Resolved
-
-
JDK-8310011 Arena with try-with-resources is slower than it should be
-
- Closed
-
(2 links to)