Background:
The fix forJDK-8160543 "C1: Crash in java.lang.String.indexOf in some java.sql tests" looked like it was causing a performance regression in navierStokes benchmark. Consecutive runs of the benchmark showed very high variance between results (~100% speedup). It showed a bimodal behaviour where fast runs were as fast, and slow as slow, but the proportion of slow increased.
Drilling down to the cause of this was hard, any intrusive debug flag (like PrintAssembly) made the benchmark end up in the fast case.
Problem description:
The problem is that sometimes invokeSpecial calls aren't inlined because no type profile data has been collected for that callsite even though it is hot.
This method is compiled on level 3 (full profile):
java.lang.invoke.LambdaForm$DMH/2008276237::invokeSpecial_LID_V (18 bytes)
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation
@ 14 java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (0 bytes) MemberName not constant
@14 is a native method: compiled as
java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (native) (static)
Sometimes, in the fast case (not always!), there are already some method data collected for invokeSpecial_LID_V and the profile code will be generated slightly differently.
When invokeSpecial_LID_V is later compiled on level 4 (full opt) this method data is available:
-----------------
static java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V
interpreter_invocation_count: 5472
invocation_counter: 5474
backedge_counter: 0
mdo size: 640 bytes
parameter types 0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
1: stack(1) 'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
0 fast_aload_0
1 invokestatic 18 <java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;>
0 bci: 1 CounterData count(5336)
argument types 0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
return type 'java/lang/invoke/MemberName'
4 astore #5
6 aload_1
7 iload_2
8 dload_3
9 aload #5
11 checkcast 20 <java/lang/invoke/MemberName>
48 bci: 11 ReceiverTypeData count(0) nonprofiled_count(0) entries(0)
14 invokestatic 26 <java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V>
104 bci: 14 CounterData count(0)
argument types 0: stack(0) none
1: stack(4) none
-----------------
No data at bci 14 makes the inliner do:
[.. snippet from huge inline log...]
@ 55 java.lang.invoke.LambdaForm$DMH/221378065::invokeSpecial_LID_V (18 bytes) force inline by annotation
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation
@ 14 jdk.nashorn.internal.runtime.arrays.NumberArrayData::setElem (24 bytes) call site not reached
..
@14 is a linkToSpecial that was redirected to a invokeSpecial, that is linked to NumberArrayData::setElem that is missing a profile.
We can note that the perf data is still there as the method parameter profile have caught the class.
Why does this only happens sometimes?
linkToSpecial is a shared form. It may be compiled into a signature specialized method - but it is native and is missing type profile counters. We will get no data when it is running. The level 3 compiled invokeSpecial_LID_V sometimes only live a very short time ~50ms, and the callstack of the benchmark is pretty deep. The amount of invocations varies between 0 and ±500, often being very low, or quite high. It looks like the call is used in some phases, and depending on the timing it may not have been run.
Solution:
Don't forbid methodhandle call sites missing type profile data to be inlined, the surrounding invokeSpecial probably will - leave it to normal inlining heuristics to decide if it is hot enough.
Lessons:
Type profiling of methodHandle and Lambda form code may need to be revisited. We both seem to profile to little and too much. Native methods get no profile, but in other places we are profiling the same reference at every use causing a lot of overhead.
Testing:
The fix proposed fix makes us end up in the fast case every run. It should remove a lot of variance from Nashorn benchmarks.
The fix for
Drilling down to the cause of this was hard, any intrusive debug flag (like PrintAssembly) made the benchmark end up in the fast case.
Problem description:
The problem is that sometimes invokeSpecial calls aren't inlined because no type profile data has been collected for that callsite even though it is hot.
This method is compiled on level 3 (full profile):
java.lang.invoke.LambdaForm$DMH/2008276237::invokeSpecial_LID_V (18 bytes)
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation
@ 14 java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (0 bytes) MemberName not constant
@14 is a native method: compiled as
java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (native) (static)
Sometimes, in the fast case (not always!), there are already some method data collected for invokeSpecial_LID_V and the profile code will be generated slightly differently.
When invokeSpecial_LID_V is later compiled on level 4 (full opt) this method data is available:
-----------------
static java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V
interpreter_invocation_count: 5472
invocation_counter: 5474
backedge_counter: 0
mdo size: 640 bytes
parameter types 0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
1: stack(1) 'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
0 fast_aload_0
1 invokestatic 18 <java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;>
0 bci: 1 CounterData count(5336)
argument types 0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
return type 'java/lang/invoke/MemberName'
4 astore #5
6 aload_1
7 iload_2
8 dload_3
9 aload #5
11 checkcast 20 <java/lang/invoke/MemberName>
48 bci: 11 ReceiverTypeData count(0) nonprofiled_count(0) entries(0)
14 invokestatic 26 <java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V>
104 bci: 14 CounterData count(0)
argument types 0: stack(0) none
1: stack(4) none
-----------------
No data at bci 14 makes the inliner do:
[.. snippet from huge inline log...]
@ 55 java.lang.invoke.LambdaForm$DMH/221378065::invokeSpecial_LID_V (18 bytes) force inline by annotation
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation
@ 14 jdk.nashorn.internal.runtime.arrays.NumberArrayData::setElem (24 bytes) call site not reached
..
@14 is a linkToSpecial that was redirected to a invokeSpecial, that is linked to NumberArrayData::setElem that is missing a profile.
We can note that the perf data is still there as the method parameter profile have caught the class.
Why does this only happens sometimes?
linkToSpecial is a shared form. It may be compiled into a signature specialized method - but it is native and is missing type profile counters. We will get no data when it is running. The level 3 compiled invokeSpecial_LID_V sometimes only live a very short time ~50ms, and the callstack of the benchmark is pretty deep. The amount of invocations varies between 0 and ±500, often being very low, or quite high. It looks like the call is used in some phases, and depending on the timing it may not have been run.
Solution:
Don't forbid methodhandle call sites missing type profile data to be inlined, the surrounding invokeSpecial probably will - leave it to normal inlining heuristics to decide if it is hot enough.
Lessons:
Type profiling of methodHandle and Lambda form code may need to be revisited. We both seem to profile to little and too much. Native methods get no profile, but in other places we are profiling the same reference at every use causing a lot of overhead.
Testing:
The fix proposed fix makes us end up in the fast case every run. It should remove a lot of variance from Nashorn benchmarks.