-
Bug
-
Resolution: Duplicate
-
P4
-
None
-
21.0.1
-
generic
-
generic
A DESCRIPTION OF THE PROBLEM :
Here is a simple JMH benchmark:
```
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@Fork(value = 1 /*, jvmArgsAppend = {"-XX:+UnlockDiagnosticVMOptions", "-Xlog:class+load=info", "-XX:+LogCompilation", "-XX:+PrintAssembly", "-XX:+DebugNonSafepoints"}*/)
@Warmup(iterations = 2, time = 6)
@Measurement(iterations = 2, time = 10)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class Bench {
@State(Scope.Benchmark)
public static class BenchState {
public static final VarHandle AGE_DIRECT_VAR_HANDLE;
public static final IntVarHandle AGE_INDIRECT_VAR_HANDLE;
static {
try {
AGE_DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Person.class, MethodHandles.lookup())
.findVarHandle(Person.class, "age", int.class);
AGE_INDIRECT_VAR_HANDLE = new IntVarHandle(AGE_DIRECT_VAR_HANDLE);
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
Person[] data;
@Setup
public void setup() {
var random = ThreadLocalRandom.current();
data = Stream.generate(Person::new)
.limit(4096)
.peek(p -> p.age = random.nextInt())
.toArray(Person[]::new);
}
}
@Benchmark
public long benchmark_direct_var_handle(BenchState state) {
long acc = 0;
for (int i = 0; i < state.data.length; i++) {
acc += (int) BenchState.AGE_DIRECT_VAR_HANDLE.get(state.data[i]);
}
return acc;
}
@Benchmark
public long benchmark_indirect_var_handle(BenchState state) {
long acc = 0;
for (int i = 0; i < state.data.length; i++) {
acc += BenchState.AGE_INDIRECT_VAR_HANDLE.getInt(state.data[i]);
}
return acc;
}
public final static class Person {
public int age;
}
public abstract static class TypedVarHandle {
}
public static class IntVarHandle extends TypedVarHandle {
private final VarHandle varHandle;
public IntVarHandle(VarHandle varHandle) {
this.varHandle = varHandle;
}
public int getInt(Person o) {
return (int) varHandle.get(o);
}
}
}
```
The results:
```
Bench.benchmark_direct_var_handle thrpt 2 653482.980 ops/s
Bench.benchmark_indirect_var_handle thrpt 2 36149.181 ops/s
```
From what I saw so far the reason is that the FieldInstanceReadOnly::get is not inlined in the indirect version.
Here are inlining logs for both cases
direct:
```
[6,254s][debug][jit,compilation] 121 % pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,254s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_direct_var_handle (40 bytes) force inline by CompileCommand
[6,254s][debug][jit,inlining ] @ 26 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 32 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get (22 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 8 java.lang.Class::cast (27 bytes) (intrinsic)
[6,254s][debug][jit,inlining ] @ 11 java.util.Objects::requireNonNull (14 bytes) inline (hot)
[6,254s][debug][jit,inlining ] @ 18 jdk.internal.misc.Unsafe::getInt (0 bytes) (intrinsic)
[6,254s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
[6,254s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
[6,255s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
```
indirect:
```
[6,278s][debug][jit,compilation] 122 % pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,278s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_indirect_var_handle (40 bytes) force inline by CompileCommand
[6,279s][debug][jit,inlining ] @ 26 pl.ks.Bench$IntVarHandle::getInt (9 bytes) inline (hot)
[6,279s][debug][jit,inlining ] @ 5 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
[6,279s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
[6,279s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
[6,279s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
[6,279s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
```
The perfasm profiler also suggest that there is not a single compilation in indirect case. Hottest methods for direct:
```
....[Hottest Methods (after inlining)]..............................................................
98,42% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub, version 4, compile id 551
```
For indirect:
```
....[Hottest Regions]...............................................................................
44,31% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub, version 4, compile id 532
41,97% c2, level 4 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get, version 2, compile id 499
```
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run JMH benchamark
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The indirect version shouldn't be so slow.
FREQUENCY : always
Here is a simple JMH benchmark:
```
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@Fork(value = 1 /*, jvmArgsAppend = {"-XX:+UnlockDiagnosticVMOptions", "-Xlog:class+load=info", "-XX:+LogCompilation", "-XX:+PrintAssembly", "-XX:+DebugNonSafepoints"}*/)
@Warmup(iterations = 2, time = 6)
@Measurement(iterations = 2, time = 10)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class Bench {
@State(Scope.Benchmark)
public static class BenchState {
public static final VarHandle AGE_DIRECT_VAR_HANDLE;
public static final IntVarHandle AGE_INDIRECT_VAR_HANDLE;
static {
try {
AGE_DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Person.class, MethodHandles.lookup())
.findVarHandle(Person.class, "age", int.class);
AGE_INDIRECT_VAR_HANDLE = new IntVarHandle(AGE_DIRECT_VAR_HANDLE);
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
Person[] data;
@Setup
public void setup() {
var random = ThreadLocalRandom.current();
data = Stream.generate(Person::new)
.limit(4096)
.peek(p -> p.age = random.nextInt())
.toArray(Person[]::new);
}
}
@Benchmark
public long benchmark_direct_var_handle(BenchState state) {
long acc = 0;
for (int i = 0; i < state.data.length; i++) {
acc += (int) BenchState.AGE_DIRECT_VAR_HANDLE.get(state.data[i]);
}
return acc;
}
@Benchmark
public long benchmark_indirect_var_handle(BenchState state) {
long acc = 0;
for (int i = 0; i < state.data.length; i++) {
acc += BenchState.AGE_INDIRECT_VAR_HANDLE.getInt(state.data[i]);
}
return acc;
}
public final static class Person {
public int age;
}
public abstract static class TypedVarHandle {
}
public static class IntVarHandle extends TypedVarHandle {
private final VarHandle varHandle;
public IntVarHandle(VarHandle varHandle) {
this.varHandle = varHandle;
}
public int getInt(Person o) {
return (int) varHandle.get(o);
}
}
}
```
The results:
```
Bench.benchmark_direct_var_handle thrpt 2 653482.980 ops/s
Bench.benchmark_indirect_var_handle thrpt 2 36149.181 ops/s
```
From what I saw so far the reason is that the FieldInstanceReadOnly::get is not inlined in the indirect version.
Here are inlining logs for both cases
direct:
```
[6,254s][debug][jit,compilation] 121 % pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,254s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_direct_var_handle (40 bytes) force inline by CompileCommand
[6,254s][debug][jit,inlining ] @ 26 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 32 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get (22 bytes) force inline by annotation
[6,254s][debug][jit,inlining ] @ 8 java.lang.Class::cast (27 bytes) (intrinsic)
[6,254s][debug][jit,inlining ] @ 11 java.util.Objects::requireNonNull (14 bytes) inline (hot)
[6,254s][debug][jit,inlining ] @ 18 jdk.internal.misc.Unsafe::getInt (0 bytes) (intrinsic)
[6,254s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
[6,254s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
[6,255s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
```
indirect:
```
[6,278s][debug][jit,compilation] 122 % pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub @ 13 (59 bytes)
[6,278s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_indirect_var_handle (40 bytes) force inline by CompileCommand
[6,279s][debug][jit,inlining ] @ 26 pl.ks.Bench$IntVarHandle::getInt (9 bytes) inline (hot)
[6,279s][debug][jit,inlining ] @ 5 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
[6,279s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
[6,279s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
[6,279s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
[6,279s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
```
The perfasm profiler also suggest that there is not a single compilation in indirect case. Hottest methods for direct:
```
....[Hottest Methods (after inlining)]..............................................................
98,42% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub, version 4, compile id 551
```
For indirect:
```
....[Hottest Regions]...............................................................................
44,31% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub, version 4, compile id 532
41,97% c2, level 4 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get, version 2, compile id 499
```
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run JMH benchamark
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The indirect version shouldn't be so slow.
FREQUENCY : always
- duplicates
-
JDK-8233873 final field values should be trusted as constant
- Open