Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8318893

Bad performance of VarHandles

XMLWordPrintable

      A DESCRIPTION OF THE PROBLEM :
      Here is a simple JMH benchmark:

      ```
      import java.lang.invoke.MethodHandles;
      import java.lang.invoke.VarHandle;
      import java.util.concurrent.ThreadLocalRandom;
      import java.util.concurrent.TimeUnit;
      import java.util.stream.Stream;
      import org.openjdk.jmh.annotations.Benchmark;
      import org.openjdk.jmh.annotations.BenchmarkMode;
      import org.openjdk.jmh.annotations.Fork;
      import org.openjdk.jmh.annotations.Measurement;
      import org.openjdk.jmh.annotations.Mode;
      import org.openjdk.jmh.annotations.OutputTimeUnit;
      import org.openjdk.jmh.annotations.Scope;
      import org.openjdk.jmh.annotations.Setup;
      import org.openjdk.jmh.annotations.State;
      import org.openjdk.jmh.annotations.Warmup;
      @Fork(value = 1 /*, jvmArgsAppend = {"-XX:+UnlockDiagnosticVMOptions", "-Xlog:class+load=info", "-XX:+LogCompilation", "-XX:+PrintAssembly", "-XX:+DebugNonSafepoints"}*/)
      @Warmup(iterations = 2, time = 6)
      @Measurement(iterations = 2, time = 10)
      @BenchmarkMode(Mode.Throughput)
      @OutputTimeUnit(TimeUnit.SECONDS)
      public class Bench {
          @State(Scope.Benchmark)
          public static class BenchState {
              public static final VarHandle AGE_DIRECT_VAR_HANDLE;
              public static final IntVarHandle AGE_INDIRECT_VAR_HANDLE;
              static {
                  try {
                      AGE_DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Person.class, MethodHandles.lookup())
                          .findVarHandle(Person.class, "age", int.class);
                      AGE_INDIRECT_VAR_HANDLE = new IntVarHandle(AGE_DIRECT_VAR_HANDLE);
                  } catch (Throwable e) {
                      throw new RuntimeException(e);
                  }
              }
              Person[] data;
              @Setup
              public void setup() {
                  var random = ThreadLocalRandom.current();
                  data = Stream.generate(Person::new)
                      .limit(4096)
                      .peek(p -> p.age = random.nextInt())
                      .toArray(Person[]::new);
              }
          }
          @Benchmark
          public long benchmark_direct_var_handle(BenchState state) {
              long acc = 0;
              for (int i = 0; i < state.data.length; i++) {
                  acc += (int) BenchState.AGE_DIRECT_VAR_HANDLE.get(state.data[i]);
              }
              return acc;
          }
          @Benchmark
          public long benchmark_indirect_var_handle(BenchState state) {
              long acc = 0;
              for (int i = 0; i < state.data.length; i++) {
                  acc += BenchState.AGE_INDIRECT_VAR_HANDLE.getInt(state.data[i]);
              }
              return acc;
          }
          public final static class Person {
              public int age;
          }
          public abstract static class TypedVarHandle {
          }
          public static class IntVarHandle extends TypedVarHandle {
              private final VarHandle varHandle;
              public IntVarHandle(VarHandle varHandle) {
                  this.varHandle = varHandle;
              }
              public int getInt(Person o) {
                  return (int) varHandle.get(o);
              }
          }
      }
      ```

      The results:
      ```
      Bench.benchmark_direct_var_handle thrpt 2 653482.980 ops/s
      Bench.benchmark_indirect_var_handle thrpt 2 36149.181 ops/s
      ```

      From what I saw so far the reason is that the FieldInstanceReadOnly::get is not inlined in the indirect version.

      Here are inlining logs for both cases
      direct:
      ```
      [6,254s][debug][jit,compilation] 121 % pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub @ 13 (59 bytes)
      [6,254s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_direct_var_handle (40 bytes) force inline by CompileCommand
      [6,254s][debug][jit,inlining ] @ 26 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
      [6,254s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
      [6,254s][debug][jit,inlining ] @ 32 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get (22 bytes) force inline by annotation
      [6,254s][debug][jit,inlining ] @ 8 java.lang.Class::cast (27 bytes) (intrinsic)
      [6,254s][debug][jit,inlining ] @ 11 java.util.Objects::requireNonNull (14 bytes) inline (hot)
      [6,254s][debug][jit,inlining ] @ 18 jdk.internal.misc.Unsafe::getInt (0 bytes) (intrinsic)
      [6,254s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
      [6,254s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
      [6,255s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
      ```

      indirect:
      ```
      [6,278s][debug][jit,compilation] 122 % pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub @ 13 (59 bytes)
      [6,278s][debug][jit,inlining ] @ 19 pl.ks.Bench::benchmark_indirect_var_handle (40 bytes) force inline by CompileCommand
      [6,279s][debug][jit,inlining ] @ 26 pl.ks.Bench$IntVarHandle::getInt (9 bytes) inline (hot)
      [6,279s][debug][jit,inlining ] @ 5 java.lang.invoke.VarHandleGuards::guard_L_I (59 bytes) force inline by annotation
      [6,279s][debug][jit,inlining ] @ 29 java.lang.invoke.VarForm::getMemberName (21 bytes) force inline by annotation
      [6,279s][debug][jit,inlining ] @ 22 org.openjdk.jmh.infra.Blackhole::consume (19 bytes) force inline by CompileCommand
      [6,279s][debug][jit,inlining ] @ 15 org.openjdk.jmh.infra.Blackhole::consumeFull (31 bytes) disallowed by CompileCommand
      [6,279s][debug][jit,inlining ] @ 39 java.lang.System::nanoTime (0 bytes) (intrinsic)
      ```

      The perfasm profiler also suggest that there is not a single compilation in indirect case. Hottest methods for direct:
      ```
      ....[Hottest Methods (after inlining)]..............................................................
        98,42% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_direct_var_handle_jmhTest::benchmark_direct_var_handle_thrpt_jmhStub, version 4, compile id 551
      ```

      For indirect:
      ```
      ....[Hottest Regions]...............................................................................
        44,31% c2, level 4 pl.ks.jmh_generated.Bench_benchmark_indirect_var_handle_jmhTest::benchmark_indirect_var_handle_thrpt_jmhStub, version 4, compile id 532
        41,97% c2, level 4 java.lang.invoke.VarHandleInts$FieldInstanceReadOnly::get, version 2, compile id 499
      ```

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run JMH benchamark

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The indirect version shouldn't be so slow.

      FREQUENCY : always


            mchung Mandy Chung
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: