JDK-8257837: Performance regression in heap byte buffer views


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • 16
    • None
    • core-libs
    • None
    • b00

      This benchmark:

      import org.openjdk.jmh.annotations.Benchmark;
      import org.openjdk.jmh.annotations.BenchmarkMode;
      import org.openjdk.jmh.annotations.Fork;
      import org.openjdk.jmh.annotations.Measurement;
      import org.openjdk.jmh.annotations.Mode;
      import org.openjdk.jmh.annotations.OutputTimeUnit;
      import org.openjdk.jmh.annotations.Setup;
      import org.openjdk.jmh.annotations.State;
      import org.openjdk.jmh.annotations.TearDown;
      import org.openjdk.jmh.annotations.Warmup;
      import sun.misc.Unsafe;

      import java.lang.invoke.VarHandle;
      import java.nio.ByteBuffer;
      import java.nio.ByteOrder;
      import java.nio.FloatBuffer;
      import java.util.concurrent.TimeUnit;

      import static jdk.incubator.foreign.MemoryLayout.PathElement.sequenceElement;
      import static jdk.incubator.foreign.MemoryLayouts.JAVA_FLOAT;
      import static jdk.incubator.foreign.MemoryLayouts.JAVA_INT;

      @BenchmarkMode(Mode.AverageTime)
      @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
      @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
      @State(org.openjdk.jmh.annotations.Scope.Thread)
      @OutputTimeUnit(TimeUnit.MILLISECONDS)
      @Fork(value = 3, jvmArgsAppend = { "--add-modules=jdk.incubator.foreign" })
      public class LoopOverPolluted {

          static final int ELEM_SIZE = 1_000_000;
          static final int CARRIER_SIZE = (int) JAVA_INT.byteSize();
          static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

          static final Unsafe unsafe = Utils.unsafe;

          ByteBuffer bb = ByteBuffer.allocateDirect(ALLOC_SIZE).order(ByteOrder.nativeOrder());
          byte[] arr = new byte[ALLOC_SIZE];
          FloatBuffer fb = ByteBuffer.wrap(arr).order(ByteOrder.nativeOrder()).asFloatBuffer();

          @Setup
          public void setup() {
              for (int i = 0; i < ELEM_SIZE; i++) {
                  bb.putFloat(i * 4, i);
              }
              for (int i = 0; i < ELEM_SIZE; i++) {
                  fb.put(i, i);
              }
          }

          @TearDown
          public void tearDown() {
              unsafe.invokeCleaner(bb);
              arr = null;
              fb = null;
          }

          @Benchmark
          public int byte_buffer_get_float() {
              int sum = 0;
              for (int k = 0; k < ELEM_SIZE; k++) {
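                  // As reported: the write uses byte index k while the read uses
                  // byte index k * 4, so they generally touch different locations.
                  bb.putFloat(k, (float)k + 1);
                  float v = bb.getFloat(k * 4);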
                  sum += (int)v;
              }
              return sum;
          }

          @Benchmark
          public int float_buffer_get() {
              int sum = 0;
              for (int k = 0; k < ELEM_SIZE; k ++) {
                  fb.put(k, k + 1);
                  float v = fb.get(k);
                  sum += (int)v;
              }
              return sum;
          }

          @Benchmark
          public int unsafe_get_float() {
              int sum = 0;
              for (int k = 0; k < ALLOC_SIZE; k += 4) {
                  unsafe.putFloat(arr, k + Unsafe.ARRAY_BYTE_BASE_OFFSET, k + 1);
                  float v = unsafe.getFloat(arr, k + Unsafe.ARRAY_BYTE_BASE_OFFSET);
                  sum += (int)v;
              }
              return sum;
          }
      }



      The benchmark reveals a performance regression between Java 15 and Java 16: float_buffer_get goes from 0.789 ms/op to 2.432 ms/op, roughly three times slower. Here are the results on Java 15:

      Benchmark                               Mode  Cnt  Score   Error  Units
      LoopOverPolluted.byte_buffer_get_float  avgt   30  0.802 ± 0.011  ms/op
      LoopOverPolluted.float_buffer_get       avgt   30  0.789 ± 0.009  ms/op
      LoopOverPolluted.unsafe_get_float       avgt   30  0.494 ± 0.006  ms/op


      On Java 16 we get this:

      Benchmark                               Mode  Cnt  Score   Error  Units
      LoopOverPolluted.byte_buffer_get_float  avgt   30  0.590 ± 0.012  ms/op
      LoopOverPolluted.float_buffer_get       avgt   30  2.432 ± 0.060  ms/op
      LoopOverPolluted.unsafe_get_float       avgt   30  0.504 ± 0.008  ms/op


      This is likely caused by profile pollution in ScopedMemoryAccess, which the ByteBuffer API now uses to access memory (at least in the heap views).
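To make the suspected mechanism concrete, here is a minimal sketch of how one shared access path can end up serving several buffer kinds. The class and helper names (ProfilePollutionSketch, sum, heapView, directView) are hypothetical and not part of the report. It only illustrates the setup, on the assumption that the JIT gathers its profile at the shared code rather than per caller. It is not a benchmark, and whether it shows any slowdown depends on the JDK build.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ProfilePollutionSketch {

    // Hypothetical helper: one loop shared by every buffer kind, so the
    // access code reached through FloatBuffer.get(int) is exercised with
    // both heap-backed and direct views.
    static int sum(FloatBuffer fb) {
        int sum = 0;
        for (int k = 0; k < fb.limit(); k++) {
            sum += (int) fb.get(k);
        }
        return sum;
    }

    // Float view over a byte[]-backed buffer, as in the report's `fb` field.
    static FloatBuffer heapView(int elems) {
        FloatBuffer fb = ByteBuffer.wrap(new byte[elems * Float.BYTES])
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        fill(fb);
        return fb;
    }

    // Float view over off-heap memory.
    static FloatBuffer directView(int elems) {
        FloatBuffer fb = ByteBuffer.allocateDirect(elems * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        fill(fb);
        return fb;
    }

    // Stores the value k at element index k.
    static void fill(FloatBuffer fb) {
        for (int k = 0; k < fb.limit(); k++) {
            fb.put(k, k);
        }
    }

    public static void main(String[] args) {
        FloatBuffer heap = heapView(1_000);
        FloatBuffer direct = directView(1_000);
        long total = 0;
        // Alternate between the two views so both reach the shared access
        // code while it is still being profiled.
        for (int i = 0; i < 20_000; i++) {
            total += sum(heap);
            total += sum(direct);
        }
        System.out.println(total);
    }
}
```

Timing `sum(heap)` with and without the interleaved `sum(direct)` calls would be one way to probe the hypothesis, though a JMH harness like the one in the report is the more reliable tool for that.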

            mcimadamore Maurizio Cimadamore