JDK-8237077

C2 fails to optimize certain code shapes with memory access indexed var handles

Details

    • Type: Enhancement
    • Resolution: Not an Issue
    • Priority: P3
    • Fix Version: tbd
    • Affects Version: 15
    • Component: hotspot

    Description

      Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:

      static final int ELEM_SIZE = 1_000_000;
      static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
      static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

      static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

      @Benchmark
      public void segment_loop() {
          try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
              for (int i = 0; i < ELEM_SIZE; i++) {
                  // base address stored in a local that both branches share
                  MemoryAddress address = segment.baseAddress();
                  if (i % 2 == 0) {
                      VH_int.set(address, (long)i, i + 1);
                  } else {
                      VH_int.set(address, (long)i, i - 1);
                  }
              }
          }
      }

      This gives good performance, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark to this:

      @Benchmark
      public void segment_loop() {
          try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
              for (int i = 0; i < ELEM_SIZE; i++) {
                  // base address computed separately at each call site
                  if (i % 2 == 0) {
                      VH_int.set(segment.baseAddress(), (long)i, i + 1);
                  } else {
                      VH_int.set(segment.baseAddress(), (long)i, i - 1);
                  }
              }
          }
      }
                      
      The loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which yields much worse performance. I suspect some failure in escape analysis or scalarization.
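
      For readers without the Panama branch at hand, the two code shapes can be approximated with standard APIs. The sketch below is a stand-in under stated assumptions, not the benchmark from this report: it swaps MemorySegment for a direct ByteBuffer, uses buffer.duplicate() as a rough analogue of segment.baseAddress() (on the assumption that both create a fresh object on each call), and uses the standard MethodHandles.byteBufferViewVarHandle instead of the layout-derived var handle. The class and method names are hypothetical. It only checks that both shapes write the same data; whether C2 treats the two shapes differently with this API is not something this report establishes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the two benchmark shapes, using only standard APIs.
final class CodeShapes {
    // Views a ByteBuffer as ints; the coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle VH_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Shape 1: the freshly created view is stored in a local used by both branches.
    static void fillWithLocal(ByteBuffer buffer, int elems) {
        for (int i = 0; i < elems; i++) {
            ByteBuffer base = buffer.duplicate(); // assumed analogue of segment.baseAddress()
            if (i % 2 == 0) {
                VH_INT.set(base, i * Integer.BYTES, i + 1);
            } else {
                VH_INT.set(base, i * Integer.BYTES, i - 1);
            }
        }
    }

    // Shape 2: the view is created separately at each call site.
    static void fillInline(ByteBuffer buffer, int elems) {
        for (int i = 0; i < elems; i++) {
            if (i % 2 == 0) {
                VH_INT.set(buffer.duplicate(), i * Integer.BYTES, i + 1);
            } else {
                VH_INT.set(buffer.duplicate(), i * Integer.BYTES, i - 1);
            }
        }
    }

    public static void main(String[] args) {
        int elems = 1_000;
        ByteBuffer a = ByteBuffer.allocateDirect(elems * Integer.BYTES);
        ByteBuffer b = ByteBuffer.allocateDirect(elems * Integer.BYTES);
        fillWithLocal(a, elems);
        fillInline(b, elems);
        // Both shapes are semantically equivalent, so the buffers should match.
        System.out.println("identical contents: " + a.equals(b));
    }
}
```

      To compare the generated code for the two methods, each would need to be wrapped in its own JMH benchmark and inspected with a profiler, as was done for the original benchmark.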

            People

              Assignee: Vladimir Ivanov (vlivanov)
              Reporter: Maurizio Cimadamore (mcimadamore)