JDK-8367158

C2: create better fill and copy benchmarks, taking alignment into account


      First investigation into benchmarks done here:
      https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783 / JDK-8365290.

      It seems to me that people are making decisions about fill and copy intrinsics based on benchmarks that are noisy and don't properly control for alignment, which can give us misleading results.

      It turns out that we barely have any fill and copy benchmarks that really test automatic alignment.

      We should also compare to auto-vectorization performance.

      We should test Arrays.fill and System.arraycopy, but also some MemorySegment bulk operations. We should then compare these to naive loops, both with intrinsics enabled and disabled (-XX:-OptimizeFill).
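      As a rough illustration of the comparison described above, the kernels could look like the sketch below. The class and method names are hypothetical. In an actual JMH benchmark, each method body would presumably sit inside a @Benchmark method, with size and offset supplied as parameters so that the start alignment is controlled explicitly rather than left to chance.

```java
import java.util.Arrays;

// Hypothetical kernels for a fill/copy benchmark; names are illustrative only.
// The explicit offsets let the caller shift the start of the region and thus
// control the alignment of the first accessed element.
final class FillCopyKernels {
    // Library call: fills a[offset .. offset + size) with v.
    static void fillLibrary(byte[] a, int offset, int size, byte v) {
        Arrays.fill(a, offset, offset + size, v);
    }

    // Naive loop over the same region, a candidate for auto-vectorization
    // or for loop-to-intrinsic conversion, depending on VM flags.
    static void fillLoop(byte[] a, int offset, int size, byte v) {
        for (int i = 0; i < size; i++) {
            a[offset + i] = v;
        }
    }

    // Library call: copies size bytes with independent source and destination offsets.
    static void copyLibrary(byte[] src, int srcOff, byte[] dst, int dstOff, int size) {
        System.arraycopy(src, srcOff, dst, dstOff, size);
    }

    // Naive copy loop over the same regions.
    static void copyLoop(byte[] src, int srcOff, byte[] dst, int dstOff, int size) {
        for (int i = 0; i < size; i++) {
            dst[dstOff + i] = src[srcOff + i];
        }
    }
}
```

      Sweeping offset over a small range (for example 0 to 63) and size over both small and large values would show how each variant reacts to misalignment and to short iteration counts.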

      Also look at JDK-8299808, and the discussion there.

      We could take an approach similar to the one in JDK-8355094, with:
      test/micro/org/openjdk/bench/vm/compiler/VectorAutoAlignment.java

      We should also go through the benchmarks mentioned in
      https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783
      and see if they still behave as the comments in them suggest:
      - alignment assumptions
      - performance assumptions / comparison with SuperWord, especially after JDK-8324751.

      This is also a really good way to better understand the performance of auto-vectorization (SuperWord) on small iteration counts. This is where the intrinsics are currently much better than auto-vectorization. See also JDK-8344085. But it is possible that auto-vectorization is actually faster with large iteration counts.

      For MemorySegment, we already have:
      - ./test/micro/org/openjdk/bench/java/lang/foreign/BulkOps.java
      - ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkFill.java
      - ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkCopy.java
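      For MemorySegment, alignment can be controlled directly, because native segments can be allocated with a requested alignment and then sliced. The following is a minimal sketch, assuming a JDK where java.lang.foreign is final (22 or later). The class and method names are hypothetical and are not taken from the existing benchmarks listed above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper for alignment-controlled MemorySegment fill/copy kernels.
final class SegmentAlignmentSketch {
    // Allocates a 64-byte-aligned native segment and returns a slice of `size`
    // bytes that starts exactly `misalign` bytes (0..63) past that boundary.
    static MemorySegment misalignedSlice(Arena arena, long size, long misalign) {
        MemorySegment base = arena.allocate(size + 64, 64);
        return base.asSlice(misalign, size);
    }

    // Bulk operation: fill the whole segment with one byte value.
    static void fillBulk(MemorySegment dst, byte v) {
        dst.fill(v);
    }

    // Bulk operation: copy all of src into the start of dst.
    static void copyBulk(MemorySegment src, MemorySegment dst) {
        MemorySegment.copy(src, 0, dst, 0, src.byteSize());
    }

    // Naive loop equivalent of fillBulk, for comparison with auto-vectorization.
    static void fillLoop(MemorySegment dst, byte v) {
        for (long i = 0; i < dst.byteSize(); i++) {
            dst.set(ValueLayout.JAVA_BYTE, i, v);
        }
    }
}
```

      Using different misalign values for source and destination would also cover the case where the two sides cannot be aligned at the same time.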

      We should also make sure to check fill with zero separately, since some platforms are much faster at zeroing memory.

      We should also check the impact of Lilliput / CompactObjectHeaders, as those change the alignment of some element types.
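      To make the alignment effect concrete, here is a small arithmetic sketch. It assumes that a byte[] starts at offset 16 from the object base without compact headers and at offset 12 with them, and that objects are 8-byte aligned. These values are assumptions about a typical 64-bit configuration and should be verified on the VM under test. The helper name is hypothetical.

```java
// Hypothetical helper: computes how far an array element lies past an
// alignment boundary, relative to the (assumed aligned) object base.
final class ArrayElementAlignment {
    // Only meaningful when `alignment` does not exceed the object alignment,
    // since the object base itself is only guaranteed to be aligned that far.
    static int misalignment(int arrayBaseOffset, int elemSize, int index, int alignment) {
        return (arrayBaseOffset + index * elemSize) % alignment;
    }
}
```

      Under these assumptions, misalignment(16, 1, 0, 8) is 0, while misalignment(12, 1, 0, 8) is 4. The first 8-byte-aligned byte element then moves from index 0 to index 4, which is the kind of shift a benchmark with a fixed offset would silently pick up.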

            epeter Emanuel Peter