Type: Enhancement
Resolution: Unresolved
Priority: P4
26
A first investigation of these benchmarks was done here:
https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783 / JDK-8365290.
It seems to me that people are making decisions about fill and copy intrinsics based on benchmarks that are noisy and do not properly control for alignment, which can give us misleading results.
It turns out that we have hardly any fill and copy benchmarks that really test automatic alignment.
We should also compare against auto-vectorization performance.
We should test Arrays.fill and System.arraycopy, but also some MemorySegment bulk operations. These should then be compared against naive loops, both with the intrinsics enabled and disabled (-XX:-OptimizeFill).
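The comparison described above could start from kernels like the following. This is a hypothetical sketch, not an existing benchmark: the class and method names are made up, and plain static methods stand in for what would be JMH `@Benchmark` methods. Each bulk operation has a naive-loop twin, and the start offset is an explicit parameter so that alignment can be varied instead of left to chance.

```java
import java.util.Arrays;

/**
 * Hypothetical kernels for comparing bulk operations against naive loops.
 * In a JMH benchmark each method would be a @Benchmark, with offset and size
 * as @Param values so that alignment is controlled explicitly.
 */
public final class FillCopyKernels {

    // Library fill. Arrays.fill is itself a simple loop that C2 may convert
    // into a call to its fill stub when OptimizeFill is enabled.
    public static void fillIntrinsic(byte[] a, int offset, int size, byte value) {
        Arrays.fill(a, offset, offset + size, value);
    }

    // Naive fill loop: C2 may convert it to the fill stub (OptimizeFill)
    // or auto-vectorize it (SuperWord).
    public static void fillLoop(byte[] a, int offset, int size, byte value) {
        for (int i = 0; i < size; i++) {
            a[offset + i] = value;
        }
    }

    // Library copy: System.arraycopy is intrinsified by C2.
    public static void copyIntrinsic(byte[] src, int srcOffset, byte[] dst, int dstOffset, int size) {
        System.arraycopy(src, srcOffset, dst, dstOffset, size);
    }

    // Naive copy loop: a candidate for SuperWord auto-vectorization.
    public static void copyLoop(byte[] src, int srcOffset, byte[] dst, int dstOffset, int size) {
        for (int i = 0; i < size; i++) {
            dst[dstOffset + i] = src[srcOffset + i];
        }
    }

    public static void main(String[] args) {
        int size = 100;
        byte[] src = new byte[size + 8];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // Sweep the start offset so that different alignments are exercised.
        for (int offset = 0; offset < 8; offset++) {
            byte[] a = new byte[size + 8];
            byte[] b = new byte[size + 8];
            fillIntrinsic(a, offset, size, (byte) 42);
            fillLoop(b, offset, size, (byte) 42);
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("fill variants differ at offset " + offset);
            }
            copyIntrinsic(src, offset, a, 0, size);
            copyLoop(src, offset, b, 0, size);
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("copy variants differ at offset " + offset);
            }
        }
        System.out.println("all variants agree");
    }
}
```

In a JMH version, a second fork with `@Fork(jvmArgsAppend = "-XX:-OptimizeFill")` would show the fill loops without the fill intrinsic, so the same kernels can be measured with and without it.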
Also look at JDK-8299808, and the discussion there.
We could take a similar approach as in JDK-8355094, with:
test/micro/org/openjdk/bench/vm/compiler/VectorAutoAlignment.java
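We do not reproduce VectorAutoAlignment.java here. The sketch below only illustrates the general idea of controlling alignment through start offsets, and the helper name is hypothetical. Because the start of a heap array is in general not aligned to the vector width, shifting the start index over one full vector's worth of elements hits every element-aligned misalignment exactly once. For a fill only the destination offset matters; for a copy, the source and destination offsets (and their difference) both matter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: enumerate (srcOffset, dstOffset) pairs, in elements,
 * that cover every misalignment relative to a vector of vectorBytes bytes.
 * In a JMH benchmark these would typically become @Param values.
 */
public final class AlignmentOffsets {

    public static List<int[]> offsetPairs(int vectorBytes, int elemSize) {
        if (vectorBytes <= 0 || elemSize <= 0 || vectorBytes % elemSize != 0) {
            throw new IllegalArgumentException("vectorBytes must be a positive multiple of elemSize");
        }
        int lanes = vectorBytes / elemSize; // elements per vector
        List<int[]> pairs = new ArrayList<>();
        for (int src = 0; src < lanes; src++) {
            for (int dst = 0; dst < lanes; dst++) {
                pairs.add(new int[] {src, dst});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Example: 32-byte vectors and int elements give 8 lanes, hence 8 * 8 pairs.
        List<int[]> pairs = offsetPairs(32, 4);
        System.out.println(pairs.size() + " offset pairs");
    }
}
```

The quadratic number of pairs is the price of covering relative misalignment for copies; a fill benchmark would only need the single-offset dimension.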
We should also go through the benchmarks mentioned in
https://github.com/openjdk/jdk/pull/26747#issuecomment-3269114783
and see if they still behave as the comments in them suggest:
- alignment assumptions
- performance assumptions / comparison with SuperWord, especially after JDK-8324751.
This is also a good way to better understand the performance of auto-vectorization (SuperWord) at small iteration counts, which is where the intrinsics are currently much better than auto-vectorization (see also JDK-8344085). It is possible, however, that auto-vectorization is actually faster at large iteration counts.
For MemorySegment, we already have:
- ./test/micro/org/openjdk/bench/java/lang/foreign/BulkOps.java
- ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkFill.java
- ./test/micro/org/openjdk/bench/java/lang/foreign/SegmentBulkCopy.java
We should also make sure to check filling with zero separately, since some platforms are much faster at zeroing memory.
We should also check the impact of Lilliput / CompactObjectHeaders, as they change the array base offset, and with it the alignment of array elements for some element types.
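To see why the header layout matters, the sketch below computes where array elements fall relative to an alignment boundary. It is a hypothetical illustration with two loudly stated assumptions: heap objects start on 8-byte boundaries, and an `int[]` has a base offset of 16 bytes with the default headers and 12 bytes with compact object headers. Verify the actual base offsets on the JVM under test. Alignments larger than the object alignment (for example a 32-byte vector) are deliberately rejected, because they also depend on where the array happens to be placed in the heap.

```java
/**
 * Hypothetical sketch: alignment of array elements relative to the object start.
 * The base offsets used in main() are assumptions for illustration only.
 */
public final class ArrayAlignmentSketch {

    // Assumption: heap objects start at 8-byte aligned addresses.
    static final int OBJECT_ALIGNMENT = 8;

    /** Byte offset of element `index` from the start of the array object. */
    public static long elementOffset(int baseOffset, int elemSize, int index) {
        return baseOffset + (long) index * elemSize;
    }

    /**
     * Misalignment in bytes of element `index` relative to `alignment`.
     * Only well defined when `alignment` divides OBJECT_ALIGNMENT; larger
     * alignments also depend on the array's placement in the heap.
     */
    public static int misalignment(int baseOffset, int elemSize, int index, int alignment) {
        if (alignment <= 0 || OBJECT_ALIGNMENT % alignment != 0) {
            throw new IllegalArgumentException("alignment must divide " + OBJECT_ALIGNMENT);
        }
        return (int) (elementOffset(baseOffset, elemSize, index) % alignment);
    }

    /** Smallest index whose element is `alignment`-aligned, or -1 if there is none. */
    public static int firstAlignedIndex(int baseOffset, int elemSize, int alignment) {
        for (int i = 0; i < alignment; i++) {
            if (misalignment(baseOffset, elemSize, i, alignment) == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed int[] base offsets: 16 (default headers) and 12 (compact object headers).
        int[] baseOffsets = {16, 12};
        for (int base : baseOffsets) {
            System.out.println("base=" + base
                    + " misalignment(a[0], 8)=" + misalignment(base, 4, 0, 8)
                    + " firstAlignedIndex(8)=" + firstAlignedIndex(base, 4, 8));
        }
    }
}
```

Under these assumptions, `a[0]` of an `int[]` is 8-byte aligned with a base offset of 16 but 4 bytes off with a base offset of 12, so a kernel that wants 8-byte aligned accesses would have to peel one `int` element first. A benchmark that does not sweep offsets could therefore measure a different code path depending only on the header layout.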
Relates to:
- JDK-8344085 C2 SuperWord: improve vectorization for small loop iteration count (Open)
- JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays (Open)
- JDK-8299808 C2 SuperWord: investigate performance difference to ArrayFill (Open)