-
Enhancement
-
Resolution: Unresolved
-
P4
-
24
------------------ Background -------------------
We already have reduction:
int sum = 0;
for (i = 0; i < a.length; i++) {
sum += a[i];
}
Tests with working IR verification for that can be found here:
test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java
test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java
test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java
test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java
test/hotspot/jtreg/compiler/loopopts/superword/TestVectorFPReduction.java
(there are more, just grep for "REDUCTION" in test/hotspot/jtreg/compiler)
The loop is unrolled and vectorized by the SuperWord algorithm.
src/hotspot/share/opto/superword.hpp
Vectorized reduction is introduced, and corresponding code is emitted in the backend, e.g. see:
grep Reduction src/hotspot/cpu/x86/ -r
-------------- Prefix Sum --------------
Pattern like:
int sum = 0;
for (i = 0; i < a.length; i++) {
sum += a[i];
b[i] = sum;
}
or
int sum = 0;
for (i = 0; i < a.length; i++) {
b[i] = sum;
sum += a[i];
}
As far as I know there are no direct assembly instructions for vectorized prefix-sum, so we have to create our own. Just like with the existing reduction: it is a combination of shuffles and element-wise add.
For inspiration, look at the graphics here: https://en.wikipedia.org/wiki/Prefix_sum
The prefix-sum is important for a lot of parallel algorithms, so this would be very exciting work.
We already have reduction:
int sum = 0;
for (i = 0; i < a.length; i++) {
sum += a[i];
}
Tests with working IR verification for that can be found here:
test/hotspot/jtreg/compiler/vectorapi/TestVectorAddMulReduction.java
test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java
test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java
test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java
test/hotspot/jtreg/compiler/loopopts/superword/TestVectorFPReduction.java
(there are more, just grep for "REDUCTION" in test/hotspot/jtreg/compiler)
The loop is unrolled and vectorized by the SuperWord algorithm.
src/hotspot/share/opto/superword.hpp
Vectorized reduction is introduced, and corresponding code is emitted in the backend, e.g. see:
grep Reduction src/hotspot/cpu/x86/ -r
-------------- Prefix Sum --------------
Pattern like:
int sum = 0;
for (i = 0; i < a.length; i++) {
sum += a[i];
b[i] = sum;
}
or
int sum = 0;
for (i = 0; i < a.length; i++) {
b[i] = sum;
sum += a[i];
}
As far as I know there are no direct assembly instructions for vectorized prefix-sum, so we have to create our own. Just like with the existing reduction: it is a combination of shuffles and element-wise add.
For inspiration, look at the graphics here: https://en.wikipedia.org/wiki/Prefix_sum
The prefix-sum is important for a lot of parallel algorithms, so this would be very exciting work.
- relates to
-
JDK-8339348 Vector scan operation
- Open
-
JDK-8012647 Add Arrays.parallelPrefix (prefix sum, scan, cumulative sum)
- Closed