(1) The loop below, with stride 1, can be auto-vectorized by C2 SLP.
private static void bar(int start, int limit) {
    for (int i = start; i < limit; i += 1) {
        c[i] = a[i] + b[i];
    }
}
(2) If we manually unroll it once, as below, C2 SLP fails to vectorize it.
private static void bar(int start, int limit) {
    for (int i = start; i < limit; i += 2) {
        c[i] = a[i] + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
    }
}
(3) But if we change the initial value of the loop induction variable (iv) to a compile-time constant, it can be vectorized again.
private static void bar(int start, int limit) {
    for (int i = 10; i < limit; i += 2) {
        c[i] = a[i] + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
    }
}
We should try to enable auto-vectorization for case (2) above.
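For anyone who wants to reproduce this, here is a minimal self-contained sketch of cases (1) and (2). The report does not give the array declarations, so the int element type, the SIZE constant, the class name SlpUnrollRepro, the method names, and the warm-up count are all assumptions made for illustration. It only checks that the two loop shapes compute the same result when (limit - start) is even; whether C2 actually vectorized either loop has to be checked separately by inspecting the generated code.

```java
import java.util.Arrays;

public class SlpUnrollRepro {
    // Assumed array length and element type; the report states neither.
    static final int SIZE = 1024;
    static int[] a = new int[SIZE];
    static int[] b = new int[SIZE];
    static int[] c = new int[SIZE];

    // Case (1): stride 1, variable start.
    static void strideOne(int start, int limit) {
        for (int i = start; i < limit; i += 1) {
            c[i] = a[i] + b[i];
        }
    }

    // Case (2): manually unrolled once, stride 2, variable start.
    // Requires (limit - start) to be even, otherwise it touches index limit.
    static void strideTwo(int start, int limit) {
        for (int i = start; i < limit; i += 2) {
            c[i] = a[i] + b[i];
            c[i + 1] = a[i + 1] + b[i + 1];
        }
    }

    // Clears c, then calls the chosen loop repeatedly so that the JIT
    // has a chance to compile it. The iteration count is an arbitrary guess.
    static int[] run(boolean unrolled, int start, int limit) {
        Arrays.fill(c, 0);
        for (int iter = 0; iter < 20_000; iter++) {
            if (unrolled) {
                strideTwo(start, limit);
            } else {
                strideOne(start, limit);
            }
        }
        return c.clone();
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZE; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        int[] expected = run(false, 10, 1000);
        int[] actual = run(true, 10, 1000);
        // Reports whether both loop shapes filled c identically.
        System.out.println(Arrays.equals(expected, actual));
    }
}
```

Passing start as a method argument, rather than writing the literal 10 inside the loop header, is what keeps this in the shape of case (2) instead of case (3).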