Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 27
Component/s: hotspot
This is another approach to solve JDK-8344085.
I did some benchmarking recently, see comments:
https://github.com/openjdk/jdk/pull/22629#issuecomment-3811234712
I can see that small iteration counts probably do not profit from automatic alignment. There is a trade-off here:
Alignment means spending more iterations in the scalar pre-loop, and more pre-loop iterations can leave fewer iterations for the main/drain loop. That has a performance penalty, especially noticeable for loops with small iteration counts, where a few scalar iterations make a big contribution to runtime.
Misalignment means we have split memory accesses in the main/drain loop. That has a penalty that could cut the speedup of vectorization in half, since each vector access is split in two. If we only have a few main/drain loop iterations this is not so noticeable, but if there are many main/drain loop iterations, it really starts to show.
There must be a cut-off point:
- below it we should not align, because the extra pre-loop iterations are too expensive
- above it we should align, because split accesses in the main loop are too expensive
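The trade-off can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not how C2 decides anything: the class `AlignmentCostModel`, its method names, and all parameters (vector length in elements, number of pre-loop iterations needed to reach alignment, cost of one vector iteration) are hypothetical. It assumes a scalar iteration costs 1 unit and that a misaligned vector iteration costs twice as much as an aligned one, following the "cut in half" estimate above.

```java
// Toy cost model for the alignment trade-off. NOT the C2 implementation:
// all names and cost parameters are illustrative assumptions.
class AlignmentCostModel {

    // Cost when we align: scalar pre-loop iterations first, then aligned
    // vector iterations, then scalar iterations for the leftover elements.
    // One scalar iteration is assumed to cost 1 unit.
    static long alignedCost(long n, int vectorLen, int preIters, long vectorCost) {
        long pre = Math.min(n, preIters);   // scalar iterations spent reaching alignment
        long rest = n - pre;
        long vec = rest / vectorLen;        // aligned vector iterations
        long post = rest % vectorLen;       // scalar leftover iterations
        return pre + vec * vectorCost + post;
    }

    // Cost when we do not align: no extra pre-loop iterations, but every
    // vector iteration is assumed to cost twice as much (split accesses).
    static long misalignedCost(long n, int vectorLen, long vectorCost) {
        long vec = n / vectorLen;           // misaligned vector iterations
        long post = n % vectorLen;          // scalar leftover iterations
        return vec * 2 * vectorCost + post;
    }

    // Under this model, align only if it is strictly cheaper.
    static boolean shouldAlign(long n, int vectorLen, int preIters, long vectorCost) {
        return alignedCost(n, vectorLen, preIters, vectorCost)
             < misalignedCost(n, vectorLen, vectorCost);
    }

    public static void main(String[] args) {
        int vectorLen = 8;    // assumed: 8 elements per vector
        int preIters = 7;     // assumed: worst-case pre-loop iterations to align
        long vectorCost = 1;  // assumed: aligned vector iteration costs 1 unit
        for (long n : new long[] {8, 16, 32, 64, 128, 1024}) {
            System.out.printf("n=%d aligned=%d misaligned=%d align=%b%n",
                n,
                alignedCost(n, vectorLen, preIters, vectorCost),
                misalignedCost(n, vectorLen, vectorCost),
                shouldAlign(n, vectorLen, preIters, vectorCost));
        }
    }
}
```

With these assumed numbers, a loop of 8 iterations is cheaper without alignment, while a loop of 1024 iterations is cheaper with alignment. Where the real cut-off lies depends on the hardware and on the actual loop structure, which this model does not capture.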
- relates to: JDK-8344085 C2 SuperWord: improve vectorization for small loop iteration count (Open)