Enhancement
Resolution: Unresolved
P5
22
x86_64
- On legacy Xeons (Skylake, Cascade Lake), we saw significant performance degradation with AVX512 instructions due to the frequency level switchover penalty, the reduced core frequency, and a hysteresis effect that causes the subsequent non-AVX512 instruction stream to execute at a lower frequency.
- The problem became severe in workloads where spikes of AVX512 instructions appear in an otherwise AVX2 instruction trace.
- Some of the x86 stubs (array copy / fill) take AVX3Threshold into consideration, but not all of them.
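To make "taking AVX3Threshold into consideration" concrete, here is a minimal, portable C++ sketch of a threshold-aware fill. Requests shorter than the threshold stay on 32-byte chunks, and only long requests, where the gain can presumably amortize the frequency penalty, use 64-byte chunks. This is not HotSpot code: `choose_vector_bytes`, `fill_bytes` and `kAvx3ThresholdBytes` are hypothetical names, the value 4096 is illustrative, and `memset` stands in for ymm/zmm stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative threshold in bytes (assumption); plays the role of AVX3Threshold.
constexpr std::size_t kAvx3ThresholdBytes = 4096;

// Vector width (bytes) a threshold-aware stub would pick for a request of len_bytes.
// In this sketch a threshold of 0 disables the check and always allows 64-byte vectors.
std::size_t choose_vector_bytes(std::size_t len_bytes, std::size_t threshold, bool has_avx512) {
  if (!has_avx512) return 32;                               // AVX2 only: 32-byte chunks
  if (threshold == 0 || len_bytes >= threshold) return 64;  // long request: 64-byte chunks
  return 32;                                                // short request: stay at 32 bytes
}

// Portable stand-in for an array-fill stub: chunked stores of the chosen width,
// then a scalar tail. A real stub would emit ymm/zmm stores instead of memset.
std::size_t fill_bytes(std::uint8_t* dst, std::size_t len, std::uint8_t value,
                       std::size_t threshold, bool has_avx512) {
  assert(dst != nullptr || len == 0);
  const std::size_t width = choose_vector_bytes(len, threshold, has_avx512);
  std::size_t i = 0;
  for (; i + width <= len; i += width) {
    std::memset(dst + i, value, width);  // one "vector" store of `width` bytes
  }
  for (; i < len; ++i) {
    dst[i] = value;  // scalar tail
  }
  return width;  // returned only so callers can see which path was taken
}
```

With the illustrative threshold, a 100-byte fill takes the 32-byte path and an 8192-byte fill takes the 64-byte path on an AVX512 machine.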
There are multiple approaches to address this problem:
1) Multi-versioning of loops guarded by AVX3Threshold. However, this may result in code bloat, given that we already split the iteration space into
pre / main / post (atomic vector) / vector_tail / scalar tail loops.
2) We can use loop profiling information collected by C1 and the interpreter to dynamically adjust the per-loop VectorSize.
This pass could also take other factors, such as instruction costs and target feature availability, into account when adjusting the per-loop vector size, which may address some of the issues discovered on KNL (https://bugs.openjdk.org/browse/JDK-8309267).
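The two approaches can be contrasted in a small, portable C++ sketch. `add_arrays_multiversioned` models approach 1: the same loop is instantiated twice (32-byte and 64-byte "vectors") and a runtime guard on the byte count selects one, which is where the duplicated pre/main/post/tail code would come from. `choose_loop_vector_bytes` models approach 2: a single per-loop width chosen once from profiled trip counts and target features. `LoopProfile`, `TargetFeatures` and all function names are hypothetical and do not correspond to C1 or C2 data structures; the inner scalar loop only stands in for one vector instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-loop profile, standing in for data C1 / the interpreter could collect.
struct LoopProfile {
  std::uint64_t invocations;  // number of times the loop was entered
  std::uint64_t total_trips;  // iterations summed over all entries
};

// Hypothetical description of the target's vector features.
struct TargetFeatures {
  bool avx2;
  bool avx512;
};

// Approach 2 sketch: choose one vector size (bytes) per loop from the profiled
// average number of bytes processed per loop entry.
std::size_t choose_loop_vector_bytes(const LoopProfile& p, std::size_t elem_bytes,
                                     const TargetFeatures& t, std::size_t threshold_bytes) {
  assert(elem_bytes > 0);
  if (!t.avx2) return 16;             // SSE-sized vectors only
  if (!t.avx512) return 32;           // AVX2: 32-byte vectors
  if (p.invocations == 0) return 32;  // no profile yet: stay conservative
  const std::uint64_t avg_bytes = (p.total_trips / p.invocations) * elem_bytes;
  return avg_bytes >= threshold_bytes ? 64 : 32;
}

// One version of the loop c[i] = a[i] + b[i], processing VecBytes per step.
// The inner loop stands in for a single vector add.
template <std::size_t VecBytes>
void add_arrays_version(const std::int32_t* a, const std::int32_t* b,
                        std::int32_t* c, std::size_t n) {
  constexpr std::size_t lanes = VecBytes / sizeof(std::int32_t);
  std::size_t i = 0;
  for (; i + lanes <= n; i += lanes) {
    for (std::size_t l = 0; l < lanes; ++l) c[i + l] = a[i + l] + b[i + l];
  }
  for (; i < n; ++i) c[i] = a[i] + b[i];  // scalar tail
}

// Approach 1 sketch: both versions are emitted and a runtime guard on the
// byte count selects one. Returns the width used, for illustration.
std::size_t add_arrays_multiversioned(const std::int32_t* a, const std::int32_t* b,
                                      std::int32_t* c, std::size_t n,
                                      std::size_t threshold_bytes) {
  if (n * sizeof(std::int32_t) >= threshold_bytes) {
    add_arrays_version<64>(a, b, c, n);
    return 64;
  }
  add_arrays_version<32>(a, b, c, n);
  return 32;
}
```

With a 4096-byte threshold, a loop profiled at about 10 `int` iterations per entry stays at 32 bytes, while one averaging 2000 iterations (8000 bytes) gets 64 bytes on an AVX512 target. Instruction costs could be folded in as one more input to the same decision.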
FTR, from an x86 standpoint this is not a pressing issue, since the latest Xeons are past the frequency problems with AVX512.
- relates to
JDK-8287697 Limit auto vectorization to 32-byte vector on Cascade Lake (Resolved)