-
Enhancement
-
Resolution: Duplicate
-
P4
-
17, 20, 21
Found this while looking into JDK-8278920.
I have some examples that behave as follows:
AVX2: vectorizes
KNL: does NOT vectorize -> why does it not just make use of its AVX2 capabilities?
AVX512: vectorizes
I have not tested this on a real KNL machine, but when I whitelist the KNL setting in the IR-Framework (see JDK-8309183), some IR-tests begin to fail. I take that as a clear indicator that the tests would also fail on actual KNL machines.
--------------------------------------
For example, I extracted compiler.loopopts.superword.TestGeneralizedReductions.testMapReductionOnGlobalAccumulator:
./java -Xbatch -XX:CompileCommand=compileonly,Test1::test -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+Verbose -XX:+UseKNLSetting Test1.java
Unimplemented
497 PopCountL === _ 498 [[ 496 ]] Type:int !orig=418,356,126 !jvms: Test1::test @ bci:18 (line 14)
The pack has 8 ops.
But if I run:
./java -Xbatch -XX:CompileCommand=compileonly,Test1::test -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+Verbose -XX:UseAVX=3 Test1.java
Then the pack with 8 PopCountL causes no issues, and we vectorize.
Likewise, with:
./java -Xbatch -XX:CompileCommand=compileonly,Test1::test -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+Verbose -XX:UseAVX=2 Test1.java
I get a pack of 4 PopCountL, and that vectorizes.
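The extracted Test1.java is not attached to this report. A minimal sketch of what such a reproducer could look like, assuming the loop is a Long.bitCount reduction into a static accumulator, as the test name suggests. The class layout, array size, and iteration count are my assumptions, not the original file:

```java
class Test1 {
    // Global accumulator: the reduction target is a static field (assumption).
    static long acc;

    // Map each element through Long.bitCount (PopCountL in C2) and sum the results.
    static long test(long[] array) {
        acc = 0;
        for (int i = 0; i < array.length; i++) {
            acc += Long.bitCount(array[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] array = new long[1024];
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }
        long expected = test(array);
        // Call test() often enough that C2 compiles it under -Xbatch.
        for (int i = 0; i < 10_000; i++) {
            if (test(array) != expected) {
                throw new AssertionError("wrong result in iteration " + i);
            }
        }
        System.out.println("acc = " + expected);
    }
}
```

Running this with the three command lines above should make it possible to compare the TraceSuperWord output for the different settings, provided the loop shape matches the original test closely enough.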
The problem is in src/hotspot/cpu/x86/x86.ad
    case Op_PopCountVI:
    case Op_PopCountVL: {
        if (!is_pop_count_instr_target(bt) &&
            (size_in_bits == 512) && !VM_Version::supports_avx512bw()) {
            return false;
        }
    }
The issue is that we have allowed the packing of 8 longs, which makes 512 bits. But under KNL we have no avx512bw support, so the check above reports "Unimplemented", we reject the pack, and we end up with no vectorization. It would be better to step down to AVX2 and pack 4 longs instead, because that should be able to vectorize.
One solution: we could retry vectorization at a smaller MaxVectorSize if it fails. I have seen some other compilers do that.
Another option: find the maximal vector width per instruction at the beginning of SuperWord, and limit the vectorization to the smallest one we find.
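To make the two options concrete, here is a toy model in Java. It is not HotSpot code: all names (Op, Cpu, supported, retryAtSmallerWidth, maxWidthBits, limitUpFront) are hypothetical, and the feature checks only mirror the x86.ad conditions quoted in this report in simplified form (for example, is_pop_count_instr_target is ignored).

```java
import java.util.List;

class VectorWidthSketch {
    // Simplified stand-ins for a few vector operations (hypothetical names).
    enum Op { POP_COUNT_VL, POPULATE_INDEX, ABS_VF, ADD_VL }

    // Toy CPU description: widest vector size plus the two features that matter here.
    static final class Cpu {
        final int maxVectorBits;
        final boolean avx512bw;
        final boolean avx512dq;
        Cpu(int maxVectorBits, boolean avx512bw, boolean avx512dq) {
            this.maxVectorBits = maxVectorBits;
            this.avx512bw = avx512bw;
            this.avx512dq = avx512dq;
        }
    }

    // Toy version of the per-op support check: is `op` implemented at `bits` on `cpu`?
    static boolean supported(Op op, int bits, Cpu cpu) {
        if (bits > cpu.maxVectorBits) return false;
        switch (op) {
            case POP_COUNT_VL:
            case POPULATE_INDEX:
                return bits <= 256 || cpu.avx512bw;
            case ABS_VF:
                return bits <= 256 || cpu.avx512dq;
            default:
                return true;
        }
    }

    // Option 1: start at the full width and halve it until every op is supported.
    static int retryAtSmallerWidth(List<Op> loopOps, Cpu cpu) {
        for (int bits = cpu.maxVectorBits; bits >= 128; bits /= 2) {
            final int b = bits;
            if (loopOps.stream().allMatch(op -> supported(op, b, cpu))) return bits;
        }
        return 0; // no vectorization at any width
    }

    // Widest supported width for a single op, or 0 if none.
    static int maxWidthBits(Op op, Cpu cpu) {
        for (int bits = cpu.maxVectorBits; bits >= 128; bits /= 2) {
            if (supported(op, bits, cpu)) return bits;
        }
        return 0;
    }

    // Option 2: compute the per-op maximum up front and limit to the smallest one.
    static int limitUpFront(List<Op> loopOps, Cpu cpu) {
        return loopOps.stream().mapToInt(op -> maxWidthBits(op, cpu)).min().orElse(0);
    }

    public static void main(String[] args) {
        Cpu knl = new Cpu(512, false, false);   // KNL-like: 512-bit vectors, no avx512bw/dq
        Cpu avx512 = new Cpu(512, true, true);
        List<Op> loop = List.of(Op.POP_COUNT_VL, Op.ADD_VL);
        System.out.println(retryAtSmallerWidth(loop, knl));
        System.out.println(limitUpFront(loop, knl));
        System.out.println(retryAtSmallerWidth(loop, avx512));
    }
}
```

In this model both options pick 256 bits for the KNL-like CPU and 512 bits for the full AVX512 CPU. The difference is in cost: the first option repeats the packing attempt after a failure, while the second needs a pass over the loop body before packing starts.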
--------- List of failures below, may not be exhaustive ---------------
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "private static long compiler.loopopts.superword.TestGeneralizedReductions.testMapReductionOnGlobalAccumulator(long[])" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIfCPUFeatureOr={}, applyIf={}, applyIfCPUFeature={"avx2", "true"}, counts={"_#ADD_REDUCTION_VI#_", ">= 1", "_#POPCOUNT_VL#_", ">= 1"}, failOn={}, applyIfAnd={"SuperWordReductions", "true", "UsePopCountInstruction", "true"}, applyIfOr={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 1: "(\\d+(\\s){2}(AddReductionVI.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 >= 1 [given]
- No nodes matched!
* Constraint 2: "(\\d+(\\s){2}(PopCountVL.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 >= 1 [given]
- No nodes matched!
Reason:
    case Op_PopCountVI:
    case Op_PopCountVL: {
        if (!is_pop_count_instr_target(bt) &&
            (size_in_bits == 512) && !VM_Version::supports_avx512bw()) {
            return false;
        }
    }
---------------------------------
I see similar failures with KNL, caused by this check:
    case Op_PopulateIndex:
        if (size_in_bits > 256 && !VM_Version::supports_avx512bw()) {
            return false;
        }
        break;
Failed IR Rules (3) of Methods (3)
----------------------------------
1) Method "public void compiler.vectorization.TestPopulateIndex.exprWithIndex1()" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#POPULATE_INDEX#_", "> 0"}, applyIfAnd={}, failOn={}, applyIfOr={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 1: "(\\d+(\\s){2}(PopulateIndex.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 > 0 [given]
- No nodes matched!
2) Method "public void compiler.vectorization.TestPopulateIndex.exprWithIndex2()" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#POPULATE_INDEX#_", "> 0"}, applyIfAnd={}, failOn={}, applyIfOr={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 1: "(\\d+(\\s){2}(PopulateIndex.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 > 0 [given]
- No nodes matched!
3) Method "public void compiler.vectorization.TestPopulateIndex.indexArrayFill()" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#POPULATE_INDEX#_", "> 0"}, applyIfAnd={}, failOn={}, applyIfOr={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 1: "(\\d+(\\s){2}(PopulateIndex.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 > 0 [given]
- No nodes matched!
--------------------
And this:
    case Op_AbsVF:
    case Op_NegVF:
        if ((vlen == 16) && (VM_Version::supports_avx512dq() == false)) {
            return false; // 512bit vandps and vxorps are not available
        }
        break;
    case Op_AbsVD:
    case Op_NegVD:
        if ((vlen == 8) && (VM_Version::supports_avx512dq() == false)) {
            return false; // 512bit vpmullq, vandpd and vxorpd are not available
        }
        break;
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "public static float compiler.loopopts.superword.SumRedAbsNeg_Float.sumReductionImplement(float[],float[],float[],float)" - [Failed IR rules: 1]:
* @IR rule 2: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIfCPUFeatureOr={}, applyIf={}, applyIfCPUFeature={"sse2", "true"}, counts={"_#ADD_REDUCTION_VF#_", ">= 1", "_#ABS_V#_", ">= 1", "_#NEG_V#_", ">= 1"}, failOn={}, applyIfAnd={"SuperWordReductions", "true", "LoopMaxUnroll", ">= 8"}, applyIfOr={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 1: "(\\d+(\\s){2}(AddReductionVF.*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 >= 1 [given]
- No nodes matched!
* Constraint 2: "(\\d+(\\s){2}(AbsV(B|S|I|L|F|D).*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 >= 1 [given]
- No nodes matched!
* Constraint 3: "(\\d+(\\s){2}(NegV(F|D).*)+(\\s){2}===.*)"
- Failed comparison: [found] 0 >= 1 [given]
- No nodes matched!
- blocks
-
JDK-8309183 [IR Framework] Add UseKNLSetting to whitelist
- In Progress
- duplicates
-
JDK-8326139 C2 SuperWord: split packs (match use/def packs, implemented, mutual independence)
- Resolved
- relates to
-
JDK-8278920 [vectorapi] IR tests fail with -XX:+UseKNLSetting and 512-bit vectors loaded from byte[]
- Closed