** Problem Description
We found a defect in adlc/dfa, specifically in the child constraints
generated for vector unary operations.
Unlike an unpredicated vector unary operation, which has only one
child, a predicated vector unary operation is in fact a binary node with
two children. Consequently, the child constraint generated for the
*predicated vector unary operation* is a superset of that generated
for the *unpredicated* version, so any node that satisfies the former
also satisfies the latter. As a result, there is a risk that predicated
vector unary operations match the unpredicated rules by accident.
Currently, both SVE on AArch64 and AVX-512 on x64 work around such
failures: 1) AArch64 adds one extra predicate constraint, and 2) x64
sets a lower instruction cost for the predicated rules.
Even so, we still consider this a time bomb that should be fixed.
** PoC
A PoC is available at https://github.com/shqking/jdk/commit/50ec9b1923bedd89854366f2a95fec1ff3e6c787
To expose the failure, the PoC makes two minor changes: 1) it sets the
operand cost of pRegGov to the default value, and 2) it removes the
extra predicate constraint from the vabsI() rule.
Take the AArch64 SVE rules for the AbsVI node as an example. For the
unpredicated rule vabsI() and the predicated version vabsI_masked(), the
"match" statements are quite similar, except that vabsI_masked() takes
"pg" as the right child.
With this PoC, the DFA state code generated for AbsVI is shown below.
The log is simplified; the "<---- N" markers label the lines referred to
afterwards.
```
void State::_sub_Op_AbsVI(const Node *n){
  if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && <---- 1
      ( UseSVE > 0 ) )
  {
    unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; <---- 2
    DFA_PRODUCTION(VREG, vabsI_masked_rule, c)
  }
  if( STATE__VALID_CHILD(_kids[0], VREG) && <---- 3
      ( UseSVE > 0) )
  {
    unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; <---- 4
    if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { <---- 5
      DFA_PRODUCTION(VREG, vabsI_rule, c)
    }
  }
  ...
```
1) The child constraint for the predicated version at line 1 contains
that for the unpredicated version at line 3.
2) The two predicate constraints are the same, mainly because we removed
the extra check from the vabsI() rule in this PoC.
Hence, both if-statements at line 1 and line 3 are satisfied for a
predicated node, and the lower-cost production is finally selected at
line 5. Since we modified the operand cost of pRegGov, the cost of the
predicated version at line 2 is greater than that of the unpredicated
version at line 4.
As a result, the vabsI() rule is matched for the predicated version,
which is not our intention. In my local test on an SVE machine, the
jtreg test case jdk/incubator/vector/IntMaxVectorTests.java failed.
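The selection logic described above can be reproduced with a small
standalone model. This is only a sketch: ToyState, valid_child(),
match_absvi() and rule_for_absvi() are hypothetical names, not HotSpot
code, and the cost values (SVE_COST = 200, pRegGov cost 0 or 100) are
illustrative assumptions rather than the values used in aarch64.ad.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the adlc-generated State class (hypothetical, not HotSpot code).
enum Operand { VREG, PREGGOV, NUM_OPERANDS };

struct ToyState {
    ToyState*   kids[2] = {nullptr, nullptr};
    bool        valid[NUM_OPERANDS] = {false, false};
    unsigned    cost[NUM_OPERANDS] = {0, 0};
    std::string rule[NUM_OPERANDS];

    // Models DFA_PRODUCTION: record the rule and its cost for an operand.
    void produce(Operand op, const std::string& r, unsigned c) {
        valid[op] = true;
        cost[op] = c;
        rule[op] = r;
    }
};

// Models STATE__VALID_CHILD.
static bool valid_child(const ToyState* kid, Operand op) {
    return kid != nullptr && kid->valid[op];
}

const unsigned SVE_COST = 200;  // assumed value, for illustration only

// Same shape as the simplified _sub_Op_AbsVI log, with the extra
// predicate check on vabsI() removed as in the PoC.
std::string match_absvi(ToyState& s) {
    assert(s.kids[0] != nullptr && "AbsVI always has a source operand");
    if (valid_child(s.kids[0], VREG) && valid_child(s.kids[1], PREGGOV)) {
        unsigned c = s.kids[0]->cost[VREG] + s.kids[1]->cost[PREGGOV] + SVE_COST;
        s.produce(VREG, "vabsI_masked_rule", c);
    }
    if (valid_child(s.kids[0], VREG)) {  // also true for a predicated node
        unsigned c = s.kids[0]->cost[VREG] + SVE_COST;
        if (!s.valid[VREG] || s.cost[VREG] > c) {
            s.produce(VREG, "vabsI_rule", c);
        }
    }
    return s.rule[VREG];
}

// Builds an AbsVI node, optionally with a governing predicate as right
// child, and returns the rule the toy matcher selects for it.
std::string rule_for_absvi(bool has_mask, unsigned preggov_cost) {
    ToyState src, mask, node;
    src.produce(VREG, "leaf", 0);
    mask.produce(PREGGOV, "leaf", preggov_cost);
    node.kids[0] = &src;
    node.kids[1] = has_mask ? &mask : nullptr;
    return match_absvi(node);
}
```

In this model, rule_for_absvi(true, 0) selects vabsI_masked_rule because
the two costs tie and the later production does not overwrite the
earlier one, while rule_for_absvi(true, 100) selects vabsI_rule, which
is the mismatch described above.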
** Impact
All vector unary operations that have a predicated form are affected.
Here is the list.
```
AbsV
NegV
SqrtV
PopCountV
CountLeadingZerosV
CountTrailingZerosV
ReverseV
ReverseBytesV
MaskAll
VectorLoadMask
VectorMaskFirstTrue
```
Besides AArch64 SVE, x64 AVX-512 is affected as well. The PoC also
makes a minor update to x86.ad, increasing the cost of vabs_masked(). In
my local test, the unexpected rule was matched and the jtreg test failed
as well.
We argue that
1) The mitigations in AArch64 and x64 avoid the potential matching
failure, but they do not fundamentally resolve the problem.
2) The root cause is that the child constraints generated for predicated
and unpredicated vector unary operations are not mutually exclusive,
although they should be.
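To illustrate what mutually exclusive child constraints could look
like, here is a minimal sketch in which the unpredicated rule also
requires that the node has no second child. This is one hypothetical way
to express the idea, not necessarily how adlc should implement it; Kid,
select_rule() and the cost values are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

struct Kid { unsigned cost; };  // toy stand-in for a matched child state

const unsigned SVE_COST = 200;  // assumed value, for illustration only

// Returns the rule selected for an AbsVI node. `mask` is nullptr for the
// unpredicated form. With `exclusive` set, the unpredicated rule is only
// considered when there is no second child.
std::string select_rule(const Kid* src, const Kid* mask, bool exclusive) {
    assert(src != nullptr);
    bool matched = false;
    unsigned best = 0;
    std::string rule;

    // Predicated rule: needs both children.
    if (mask != nullptr) {
        best = src->cost + mask->cost + SVE_COST;
        rule = "vabsI_masked_rule";
        matched = true;
    }
    // Unpredicated rule: needs the first child and, when the constraints
    // are exclusive, the absence of a second child.
    if (!exclusive || mask == nullptr) {
        unsigned c = src->cost + SVE_COST;
        if (!matched || best > c) {
            best = c;
            rule = "vabsI_rule";
            matched = true;
        }
    }
    return rule;
}
```

With a non-zero predicate cost, the non-exclusive variant picks
vabsI_rule for a predicated node, whereas the exclusive variant picks
vabsI_masked_rule regardless of the costs involved.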