NEON's BCAX is already used in the SHA3 intrinsic implementation (https://github.com/openjdk/jdk/blob/a13af650437de508d64f0b12285a6ffc9901f85f/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3934). The code generator could also emit it when it matches the corresponding pattern.
Test case:
    public static void testBCAXI() {
        for (int i = 0; i < i128specs.loopBound(ARR_LEN); i += i128specs.length()) {
            IntVector v1 = IntVector.fromArray(i128specs, iarr, i);
            IntVector v2 = IntVector.fromArray(i128specs, iarr, i);
            IntVector v3 = IntVector.fromArray(i128specs, iarr, i);
            v3.lanewise(VectorOperators.NOT).lanewise(VectorOperators.AND, v2).
                lanewise(VectorOperators.XOR, v1).intoArray(ir, i);
        }
    }
Generated code:
0x0000fffe18c8ede8: ldr q16, [x11, #16]
0x0000fffe18c8edec: bic v17.16b, v16.16b, v16.16b
0x0000fffe18c8edf0: eor v16.16b, v17.16b, v16.16b
0x0000fffe18c8edf4: add x10, x17, x10
0x0000fffe18c8edf8: str q16, [x10, #16]
The BIC and EOR pair can be optimized into a single BCAX instruction.
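For reference, the per-lane operation that the BIC and EOR pair computes (and that one BCAX would compute) can be written in scalar form. This is only a minimal sketch of the semantics; the class and method names (BcaxSketch, bcax, bcaxArray) are hypothetical and are not part of HotSpot or of the test above.

```java
final class BcaxSketch {
    // One 32-bit lane: clear in b the bits that are set in c (BIC),
    // then XOR the result with a (EOR). BCAX does both steps at once.
    static int bcax(int a, int b, int c) {
        return a ^ (b & ~c);
    }

    // Hypothetical helper: applies the lane operation element by element,
    // mirroring v3.NOT -> AND v2 -> XOR v1 from the test case.
    static int[] bcaxArray(int[] v1, int[] v2, int[] v3) {
        int[] out = new int[v1.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = bcax(v1[i], v2[i], v3[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int r = bcax(0b1100, 0b1010, 0b0110);
        System.out.println(Integer.toBinaryString(r));
    }
}
```

When all three operands are the same value x, as in the test case where every vector is loaded from iarr at the same index, the expression reduces to x ^ (x & ~x) = x, which matches the disassembly using v16 for every source operand.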
Links to: Review openjdk/jdk/13222