To prevent dead code elimination, each benchmark method in
MaskFromLongBenchmark.java adds a use point by calling laneIsSet().
However, laneIsSet() [1] is currently implemented with toLong(), so
once laneIsSet() is inlined into the benchmark method it may produce a
fromLong-toLong pair [2], which the compiler can optimize away. This
makes the benchmark ineffective. The assembly of the
maskFromLong_byte128 benchmark on SVE2 is shown in [3]: the bdep
instruction that fromLong uses on AArch64 [4] does not appear in it.
In this case the benchmark therefore cannot measure the performance
of fromLong().
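
The effect can be illustrated with a small plain-Java model. This is only a
sketch: the class MaskRoundTripSketch and its boolean[] mask representation
are hypothetical stand-ins, not the jdk.incubator.vector implementation. It
assumes, as described above, that laneIsSet() reads one bit of toLong(), and
shows that the use point then gives the same answer as a bit test on the
original long, so nothing forces fromLong to be materialized.

```java
public final class MaskRoundTripSketch {
    // Hypothetical stand-in for fromLong: expand the low `lanes` bits
    // of `bits` into one boolean per lane (lanes must be <= 64).
    static boolean[] fromLong(long bits, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Hypothetical stand-in for toLong: pack the lanes back into a long.
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Assumed shape of laneIsSet: test one bit of toLong().
    static boolean laneIsSet(boolean[] mask, int lane) {
        return ((toLong(mask) >>> lane) & 1L) != 0;
    }

    // What a benchmark body computes: fromLong followed by the use point.
    static boolean viaMask(long bits, int lanes, int lane) {
        return laneIsSet(fromLong(bits, lanes), lane);
    }

    // What is left once the fromLong-toLong pair is folded: a plain bit
    // test on the input (valid for 0 <= lane < lanes).
    static boolean folded(long bits, int lane) {
        return ((bits >>> lane) & 1L) != 0;
    }

    public static void main(String[] args) {
        long bits = 0x5A5AL;
        int lanes = 16; // lane count of a 128-bit byte vector
        for (int lane = 0; lane < lanes; lane++) {
            if (viaMask(bits, lanes, lane) != folded(bits, lane)) {
                throw new AssertionError("mismatch at lane " + lane);
            }
        }
        System.out.println("round trip is equivalent to a bit test on the input");
    }
}
```

Because viaMask and folded agree for every valid lane, a compiler that
recognizes the round trip is free to emit only the bit test.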
[1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70
[2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736
[3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa
[4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099