The current implementation of these two APIs on AArch64 SVE is not efficient enough. SVE has no native predicate instructions for them, so they are implemented with pure vector instructions. However, on SVE architectures the output of "fromLong" and the input of "toLong" are defined as masks held in predicate registers. Hence "fromLong" has to generate the mask in a vector register and then convert it to a predicate at the end, while "toLong" needs the opposite conversion at the start of the backend code generation.
These conversions are costly, and they are currently implemented in the backend codegen for the IR, which is inefficient and hurts the performance of both APIs.
Since C2 already has two IRs dedicated to these conversions (VectorLoadMask/VectorStoreMask), we can move this part from the backend to the IR level. This also matches the current IR pattern for these two APIs on architectures that do not support the predicate feature. Additionally, some mid-end IR optimizations can then be shared.
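For readers less familiar with the two APIs, the following is a minimal scalar sketch of their semantics only: lane i of the mask is set exactly when bit i of the long is set. It is not the C2 implementation. The class name `MaskModel` and the use of a `boolean[]` to stand in for the per-lane vector mask (the form that would then be converted to or from a predicate) are assumptions made for illustration.

```java
// Hypothetical scalar model; names are illustrative and not part of the JDK.
final class MaskModel {
    // Models "fromLong": lane i is set iff bit i of `bits` is set.
    // Assumes laneCount <= 64.
    static boolean[] fromLong(long bits, int laneCount) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Models "toLong": packs the per-lane booleans back into the low bits of a long.
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        boolean[] lanes = fromLong(0b1011L, 8);
        System.out.println(java.util.Arrays.toString(lanes));
        System.out.println(Long.toBinaryString(toLong(lanes)));
    }
}
```

In this model the `boolean[]` plays the role of the intermediate vector-register mask; on SVE the extra step discussed above is the conversion between that form and a predicate register.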