The implementation of vector mask casting is a bit messy at the moment. The "VectorMaskCast" op is generated only if:
1) the current platform supports the predicated feature
2) the element size (in bytes) of the src and dst types is the same
Otherwise, different op patterns are generated for different kinds of casts. For example, the "VectorMaskCast + VectorCast" pattern
is generated when casting from a floating point type to an integral type with a different element size, and the "VectorMaskCast + VectorCast + VectorMaskCast" pattern is generated when casting from one floating point type to another floating point type with a different element size.
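As a concrete illustration (a minimal sketch; the class name and species choices are hypothetical, and the comments simply restate the patterns described above), the different casting shapes can be triggered from the Vector API like this (compile and run with --add-modules jdk.incubator.vector):

    import jdk.incubator.vector.*;

    public class MaskCastShapes {
        public static void main(String[] args) {
            // All species below have 8 lanes, since VectorMask.cast()
            // requires the source and destination species to have the
            // same lane count.
            VectorSpecies<Float>   F256 = FloatVector.SPECIES_256;   // 8 x 4-byte lanes
            VectorSpecies<Integer> I256 = IntVector.SPECIES_256;     // 8 x 4-byte lanes
            VectorSpecies<Long>    L512 = LongVector.SPECIES_512;    // 8 x 8-byte lanes
            VectorSpecies<Double>  D512 = DoubleVector.SPECIES_512;  // 8 x 8-byte lanes

            VectorMask<Float> fm = VectorMask.fromLong(F256, 0b10101010L);

            // Same element size (float -> int): a single "VectorMaskCast"
            // on platforms with the predicated feature.
            VectorMask<Integer> im = fm.cast(I256);

            // Floating point -> integral with a different element size
            // (float -> long): currently the "VectorMaskCast + VectorCast" pattern.
            VectorMask<Long> lm = fm.cast(L512);

            // Floating point -> floating point with a different element size
            // (float -> double): currently the
            // "VectorMaskCast + VectorCast + VectorMaskCast" pattern.
            VectorMask<Double> dm = fm.cast(D512);

            System.out.println(im + " " + lm + " " + dm);
        }
    }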
Since "VectorMaskCast" is different from VectorCast, which only needs extending or narrowing for the conversion, some architectures may have cheaper implementation than VectorCast like x86 avx2. So to make the codes clean and improve the performance for mask casting on some architectures, we can always generate the VectorMaskCast op for all cases and all platforms.