Enhancement
Resolution: Fixed
P4
21
b16
x86
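When the vector length is 128 bits, we can use vpermilps directly instead of extending the vectors to 256 bits and using vpermd. A 128-bit vector occupies a single 128-bit lane, so the in-lane permutation performed by vpermilps can already select any of its four 32-bit elements. Because the variable-control form of vpermilps requires only AVX1, whereas vpermd requires AVX2, this also covers the 128-bit vector case on AVX1-only hardware.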
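To illustrate why the substitution is safe, here is a minimal Python model of the element-selection semantics of the two instructions. It is not HotSpot code: all function names are hypothetical, and it assumes the shuffle indices for a 4-element vector already lie in 0..3. vpermilps (128-bit, variable control) uses the low 2 bits of each index, while vpermd (256-bit) uses the low 3 bits; for indices in 0..3 the widened vpermd path and the single vpermilps produce the same low 128 bits.

```python
from itertools import product


def vpermilps_128(src, ctrl):
    """Model of VPERMILPS xmm, xmm, xmm (variable control).

    Destination element i takes src[ctrl[i] & 3]: only the low 2 bits of
    each 32-bit control element are used, selecting within one 128-bit lane.
    """
    assert len(src) == 4 and len(ctrl) == 4
    return [src[c & 3] for c in ctrl]


def vpermd_256(src, ctrl):
    """Model of VPERMD ymm: destination element i takes src[ctrl[i] & 7]."""
    assert len(src) == 8 and len(ctrl) == 8
    return [src[c & 7] for c in ctrl]


def rearrange_via_vpermd(src4, idx4):
    """Hypothetical old path: widen to 256 bits, permute with vpermd,
    then keep the low 128 bits (4 elements) of the result."""
    wide_src = list(src4) + [0] * 4  # upper half is don't-care
    wide_idx = list(idx4) + [0] * 4
    return vpermd_256(wide_src, wide_idx)[:4]


def rearrange_via_vpermilps(src4, idx4):
    """Hypothetical new path: one in-lane permute on the 128-bit vector."""
    return vpermilps_128(list(src4), list(idx4))


if __name__ == "__main__":
    src = [10, 20, 30, 40]
    # Exhaustively compare all 4**4 index vectors with elements in 0..3.
    for idx in product(range(4), repeat=4):
        assert rearrange_via_vpermd(src, idx) == rearrange_via_vpermilps(src, idx)
    print(rearrange_via_vpermilps(src, [3, 2, 1, 0]))  # [40, 30, 20, 10]
```

The exhaustive loop checks every in-range index vector, which is feasible because a 4-element vector has only 256 of them. The model deliberately ignores instruction encodings and AVX feature detection; it only captures which source element lands in each destination slot.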