On various microbenchmarks we've seen that Arrays.mismatch slightly underperforms naive loops for small array inputs, on both x86 and aarch64. Since small arrays may well be the more common case, this hampers adoption in performance-sensitive places that want to take more explicit advantage of mismatch vectorization.
Profiling indicates this boils down to a number of extra branches taken outside of the intrinsified ArraysSupport.vectorizedMismatch method. Simplifying the outer logic and leaning more on the intrinsified code seems to be a win on both x86 and aarch64.
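
For reference, here is a minimal sketch of the two things being compared: a naive loop and the public Arrays.mismatch entry point. The class name MismatchSketch and the helper naiveMismatch are hypothetical and only illustrate the baseline; this is not the benchmark code and not the change itself, just a check that the two approaches agree on small inputs.

```java
import java.util.Arrays;

public class MismatchSketch {
    // Hypothetical naive baseline: index of the first differing byte,
    // the shorter length if one array is a proper prefix of the other,
    // or -1 if the arrays are equal. Mirrors the Arrays.mismatch contract.
    public static int naiveMismatch(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : len;
    }

    public static void main(String[] args) {
        // Small inputs, the case where the naive loop was seen to win.
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        byte[] z = {1, 2};

        // Each pair of results should agree.
        System.out.println(naiveMismatch(x, y) + " " + Arrays.mismatch(x, y));
        System.out.println(naiveMismatch(x, z) + " " + Arrays.mismatch(x, z));
        System.out.println(naiveMismatch(x, x.clone()) + " " + Arrays.mismatch(x, x.clone()));
    }
}
```

Any timing comparison between the two should go through a proper harness such as JMH; a hand-rolled loop around these calls would mostly measure warmup and JIT effects.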