Vector API test operations (IS_DEFAULT, IS_FINITE, IS_INFINITE, IS_NAN and IS_NEGATIVE) on floating-point vectors are computed in three steps:
1) reinterpret the floating-point vector as an integral vector (int/long)
2) perform the test in the integer domain to get an int/long mask
3) reinterpret the int/long mask as a float/double mask
Step 3) is currently very slow. It can be optimized by modifying the Java code to use the existing reinterpret intrinsic.
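For readers unfamiliar with the integer-domain trick in steps 1) and 2), here is a minimal scalar sketch for a single float lane. It is an illustration only, not the code touched by this patch: the class name FloatBitTests and its methods are hypothetical, and the definitions of IS_DEFAULT (all bits zero) and IS_NEGATIVE (sign bit set) are assumptions about the intended semantics.

```java
// Hypothetical scalar analogue of steps 1) and 2) for one float lane.
// Not the JDK implementation; names and exact semantics are assumptions.
final class FloatBitTests {
    // Exponent field of an IEEE 754 binary32 value (all ones = Inf or NaN).
    private static final int EXP_MASK = 0x7F800000;
    // Everything except the sign bit.
    private static final int ABS_MASK = 0x7FFFFFFF;

    // Step 1: reinterpret the float as an int without changing any bits.
    private static int bits(float f) {
        return Float.floatToRawIntBits(f);
    }

    // Step 2: each test is a plain integer comparison on the raw bits.

    // Assumed meaning of IS_DEFAULT: every bit is zero (so -0.0f fails).
    static boolean isDefault(float f) {
        return bits(f) == 0;
    }

    // Assumed meaning of IS_NEGATIVE: the sign bit is set.
    static boolean isNegative(float f) {
        return bits(f) < 0;
    }

    // Finite: exponent field is not all ones.
    static boolean isFinite(float f) {
        return (bits(f) & ABS_MASK) < EXP_MASK;
    }

    // Infinite: exponent all ones, mantissa zero.
    static boolean isInfinite(float f) {
        return (bits(f) & ABS_MASK) == EXP_MASK;
    }

    // NaN: exponent all ones, mantissa non-zero.
    static boolean isNaN(float f) {
        return (bits(f) & ABS_MASK) > EXP_MASK;
    }
}
```

In the vector case the same comparisons run lane-wise and produce an int/long mask; step 3) only changes the element type that mask is viewed as, which is why it should not need to be expensive.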
For the attached VectorTestPerf benchmark, throughput improves as follows:

Base:
Benchmark                    (size)   Mode  Cnt     Score     Error   Units
VectorTestPerf.IS_DEFAULT      1024  thrpt    5   223.156 ±  90.452  ops/ms
VectorTestPerf.IS_FINITE       1024  thrpt    5   223.841 ±  91.685  ops/ms
VectorTestPerf.IS_INFINITE     1024  thrpt    5   224.561 ±  83.890  ops/ms
VectorTestPerf.IS_NAN          1024  thrpt    5   223.777 ±  70.629  ops/ms
VectorTestPerf.IS_NEGATIVE     1024  thrpt    5   218.392 ±  79.806  ops/ms

With patch:
Benchmark                    (size)   Mode  Cnt     Score     Error   Units
VectorTestPerf.IS_DEFAULT      1024  thrpt    5  8812.357 ±  40.477  ops/ms
VectorTestPerf.IS_FINITE       1024  thrpt    5  7425.739 ± 296.622  ops/ms
VectorTestPerf.IS_INFINITE     1024  thrpt    5  8932.730 ± 269.988  ops/ms
VectorTestPerf.IS_NAN          1024  thrpt    5  8574.872 ± 498.649  ops/ms
VectorTestPerf.IS_NEGATIVE     1024  thrpt    5  8838.400 ±  11.849  ops/ms