On aarch64, the current implementation of floating-point Min/MaxReductionV with 2 elements can be optimized with the fminp/fmaxp instructions.
Take `Set dst (MaxReductionV dsrc vsrc)` as an example:
---------- now ---------
fmaxs $dst, $dsrc, $vsrc
ins $tmp, S, $vsrc, 0, 1
fmaxs $dst, $dst, $tmp
-------- optimized -----
fmaxp $dst, $vsrc, D
fmaxs $dst, $dst, $dsrc
An initial implementation showed about a 25% improvement on an A72-based aarch64 server.
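The two instruction sequences compute the same value because max is commutative and associative, including for NaN and signed zeros. The following is a minimal Java sketch of that equivalence for the 2-element float case, not the HotSpot implementation. The class and method names are hypothetical, and `Math.max` is assumed here to stand in for the scalar fmaxs and pairwise fmaxp semantics (NaN if either input is NaN, +0.0 greater than -0.0).

```java
public class MaxReduction2F {
    // Models the "now" sequence: fmaxs, ins (extract lane 1), fmaxs.
    static float reduceNow(float src, float[] v) {
        float dst = Math.max(src, v[0]); // fmaxs $dst, $dsrc, $vsrc (lane 0)
        float tmp = v[1];                // ins $tmp, S, $vsrc, 0, 1
        return Math.max(dst, tmp);       // fmaxs $dst, $dst, $tmp
    }

    // Models the "optimized" sequence: pairwise max of both lanes, then fmaxs.
    static float reducePairwise(float src, float[] v) {
        float dst = Math.max(v[0], v[1]); // fmaxp: max across the two lanes
        return Math.max(dst, src);        // fmaxs $dst, $dst, $dsrc
    }

    public static void main(String[] args) {
        float[][] vectors = { {2.0f, 3.0f}, {-0.0f, 0.0f}, {Float.NaN, 1.0f} };
        float[] scalars = { 1.0f, -0.0f, 5.0f, Float.NaN };
        for (float[] v : vectors) {
            for (float s : scalars) {
                // Compare bit patterns so NaN and signed zeros are checked exactly.
                boolean same = Float.floatToIntBits(reduceNow(s, v))
                        == Float.floatToIntBits(reducePairwise(s, v));
                System.out.println(s + " with [" + v[0] + ", " + v[1] + "] same=" + same);
            }
        }
    }
}
```

The optimized form removes the lane extract, so the reduction needs two instructions instead of three and no temporary register.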
- relates to JDK-8259629 aarch64 builds fail after JDK-8258932 (Resolved)