The intrinsic for Math.copySign is disabled on x86_64.
We can improve on the instructions C2 generates for float and double; this change adds optimized intrinsics for float and double Math.copySign.
Math.copySign(double)
From:
0x00007f7d606e5dac: vmovq %xmm1,%r10
0x00007f7d606e5db1: vmovq %xmm0,%r11
0x00007f7d606e5db6: movabs $0x7fffffffffffffff,%r8
0x00007f7d606e5dc0: and %r8,%r11
0x00007f7d606e5dc3: movabs $0x8000000000000000,%r8
0x00007f7d606e5dcd: and %r8,%r10
0x00007f7d606e5dd0: or %r11,%r10
0x00007f7d606e5dd3: vmovq %r10,%xmm0
To:
0x00007fc3c14c63ac: movabs $0x7fffffffffffffff,%r10
0x00007fc3c14c63b6: vmovq %r10,%xmm2
0x00007fc3c14c63bb: vpternlogq $0xe4,%xmm2,%xmm1,%xmm0
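For reference, the "From" sequence above is the usual mask-and-or formulation of copySign on the raw bits. The following is a minimal Java sketch of that computation, not the HotSpot implementation; the class and method names are hypothetical and chosen only for illustration.

```java
final class CopySignMaskSketch {
    // Same constants as in the "From" listing: all bits except the sign bit,
    // and the sign bit alone.
    static final long MAGNITUDE_MASK = 0x7fffffffffffffffL;
    static final long SIGN_MASK = 0x8000000000000000L;

    // Hypothetical helper: keep the magnitude bits of the first argument
    // and the sign bit of the second, then combine them with OR.
    static double copySignViaMasks(double magnitude, double sign) {
        long m = Double.doubleToRawLongBits(magnitude) & MAGNITUDE_MASK;
        long s = Double.doubleToRawLongBits(sign) & SIGN_MASK;
        return Double.longBitsToDouble(m | s);
    }

    public static void main(String[] args) {
        double[][] samples = { {3.0, -0.0}, {-2.5, 1.0}, {0.0, -7.0} };
        for (double[] p : samples) {
            double got = copySignViaMasks(p[0], p[1]);
            double want = Math.copySign(p[0], p[1]);
            // Compare raw bits so that -0.0 and 0.0 are distinguished.
            boolean same = Double.doubleToRawLongBits(got) == Double.doubleToRawLongBits(want);
            System.out.println(p[0] + ", " + p[1] + " matches Math.copySign: " + same);
        }
    }
}
```

The two moves between XMM and general-purpose registers and the two mask loads in the "From" listing correspond to the `doubleToRawLongBits`/`longBitsToDouble` conversions and the two constants here.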
Math.copySign(float)
From:
0x00007ff8886e60ac: vmovd %xmm1,%r11d
0x00007ff8886e60b1: vmovd %xmm0,%r10d
0x00007ff8886e60b6: and $0x80000000,%r11d
0x00007ff8886e60bd: and $0x7fffffff,%r10d
0x00007ff8886e60c4: or %r10d,%r11d
0x00007ff8886e60c7: vmovd %r11d,%xmm0
To:
0x00007fc7d94c63ac: mov $0x7fffffff,%r10d
0x00007fc7d94c63b2: vmovd %r10d,%xmm3
0x00007fc7d94c63b7: vpternlogd $0xe4,%xmm3,%xmm1,%xmm0
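The immediate 0xe4 in the "To" listings can be read as a truth table. Assuming the usual VPTERNLOG indexing (destination bit as the high index bit, middle operand next, mask operand as the low bit), 0xe4 selects the destination bit where the mask bit is 1 and the middle operand's bit where it is 0. The Java sketch below emulates that bit by bit for the float case; `ternlog32` and `copySignViaTernlog` are hypothetical names, and the loop is an illustration of the semantics rather than how the intrinsic is implemented.

```java
final class CopySignTernlogSketch {
    // Emulates a 32-bit ternary-logic operation: at each bit position the
    // three input bits (a, b, c) form a 3-bit index into the truth table imm8.
    static int ternlog32(int imm8, int a, int b, int c) {
        int result = 0;
        for (int bit = 0; bit < 32; bit++) {
            int idx = (((a >>> bit) & 1) << 2)
                    | (((b >>> bit) & 1) << 1)
                    | ((c >>> bit) & 1);
            result |= ((imm8 >>> idx) & 1) << bit;
        }
        return result;
    }

    // a = magnitude bits (xmm0 in the listing), b = sign bits (xmm1),
    // c = 0x7fffffff mask (xmm3). With imm8 = 0xE4 the result is (c ? a : b)
    // per bit, i.e. magnitude bits everywhere except the sign bit.
    static float copySignViaTernlog(float magnitude, float sign) {
        int a = Float.floatToRawIntBits(magnitude);
        int b = Float.floatToRawIntBits(sign);
        return Float.intBitsToFloat(ternlog32(0xE4, a, b, 0x7fffffff));
    }

    public static void main(String[] args) {
        float[][] samples = { {1.5f, -2.0f}, {-4.0f, 0.0f}, {0.0f, -1.0f} };
        for (float[] p : samples) {
            float got = copySignViaTernlog(p[0], p[1]);
            float want = Math.copySign(p[0], p[1]);
            boolean same = Float.floatToRawIntBits(got) == Float.floatToRawIntBits(want);
            System.out.println(p[0] + ", " + p[1] + " matches Math.copySign: " + same);
        }
    }
}
```

Because the select is expressed in a single instruction, only one mask constant is needed and the values never leave the XMM registers, which is the difference visible between the "From" and "To" listings.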
Performance of patch using updated test/micro/org/openjdk/bench/vm/compiler/Signum.java:
BEFORE
Signum._5_copySignFloatTest avgt 5 2.442 ± 0.024 ns/op
Signum._7_copySignDoubleTest avgt 5 2.400 ± 0.033 ns/op
PATCH
Signum._5_copySignFloatTest avgt 5 2.029 ± 0.011 ns/op
Signum._7_copySignDoubleTest avgt 5 2.029 ± 0.024 ns/op
JTREG test that covers this case:
test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java
relates to
- JDK-8251525 AArch64: Faster Math.signum(fp) (Resolved)
- JDK-8265491 Math Signum optimization for x86 (Resolved)