Description
The existing implementation of `sun.security.provider.SHA3::implCompress` is written in Java.
When C2 compiles it to native code, the result does not vectorize well and retains Java-specific overhead such as array bounds checks.
That hurts performance for an algorithm that can be used to validate large amounts of data.
Armv8.2 optionally provides "ARMv8.2-SHA, SHA2-512 and SHA3 functionality".
To speed it up, we could create an intrinsic that implements the algorithm using the new SIMD instructions:
- EOR3 Three-way Exclusive OR (page C7-1479)
- RAX1 Rotate and Exclusive OR (page C7-1892)
- XAR Exclusive OR and Rotate (page C7-2303)
- BCAX Bit Clear and Exclusive OR (page C7-1418)
An intrinsic would also eliminate the Java-specific overhead.
This is similar to the approach taken by the intrinsics for SHA1/SHA256/SHA512.
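As a rough illustration of what the four instructions listed above compute, here is a scalar Java sketch of their per-64-bit-lane semantics and how they map onto Keccak steps. The class and method names are hypothetical and chosen only for this example; the real instructions operate on SIMD registers, and this is not the code from the webrev.

```java
// Scalar model of the Armv8.2 SHA3 instructions, one 64-bit lane at a time.
// All names here are illustrative assumptions, not HotSpot or JDK APIs.
final class Sha3LaneOps {
    // EOR3: three-way exclusive OR.
    static long eor3(long n, long m, long a) {
        return n ^ m ^ a;
    }

    // RAX1: rotate m left by one bit, then exclusive OR with n.
    static long rax1(long n, long m) {
        return n ^ Long.rotateLeft(m, 1);
    }

    // XAR: exclusive OR, then rotate right by an immediate (0..63).
    static long xar(long n, long m, int imm6) {
        return Long.rotateRight(n ^ m, imm6);
    }

    // BCAX: clear the bits of m that are set in a, then exclusive OR with n.
    static long bcax(long n, long m, long a) {
        return n ^ (m & ~a);
    }

    // Theta column parity: five lanes XORed with two EOR3 operations.
    static long columnParity(long a0, long a1, long a2, long a3, long a4) {
        return eor3(eor3(a0, a1, a2), a3, a4);
    }

    // Chi for one lane: a0 ^ (~a1 & a2) is a single BCAX.
    static long chi(long a0, long a1, long a2) {
        return bcax(a0, a2, a1);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(bcax(0xFFL, 0x0FL, 0x03L)));
    }
}
```

In this model, the theta step's `C[x-1] ^ rotl(C[x+1], 1)` is one `rax1`, and a rho rotation applied to `A ^ D` is one `xar` with the immediate set to 64 minus the left-rotation amount (for non-zero rotations).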
Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52
Initial implementation: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/
Using a cycle-accurate AArch64 simulator, we ran test/micro/org/openjdk/bench/java/security/MessageDigests.java to measure the gain. We observed a 20% to 40% performance improvement, depending on the SHA3 digest length and the size of the message.
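For a quick sanity check outside the JMH benchmark above, a minimal sketch like the following exercises the SHA3 digests through the standard `java.security.MessageDigest` API across a few digest lengths and message sizes. The class name, helper names, sizes, and iteration count are assumptions made for this example, and the timings are only indicative (no warm-up control, unlike JMH).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for this example; not part of the JDK test suite.
final class Sha3Timing {
    static byte[] digest(String algorithm, byte[] message) {
        try {
            return MessageDigest.getInstance(algorithm).digest(message);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Average wall-clock nanoseconds per digest over the given iterations.
    static long nanosPerDigest(String algorithm, int messageSize, int iterations) {
        byte[] message = new byte[messageSize];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            digest(algorithm, message);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        String[] algorithms = {"SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512"};
        int[] sizes = {64, 1024, 16384};
        for (String algorithm : algorithms) {
            for (int size : sizes) {
                System.out.println(algorithm + " " + size + " bytes: "
                        + nanosPerDigest(algorithm, size, 10_000) + " ns/op");
            }
        }
    }
}
```

Running it once with and once without `-XX:+UseSHA3Intrinsics` on supporting hardware gives a rough before/after comparison; the digest values themselves must be identical in both runs.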
Issue Links
- duplicates
  - JDK-8244048 AArch64: Implement C2 optimizations using "SHA3" instructions. (Closed)
- relates to
  - JDK-8295698 AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics (Resolved)
  - JDK-8309109 AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 (Resolved)
  - JDK-8337666 AArch64: SHA3 GPR intrinsic (Open)
  - JDK-8292894 AArch64: Enable SHA3 intrinsic by default on supported hardware (Closed)