JDK-8252204: AArch64: Implement SHA3 accelerator/intrinsic


Details

    • Enhancement
    • Resolution: Fixed
    • P4
    • 16
    • 16
    • hotspot
    • b22
    • aarch64
    • generic

    Description

      The existing implementation of `sun.security.provider.SHA3::implCompress` is written in Java.
      When C2 compiles it to native code, the result does not vectorize well and retains Java-specific overhead such as array bounds checks.
      That hurts performance for an algorithm that may be used to validate large amounts of data.
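
      For context, a minimal sketch of how application code typically reaches this path through the public `MessageDigest` API. The class and method names (`Sha3Demo`, `sha3_256Hex`) are hypothetical, and it assumes the default provider configuration, in which the "SHA3-256" algorithm is served by the built-in SUN provider:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of the JDK or of the proposed change.
public final class Sha3Demo {

    // Hashes the message with SHA3-256 and returns the digest as lowercase hex.
    static String sha3_256Hex(byte[] message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA3-256");
            byte[] digest = md.digest(message);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA3-256 has been a standard algorithm name since JDK 9.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] message = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha3_256Hex(message));
    }
}
```

      Callers need no source changes to benefit from an intrinsic, because the replacement happens beneath this API.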

      Armv8.2 optionally provides "ARMv8.2-SHA, SHA2-512 and SHA3 functionality".
      To speed it up, we could simply create an intrinsic that implements the algorithm using the new SIMD instructions:
       - EOR3 Three-way Exclusive OR (page C7-1479)
       - RAX1 Rotate and Exclusive OR (page C7-1892)
       - XAR Exclusive OR and Rotate (page C7-2303)
       - BCAX Bit Clear and Exclusive OR (page C7-1418)

      This would also help eliminate the Java-isms.
      This is similar to the approach taken by the intrinsics for SHA1/SHA256/SHA512.
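
      To illustrate why these four instructions fit the algorithm, here is a rough scalar model of what each one computes on a single 64-bit lane, written in Java. This is only a sketch of the per-lane semantics, not the intrinsic itself; the class `Sha3LaneOps` and its method names are hypothetical, and the real instructions operate on two 64-bit lanes of a SIMD register at once:

```java
// Hypothetical scalar model of the Armv8.2 SHA3 instructions, one 64-bit lane each.
public final class Sha3LaneOps {

    // EOR3: three-way exclusive OR.
    static long eor3(long n, long m, long a) {
        return n ^ m ^ a;
    }

    // RAX1: rotate the second operand left by one bit, then exclusive OR.
    static long rax1(long n, long m) {
        return n ^ Long.rotateLeft(m, 1);
    }

    // XAR: exclusive OR, then rotate the result right by an immediate.
    static long xar(long n, long m, int imm) {
        return Long.rotateRight(n ^ m, imm);
    }

    // BCAX: clear the bits of m that are set in a, then exclusive OR with n.
    static long bcax(long n, long m, long a) {
        return n ^ (m & ~a);
    }

    // Column parity as used by the Keccak theta step: XOR of five lanes,
    // expressed as two chained three-way XORs.
    static long columnParity(long a0, long a1, long a2, long a3, long a4) {
        return eor3(eor3(a0, a1, a2), a3, a4);
    }

    // Keccak chi step for one lane: b0 ^ (~b1 & b2), a single BCAX.
    static long chi(long b0, long b1, long b2) {
        return bcax(b0, b2, b1);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(columnParity(1L, 2L, 4L, 8L, 16L)));
        System.out.println(Long.toHexString(chi(0b1100L, 0b0110L, 0b1010L)));
    }
}
```

      Each helper corresponds to an expression that the Java version spells out with several separate operations per lane, which is where fusing them into single instructions can save work.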

      Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions:
              https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52

      Initial implementation: http://cr.openjdk.java.net/~fyang/8252204/webrev.00/

      With a cycle-accurate aarch64 simulator, we ran test/micro/org/openjdk/bench/java/security/MessageDigests.java to measure the performance gain. We observed a 20% to 40% performance improvement, depending on the SHA3 digest length and the size of the message.


            People

              fyang Fei Yang