JDK-8209862

CipherCore performance improvement



        Please consider this performance improvement for CipherCore.
        http://cr.openjdk.java.net/~skuksenko/crypto/8209862/

        Preface.
        https://bugs.openjdk.java.net/browse/JDK-8207775 added required data zeroing. That caused a massive performance regression:
        Regressions caused by JDK-8207775
         (Legend: <algorithm> <keyLength>/<dataSize> <regression Lin64>/<regression Win64>)
        AESBench.decrypt
        AES/CBC/NoPadding___ 128/01024 -17.4% / -3.9%
        AES/CBC/NoPadding___ 128/16384 -3.8% / -4.3%
        AES/CBC/PKCS5Padding 128/16384 -8.2% / -6.0%
        AES/ECB/NoPadding___ 128/01024 -7.3% / -7.6%
        AES/ECB/PKCS5Padding 128/16384 0 / -8.6%

        AESGCMBench.decrypt
        AES/GCM/NoPadding 128/01024 -4.4% / -3.9%

        AESBench.encrypt
        AES/CBC/PKCS5Padding 128/16384 0 / -2.60%

        DESedeBench.decrypt
        DESede/CBC/NoPadding___ 168/16384 0 / -7.20%
        DESede/CBC/PKCS5Padding 168/16384 0 / -3.70%

        DESedeBench.encrypt
        DESede/ECB/NoPadding___ 168/16384 0 / -7.30%

        In general, the negative performance effect caused by zeroing can't be avoided. But in some cases CipherCore can be optimized.
        Here is a list of the speedups achieved by the suggested patch:
        Performance improvements by suggested modification
        (Legend: <algorithm> <keyLength>/<dataSize> <speedup Lin64>/<speedup Win64>)
        AESBench.decrypt
        AES/CBC/NoPadding___ 128/_1024 68.10% / 40.20%
        AES/CBC/NoPadding___ 128/16384 52.20% / 79.10%
        AES/CBC/PKCS5Padding 128/16384 38.70% / 72.60%
        AES/ECB/NoPadding___ 128/_1024 29.40% / 23.90%
        AES/ECB/NoPadding___ 128/16384 11.60% / 33.50%
        AES/ECB/PKCS5Padding 128/16384 15.30% / 38.30%

        AESGCMBench.decrypt
        AES/GCM/NoPadding___ 128/_1024 7.10% / 7.10%
        AES/GCM/NoPadding___ 128/16384 9.20% / 2.10%
        AES/GCM/PKCS5Padding 128/16384 9.00% / 0

        AESBench.encrypt
        AES/CBC/PKCS5Padding 128/16384 2.50% / 0
        AES/ECB/NoPadding___ 128/_1024 0 / 10.50%

        DESedeBench.decrypt
        DESede/CBC/PKCS5Padding 168/16384 0 / 3.40%
        DESede/ECB/NoPadding___ 168/16384 4.00% / 4.40%
        DESede/ECB/PKCS5Padding 168/16384 0 / 5.00%

        DESedeBench.encrypt
        DESede/ECB/NoPadding___ 168/16384 6.50% / 0
        DESede/CBC/PKCS5Padding 168/16384 3.90% / 4.10%

        This not only recovers almost all of the regression caused by the additional zeroing, but also gives additional performance benefits.

        The idea of the modification:
        - CipherCore contains 2 methods:
          doFinal(byte[], int, int)
          doFinal(byte[], int, int, byte[], int)
          The first method allocates the output array internally and invokes the second doFinal.
        - At the same time, the second doFinal method contains a lot of checks and additional actions needed to work properly with a user-provided output array. All these actions can be avoided if the output array was allocated internally.
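
        For orientation, javax.crypto.Cipher exposes two doFinal overloads with the same parameter shapes as the two methods listed above. The following is a minimal sketch of calling both through the public API; the class name DoFinalOverloads, the helper method names, and the all-zero key are illustrative assumptions, and the sketch does not show CipherCore internals.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

// Illustrative only: exercises the two public Cipher.doFinal overloads.
class DoFinalOverloads {
    // Uses the overload that returns a freshly allocated output array.
    static byte[] encryptAllocating(byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        return c.doFinal(data, 0, data.length);
    }

    // Uses the overload that writes into an array supplied by the caller.
    static byte[] encryptIntoCallerBuffer(byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] out = new byte[c.getOutputSize(data.length)];
        int written = c.doFinal(data, 0, data.length, out, 0);
        return Arrays.copyOf(out, written);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];   // all-zero demo key, not for real use
        byte[] data = new byte[1024]; // multiple of the AES block size
        Arrays.fill(data, (byte) 7);
        byte[] a = encryptAllocating(key, data);
        byte[] b = encryptIntoCallerBuffer(key, data);
        System.out.println(Arrays.equals(a, b));
    }
}
```

        Both helpers are expected to produce identical ciphertext; they differ only in who owns the output array, which is the distinction the proposed change exploits.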

        What was done:
        - Some parts of the code (those that can't be eliminated by knowing the output array details) were extracted from doFinal(byte[], int, int, byte[], int) into separate methods (checkReinit(), prepareInputBuffer(), checkOutputCapacity()).
        - doFinal(byte[], int, int, byte[], int) was manually inlined into doFinal(byte[], int, int).
        - Massive manual constant propagation and dead code elimination were then applied (I have to note that the HotSpot JIT is unable to perform all such optimizations because it doesn't have enough information).
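
        The shape of this refactoring can be sketched on a toy class. This is not the actual patch: ToyCore, its XOR "cipher", checkInput(), and the bodies of the helper methods are hypothetical, and only the names checkReinit() and checkOutputCapacity() are borrowed from the description above. The point is that the allocating entry point can drop every branch whose outcome is already known when the output array is created internally.

```java
import java.util.Arrays;

// Toy stand-in, NOT the JDK's CipherCore. The "cipher" is a byte-wise XOR.
class ToyCore {
    private final byte mask;
    private boolean needsReinit = false;

    ToyCore(byte mask) { this.mask = mask; }

    // Shared helper (hypothetical body): needed by both entry points.
    private void checkReinit() {
        if (needsReinit) throw new IllegalStateException("must be reinitialized");
    }

    // Shared helper (hypothetical): validates the input range.
    private static void checkInput(byte[] in, int off, int len) {
        if (off < 0 || len < 0 || len > in.length - off)
            throw new IllegalArgumentException("bad input range");
    }

    // Only meaningful when the caller supplies the output array.
    private static void checkOutputCapacity(byte[] out, int outOff, int needed) {
        if (outOff < 0 || out.length - outOff < needed)
            throw new IllegalArgumentException("output buffer too short");
    }

    private void transform(byte[] in, int off, int len, byte[] out, int outOff) {
        for (int i = 0; i < len; i++) out[outOff + i] = (byte) (in[off + i] ^ mask);
    }

    // General entry point: the output array is caller-provided, so its
    // capacity and possible aliasing with the input must be handled.
    int doFinal(byte[] in, int off, int len, byte[] out, int outOff) {
        checkReinit();
        checkInput(in, off, len);
        checkOutputCapacity(out, outOff, len);
        if (in == out) {
            // Ranges may overlap: work from a private copy, then wipe it.
            byte[] copy = Arrays.copyOfRange(in, off, off + len);
            transform(copy, 0, len, out, outOff);
            Arrays.fill(copy, (byte) 0);
        } else {
            transform(in, off, len, out, outOff);
        }
        return len;
    }

    // Specialized entry point: the general body inlined by hand, with the
    // capacity check and the aliasing branch removed because the freshly
    // allocated array is exactly large enough and cannot alias the input.
    byte[] doFinal(byte[] in, int off, int len) {
        checkReinit();
        checkInput(in, off, len);
        byte[] out = new byte[len];
        transform(in, off, len, out, 0);
        return out;
    }
}
```

        In this toy version a JIT could plausibly prove the same facts after inlining; the description above argues that in the real code the compiler lacks the information to do so, which is why the specialization is written by hand.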

        The key performance factor here is not the elimination of some checks, but the fact that we can avoid unnecessary data copying and the corresponding zeroing.
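
        To make the cost concrete, here is a small sketch under stated assumptions: CopyAndWipeDemo, its XOR transform, and the extraBytesTouched counter are all hypothetical and are not taken from the patch. It contrasts producing a result through an intermediate buffer, which must then be copied and zeroed because it held sensitive data, with writing straight into the final array.

```java
import java.util.Arrays;

// Hypothetical illustration of the copy-and-wipe overhead; not JDK code.
class CopyAndWipeDemo {
    // Bytes copied or zeroed in addition to the transform itself.
    static long extraBytesTouched = 0;

    static void xor(byte[] in, byte[] out, byte mask) {
        for (int i = 0; i < in.length; i++) out[i] = (byte) (in[i] ^ mask);
    }

    // Path A: transform into a scratch buffer, copy to the result, wipe scratch.
    static byte[] viaScratch(byte[] in, byte mask) {
        byte[] scratch = new byte[in.length];
        xor(in, scratch, mask);
        byte[] result = Arrays.copyOf(scratch, scratch.length); // extra copy
        Arrays.fill(scratch, (byte) 0);                          // extra zeroing
        extraBytesTouched += 2L * in.length;
        return result;
    }

    // Path B: transform directly into the array that is returned.
    static byte[] direct(byte[] in, byte mask) {
        byte[] result = new byte[in.length];
        xor(in, result, mask);
        return result;
    }
}
```

        Both paths return the same bytes, but path A touches every output byte two additional times. The overhead grows with the data size and is incurred regardless of how cheap the remaining checks are.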

              coffeys Sean Coffey
              skuksenko Sergey Kuksenko