JDK-8209862

CipherCore performance improvement


    • b16
    • generic
    • generic

        Please consider a performance improvement for CipherCore.

        JDK-8207775 (https://bugs.openjdk.java.net/browse/JDK-8207775) added required data zeroing. That caused a massive performance regression:
        Regressions caused by JDK-8207775
         (Legend: <algorithm> <keyLength>/<dataSize> <regression Lin64>/<regression Win64>)
        AES/CBC/NoPadding___ 128/01024 -17.4% / -3.9%
        AES/CBC/NoPadding___ 128/16384 -3.8% / -4.3%
        AES/CBC/PKCS5Padding 128/16384 -8.2% / -6.0%
        AES/ECB/NoPadding___ 128/01024 -7.3% / -7.6%
        AES/ECB/PKCS5Padding 128/16384 0 / -8.6%

        AES/GCM/NoPadding 128/01024 -4.4% / -3.9%

        AES/CBC/PKCS5Padding 128/16384 0 / -2.60%

        DESede/CBC/NoPadding___ 168/16384 0 / -7.20%
        DESede/CBC/PKCS5Padding 168/16384 0 / -3.70%

        DESede/ECB/NoPadding___ 168/16384 0 / -7.30%

        In general, the negative performance effect of zeroing can't be avoided, but in some cases CipherCore can be optimized.
        Here is the list of speedups achieved by the suggested patch:
        Performance improvements by suggested modification
        (Legend: <algorithm> <keyLength>/<dataSize> <speedup Lin64>/<speedup Win64>)
        AES/CBC/NoPadding___ 128/_1024 68.10% / 40.20%
        AES/CBC/NoPadding___ 128/16384 52.20% / 79.10%
        AES/CBC/PKCS5Padding 128/16384 38.70% / 72.60%
        AES/ECB/NoPadding___ 128/_1024 29.40% / 23.90%
        AES/ECB/NoPadding___ 128/16384 11.60% / 33.50%
        AES/ECB/PKCS5Padding 128/16384 15.30% / 38.30%

        AES/GCM/NoPadding___ 128/_1024 7.10% / 7.10%
        AES/GCM/NoPadding___ 128/16384 9.20% / 2.10%
        AES/GCM/PKCS5Padding 128/16384 9.00% / 0

        AES/CBC/PKCS5Padding 128/16384 2.50% / 0
        AES/ECB/NoPadding___ 128/_1024 0 / 10.50%

        DESede/CBC/PKCS5Padding 168/16384 0 / 3.40%
        DESede/ECB/NoPadding___ 168/16384 4.00% / 4.40%
        DESede/ECB/PKCS5Padding 168/16384 0 / 5.00%

        DESede/ECB/NoPadding___ 168/16384 6.50% / 0
        DESede/CBC/PKCS5Padding 168/16384 3.90% / 4.10%

        This not only recovers almost all of the regression caused by the additional zeroing, but also gives additional performance benefits.

        The idea of the modification:
        - CipherCore contains two methods:
          doFinal(byte[], int, int)
          doFinal(byte[], int, int, byte[], int)
          The first method allocates the output array internally and invokes the second doFinal.
        - At the same time, the second doFinal method contains many checks and additional actions needed to work correctly with a user-provided output array. All of these can be avoided when the output array is allocated internally.
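
        For orientation, the two CipherCore methods are reached through the corresponding public javax.crypto.Cipher overloads. The following is a minimal sketch of both call shapes, assuming an AES/CBC/NoPadding cipher with an all-zero key and IV (chosen only to keep the example short; the class and method names are hypothetical):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class DoFinalOverloads {
    // All-zero key and IV are for illustration only; never use them in real code.
    private static Cipher newCipher() throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/NoPadding");
        c.init(Cipher.ENCRYPT_MODE,
               new SecretKeySpec(new byte[16], "AES"),
               new IvParameterSpec(new byte[16]));
        return c;
    }

    // First overload: the cipher allocates and returns the output array.
    public static byte[] encryptAllocating(byte[] in) throws Exception {
        return newCipher().doFinal(in, 0, in.length);
    }

    // Second overload: the caller supplies the output array and offset,
    // and the cipher returns the number of bytes written.
    public static byte[] encryptIntoCallerBuffer(byte[] in) throws Exception {
        Cipher c = newCipher();
        byte[] out = new byte[c.getOutputSize(in.length)];
        int written = c.doFinal(in, 0, in.length, out, 0);
        return Arrays.copyOf(out, written);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1024]; // a multiple of the 16-byte AES block size
        boolean same = Arrays.equals(encryptAllocating(data),
                                     encryptIntoCallerBuffer(data));
        System.out.println("Both overloads agree: " + same);
    }
}
```

        Only the first call shape is affected by the proposed change: the array it returns is created inside the cipher, so its size and contents are fully known to the implementation.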

        What was done:
        - The parts of doFinal(byte[], int, int, byte[], int) that can't be eliminated even when the output array's details are known were extracted into separate methods (checkReinit(), prepareInputBuffer(), checkOutputCapacity()).
        - doFinal(byte[], int, int, byte[], int) was manually inlined into doFinal(byte[], int, int).
        - Extensive manual constant propagation and dead code elimination were then applied. (Note that the HotSpot JIT is unable to perform all of these optimizations itself because it doesn't have enough information.)

        The key performance factor here is not the elimination of some checks, but the fact that we can avoid unnecessary data copying and the corresponding zeroing.
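
        The shape of this refactoring can be illustrated with a toy model. The sketch below is not the CipherCore code: ToyCipherCore, its XOR "cipher", the scratch buffer, and the signature of checkOutputCapacity are all assumptions made for illustration (only the helper's name is taken from the list above). It shows a general path that must copy and zero because the output array comes from the caller, and a specialized path that writes straight into an array it allocated itself.

```java
import java.util.Arrays;

public class ToyCipherCore {
    // Stand-in for the real cipher: XOR every byte with a constant.
    private static void transform(byte[] in, int inOff, int len,
                                  byte[] out, int outOff) {
        for (int i = 0; i < len; i++) {
            out[outOff + i] = (byte) (in[inOff + i] ^ 0x5A);
        }
    }

    // Name borrowed from the issue text; signature and body are assumed.
    private static void checkOutputCapacity(byte[] out, int outOff, int needed) {
        if (out.length - outOff < needed) {
            throw new IllegalArgumentException("Output buffer too short");
        }
    }

    // General path: the output array is supplied by the caller, so it may be
    // too short or may alias the input. This toy handles that conservatively
    // by working in a scratch buffer, copying out, then zeroing the scratch.
    public static int doFinal(byte[] in, int inOff, int len,
                              byte[] out, int outOff) {
        checkOutputCapacity(out, outOff, len);
        byte[] scratch = new byte[len];
        transform(in, inOff, len, scratch, 0);
        System.arraycopy(scratch, 0, out, outOff, len); // extra copy
        Arrays.fill(scratch, (byte) 0);                 // extra zeroing
        return len;
    }

    // Before: allocate the output, then delegate to the general path.
    public static byte[] doFinalDelegating(byte[] in, int inOff, int len) {
        byte[] out = new byte[len];
        doFinal(in, inOff, len, out, 0);
        return out;
    }

    // After: the general path inlined by hand and specialized. The output is
    // freshly allocated with exactly the right size, so the capacity check,
    // the scratch buffer, the copy, and the zeroing all disappear.
    public static byte[] doFinalSpecialized(byte[] in, int inOff, int len) {
        byte[] out = new byte[len];
        transform(in, inOff, len, out, 0);
        return out;
    }
}
```

        Both allocating variants return the same bytes; the specialized one simply touches each output byte once instead of three times, which mirrors the copying and zeroing savings described above.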

              coffeys Sean Coffey
              skuksenko Sergey Kuksenko
