JDK-8253821: Improve ByteBuffer performance with GCM


Details

    Description

      There were two areas of focus. The primary one is that when direct ByteBuffers are used with some crypto algorithms, the data is copied to byte arrays numerous times, causing unnecessary memory allocation and hurting performance. The other focus was the output arrays of non-direct ByteBuffers.

      This change comes in multiple parts:
      1) AESCipher has a one-size-fits-all approach to ByteBuffers: all encryption and decryption is done in byte arrays. When the input data is a byte array, or a ByteBuffer backed by a byte array, this is fine. However, when it is a direct buffer, the data is copied into a new byte array. Unfortunately, this hurts SSLEngine, which uses direct buffers, causing multiple copies of the data on the way down to the raw algorithm (see the first sketch after this list). Additionally, the GCM code and other related classes had to be changed to pass ByteBuffers down to the algorithm, where the data can be copied into a fixed-size byte array that can be reused. Running JFR with a Flink performance test, the code without these modifications shows ~150 GB of byte array allocation in one minute of operation; with them, about 7 GB.
      2) GCM needed some reworking of its logic. Because it is an authenticated cipher, if the GHASH check fails, the decryption fails and no data is returned. The existing code performed the decryption at the same time as the GHASH check, which offers no parallel performance advantage in the current design. Performing GHASH fully before decryption avoids allocating output data and performing unneeded operations when the GHASH check fails. If GHASH succeeds, operations can be performed in place, directly on the buffer, without allocating an intermediate buffer and then copying that data (see the second sketch after this list).
      3) GCTR and GHASH now allocate a fixed-size buffer when the data size is over 1 KB before going into an intrinsic. At this time, copying data from the ByteBuffer into a byte array is required for the intrinsic to work on it. We cannot eliminate the copy, but we can reduce the size of the allocated buffer: there is little harm in putting a maximum size on this buffer and copying data into it repeatedly until the input is finished (see the third sketch after this list). Setting the maximum at 4 KB does produce slightly faster top-end performance at times, but the inconsistent results and the increase in memory usage from 7 GB to 17 GB were not convincing enough to increase the buffer size.
      4) Using ByteBuffers allows the code to use duplicate(), which makes it easier to chop up the data without unnecessary copying (see the fourth sketch after this list).
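
      To make the problem in (1) concrete, the first sketch below shows the general shape of a ByteBuffer-to-byte-array fallback: a heap buffer can hand over its backing array, but a direct buffer forces a fresh, payload-sized allocation and a full copy on every call. This is a schematic illustration only, not the actual AESCipher/CipherSpi source; the class and method names are invented for the example.

      import java.nio.ByteBuffer;

      // Schematic of the one-size-fits-all fallback: everything ends up in a byte[].
      final class ByteArrayFallback {
          static byte[] toByteArray(ByteBuffer input) {
              if (input.hasArray()) {
                  // Heap buffer: a backing array already exists
                  // (arrayOffset/position bookkeeping omitted for brevity).
                  return input.array();
              }
              // Direct buffer: allocate a new array sized to the payload and copy
              // everything out of native memory. Repeated on every update()/doFinal(),
              // this is the allocation churn described above.
              byte[] copy = new byte[input.remaining()];
              input.duplicate().get(copy);
              return copy;
          }
      }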
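
      For (2), the second sketch captures the reworked decryption order: authenticate the whole ciphertext first, and only decrypt, possibly straight into the caller's buffer, once the tag checks out. The helper names (ghashUpdate, tagMatches, decryptBlocks) are placeholders for illustration, not the JDK's internal API.

      import javax.crypto.AEADBadTagException;
      import java.nio.ByteBuffer;

      final class GhashFirstDecrypt {
          // Authenticate first; nothing is allocated or decrypted on a bad tag.
          int doFinal(ByteBuffer ciphertext, ByteBuffer out) throws AEADBadTagException {
              ghashUpdate(ciphertext.duplicate());      // hash the ciphertext without consuming it
              if (!tagMatches()) {
                  throw new AEADBadTagException("Tag mismatch");
              }
              // Tag verified: plaintext can be written directly into the destination
              // buffer (or produced in place) with no intermediate buffer and copy.
              return decryptBlocks(ciphertext, out);
          }

          private void ghashUpdate(ByteBuffer ct) { /* GHASH over the ciphertext */ }
          private boolean tagMatches()            { return true; /* compare computed vs. received tag */ }
          private int decryptBlocks(ByteBuffer ct, ByteBuffer out) {
              int n = ct.remaining();
              out.put(ct);                              // stand-in for the CTR decryption
              return n;
          }
      }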
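
      For (3), the third sketch shows copying through a capped, reusable scratch array rather than allocating a buffer sized to the whole input. The 1 KB cap matches the value described above; the class and method names are illustrative only.

      import java.nio.ByteBuffer;

      final class ChunkedUpdate {
          private static final int MAX_LEN = 1024;           // cap; 4 KB was tried but not adopted
          private final byte[] scratch = new byte[MAX_LEN];  // reused across calls, no per-call allocation

          int update(ByteBuffer src, ByteBuffer dst) {
              int processed = 0;
              while (src.remaining() >= 16) {                        // whole AES blocks only
                  int len = Math.min(src.remaining() & ~15, MAX_LEN);
                  src.get(scratch, 0, len);                          // one bounded copy out of the (direct) buffer
                  processBlocks(scratch, 0, len);                    // stand-in for the intrinsic-backed GCTR/GHASH work
                  dst.put(scratch, 0, len);
                  processed += len;
              }
              return processed;                                      // any partial final block is left for doFinal()
          }

          private void processBlocks(byte[] buf, int ofs, int len) {
              // placeholder for the real counter-mode / GHASH processing
          }
      }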
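
      For (4), duplicate() creates an independent position and limit over the same backing memory, so the input can be sliced, for example into ciphertext and trailing tag, without moving any bytes. The fourth sketch uses the usual 16-byte GCM tag length.

      import java.nio.ByteBuffer;

      final class DuplicateViews {
          // Split a ciphertext-plus-tag buffer into two views that share src's memory.
          static ByteBuffer[] splitCiphertextAndTag(ByteBuffer src) {
              final int tagLen = 16;                       // default GCM tag length in bytes

              ByteBuffer data = src.duplicate();
              data.limit(src.limit() - tagLen);            // view of the ciphertext only

              ByteBuffer tag = src.duplicate();
              tag.position(src.limit() - tagLen);          // view of the trailing tag only

              return new ByteBuffer[] { data, tag };       // no byte[] allocated, nothing copied
          }
      }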

      The GCM ByteBuffer and logic changes produced a 16% increase in performance in the Flink test. This is limited to GCM only, as the other algorithms still use the ByteBuffer-to-byte-array copy method. Doing similar work on the other algorithms would provide less of a performance gain, because they lack GCM's complexity and their usage in TLS is diminishing.

            People

              Assignee: Anthony Scarpino (ascarpino)
              Reporter: Anthony Scarpino (ascarpino)
