JDK-8231717

Improve performance of charset decoding when charset is always compactable


    • b19

        With the move from JDK 8 to JDK 9+, the internal representation of String changed from a char[] to a byte[] plus an int coder (LATIN1 or UTF16).
        At the same time, the StringCoding implementation changed to take advantage of this, optimizing decoding for ASCII-compatible charsets and for the specific common charsets UTF_8, ISO_8859_1 and US_ASCII by enabling direct byte array copies.
        However, the same change had an adverse performance effect on charset decoding where no ASCII fast path (byte array copy) is possible, e.g. most EBCDIC charsets. The main reason is that the internal representation is now a coded byte[]: after the decoder has decoded to a char[], a char[]->byte[] copy is needed, which did not happen before.
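
        The extra copy described above can be illustrated with a minimal sketch. This is not the JDK's StringCoding code: the class name, the method names and the toy mapping table (byte b maps to char 255 - b, so no ASCII fast path applies) are all assumptions made for illustration.

```java
public class TwoStepDecode {
    // Hypothetical single-byte mapping table (not a real charset):
    // index = unsigned input byte, value = decoded char.
    static final char[] B2C = new char[256];
    static {
        for (int i = 0; i < 256; i++) {
            B2C[i] = (char) (255 - i);
        }
    }

    // Step 1: the decoder maps every input byte to a char.
    static char[] decodeToChars(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = B2C[src[i] & 0xff];
        }
        return dst;
    }

    // Step 2: the char[] is copied into a LATIN1 byte[].
    // Returns null if any char does not fit in one byte,
    // in which case a UTF16 representation would be needed instead.
    static byte[] compress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xff) {
                return null;
            }
            out[i] = (byte) chars[i];
        }
        return out;
    }
}
```

        Step 2 is the additional pass over the data that a char[]-backed String did not need.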

        This enhancement improves the performance of charset decoding when COMPACT_STRINGS is enabled. It takes advantage of the fact that if a charset is "always compactable", i.e. every mapping maps to a single value <= 0xff, then SingleByte.decode() can map straight to a LATIN1 byte[] rather than to a char[] followed by a conversion to a LATIN1 byte[].
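
        A minimal sketch of the idea, under stated assumptions: the helper names isAlwaysCompactable, toLatin1Table and decodeToLatin1 are hypothetical and are not taken from the actual change, and the mapping table is passed in by the caller rather than read from a real charset.

```java
public class DirectLatin1Decode {
    // A mapping table is "always compactable" when every decoded
    // value fits in a single LATIN1 byte (<= 0xff).
    static boolean isAlwaysCompactable(char[] b2c) {
        for (char c : b2c) {
            if (c > 0xff) {
                return false;
            }
        }
        return true;
    }

    // Precompute a byte-to-byte table once per charset.
    // Only valid when isAlwaysCompactable(b2c) is true.
    static byte[] toLatin1Table(char[] b2c) {
        if (!isAlwaysCompactable(b2c)) {
            throw new IllegalArgumentException("table is not always compactable");
        }
        byte[] table = new byte[b2c.length];
        for (int i = 0; i < b2c.length; i++) {
            table[i] = (byte) b2c[i];
        }
        return table;
    }

    // Decode straight into a LATIN1 byte[]: one pass, no intermediate char[].
    static byte[] decodeToLatin1(byte[] src, byte[] table) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = table[src[i] & 0xff];
        }
        return dst;
    }
}
```

        The compactability check runs once per table, so the per-string cost is a single table lookup per input byte.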

        Performance benchmarks show up to a 100% performance improvement for typical charset decoding for charsets that fall into this category, with no impact on other charsets.
         

              aleonard Andrew Leonard