JDK / JDK-8261744

Implement CharsetDecoder ASCII and latin-1 fast-paths


    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P3
    • Fix Version: 17
    • Component: core-libs

      The Compact Strings work in JDK 9 added intrinsic methods to speed up certain common String operations, including testing whether a byte[] range contains negative bytes, and inflating a byte[] into a char[]. By utilizing SIMD instructions, these intrinsics can execute much faster than a simple scalar loop such as the one found in UTF_8.Decoder:

                  // ASCII only loop
                  while (dp < dlASCII && sa[sp] >= 0)
                      da[dp++] = (char) sa[sp++];
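      The decoder fast-path amounts to two bulk operations: find the longest run of non-negative (ASCII) bytes, then inflate that run to chars in one go. The sketch below uses scalar stand-ins for the JIT-intrinsified helpers (the method names here are illustrative, not the JDK's internal API):

```java
public class AsciiFastPath {
    // Counts leading non-negative (ASCII) bytes; the JDK's intrinsified
    // equivalent does this check with SIMD instructions.
    static int countPositives(byte[] sa, int sp, int len) {
        int i = 0;
        while (i < len && sa[sp + i] >= 0) i++;
        return i;
    }

    // Bulk-inflates ASCII bytes to chars; the JDK's inflate intrinsic
    // vectorizes this widening copy.
    static void inflate(byte[] src, int sp, char[] dst, int dp, int len) {
        for (int i = 0; i < len; i++) dst[dp + i] = (char) src[sp + i];
    }

    // Decoder loop shape: handle the ASCII prefix in bulk, leaving the
    // slow path to take over only at the first non-ASCII byte.
    static int decodeAsciiPrefix(byte[] src, char[] dst) {
        int n = countPositives(src, 0, Math.min(src.length, dst.length));
        inflate(src, 0, dst, 0, n);
        return n; // number of chars produced from the ASCII prefix
    }
}
```

Because both helpers touch a contiguous range with no per-element branching on the output side, the JIT can replace them with vectorized code, which is where the speed-up comes from.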

      However, these optimizations are not picked up in some places, such as in any of the built-in CharsetDecoders. On my Haswell-based workstation (AVX2), converting a byte[] to a String by reading it through an InputStreamReader wrapped around a ByteArrayInputStream into a char[] can be 14x slower than calling new String(bytes) directly for UTF-8, and as much as 24x slower for US-ASCII. Obviously that is not something you would do when you already hold the bytes, but decoding character streams, e.g. when reading files or doing network I/O, is a common occurrence.
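      Spelled out, the two paths being compared look roughly like this (a correctness demo of the shapes involved, not the benchmark itself):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Slow path: decode through an InputStreamReader, which drives the
    // CharsetDecoder's scalar loop internally.
    static String viaReader(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] chars = new char[bytes.length];
            int n = r.read(chars); // -1 on empty input
            return new String(chars, 0, Math.max(n, 0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fast path: the String constructor benefits from the intrinsified
    // ASCII/latin-1 checks added by the Compact Strings work.
    static String direct(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Both produce the same result for inputs that fit one read; the gap the issue describes is purely in how the byte-to-char conversion is executed.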

      This RFE suggests reusing these intrinsics in the latin-1 and ASCII-compatible CharsetDecoders, which experimentally gives substantial improvements by exploiting SIMD capabilities: I see a 4-5x speed-up on UTF-8 and ~10x on ISO-8859-1, reducing the overhead of going via an InputStreamReader to a more reasonable level (~2.5x) across all charsets.
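      ISO-8859-1 benefits the most because its decoding is a pure byte-to-char widening: every byte value 0-255 maps to the code point of the same value, so the entire input can be handed to a bulk inflate with no validation at all. A minimal scalar sketch of that mapping (the JDK's version is intrinsified):

```java
public class Latin1Decode {
    // ISO-8859-1 decoding is a straight widening copy: byte value b
    // becomes char (b & 0xff), with no error cases to check.
    static char[] decodeLatin1(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++)
            dst[i] = (char) (src[i] & 0xff); // mask to treat the byte as 0..255
        return dst;
    }
}
```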

            redestad Claes Redestad
