JDK-8280358

Optimize UTF-8 decoding of non latin1 strings


    • generic
    • generic

      A DESCRIPTION OF THE PROBLEM :
      When the String(byte[] bytes, int offset, int length, Charset charset) constructor is called with UTF-8, it first checks whether the byte array contains only ASCII. If it does not, the constructor attempts to decode the bytes as a latin1 string.
      That attempt starts by allocating a new array.
      Instead, we can speculatively check whether the first byte can be the starting byte of a UTF-8 encoded latin1 code point and, if it cannot, skip the array allocation and go straight to the decodeUTF8_UTF16 call.
      This slightly slows down strings that start with a latin1 code point, but gives a relatively large performance boost when the string starts with a non-latin1 code point.
      This seems like a good tradeoff, since there is a fair chance that a non-ASCII string starts with a non-latin1 character.
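
      The proposed check can be sketched roughly as follows. This is an illustration only, not the JDK's actual code: the class name Latin1Speculation and both method names are hypothetical. It relies on the fact that in UTF-8 a code point up to U+00FF is either a single ASCII byte (0x00..0x7F) or a two-byte sequence whose lead byte is 0xC2 or 0xC3.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of java.lang.String.
final class Latin1Speculation {

    /** True if b can begin the UTF-8 encoding of a code point <= U+00FF. */
    static boolean canStartLatin1(byte b) {
        // ASCII bytes are non-negative when viewed as signed Java bytes.
        // U+0080..U+00FF encode as two bytes led by 0xC2 or 0xC3.
        return b >= 0 || b == (byte) 0xC2 || b == (byte) 0xC3;
    }

    /**
     * Assumed decision point: attempt the latin1 path (and its array
     * allocation) only when the first byte does not already rule it out.
     */
    static boolean worthTryingLatin1(byte[] bytes, int offset, int length) {
        return length > 0 && canStartLatin1(bytes[offset]);
    }

    public static void main(String[] args) {
        byte[] eAcute = "\u00e9".getBytes(StandardCharsets.UTF_8); // C3 A9
        byte[] kanji = "\u65e5".getBytes(StandardCharsets.UTF_8);  // E6 97 A5
        System.out.println(worthTryingLatin1(eAcute, 0, eAcute.length));
        System.out.println(worthTryingLatin1(kanji, 0, kanji.length));
    }
}
```

      A true result only means the latin1 attempt is not ruled out by the first byte. A later byte can still force the fallback to UTF-16 decoding, so the check saves the allocation only in the case the description targets.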


            Unassigned Unassigned
            webbuggrp Webbug Group