Enhancement
Resolution: Unresolved
P4
None
None
generic
generic
A DESCRIPTION OF THE PROBLEM :
When the String(byte[] bytes, int offset, int length, Charset charset) constructor is called with UTF-8, it first checks whether the byte array contains only ASCII; if it does not, it attempts to decode the bytes as a Latin-1 string.
That attempt begins by allocating a new array.
Instead, we can speculatively check whether the first byte can be the starting byte of a Latin-1 code point as encoded in UTF-8 and, if it cannot, skip the array allocation and go straight to the decodeUTF8_UTF16 call.
This will be a tiny slowdown for strings that start with a Latin-1 code point, but a relatively large performance boost for strings that start with a non-Latin-1 code point.
This seems like a good tradeoff, considering that there is a fair chance that a non-ASCII string starts with a non-Latin-1 character.
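
The proposed check could look roughly like the sketch below. It is illustrative only: the class and method names are hypothetical and not part of the JDK, the real String internals are not reproduced, and it assumes the byte being tested is the first non-ASCII byte in the range. It relies on the fact that UTF-8 encodes the code points U+0080 to U+00FF as two bytes whose lead byte is 0xC2 or 0xC3.

```java
// Hypothetical helper, not JDK code: decides whether the Latin-1 decoding
// attempt (and its array allocation) is worth starting at all.
class SpeculativeLatin1Check {

    // UTF-8 encodes U+0080..U+00BF with lead byte 0xC2 and
    // U+00C0..U+00FF with lead byte 0xC3.
    static boolean isLatin1LeadByte(byte b) {
        int v = b & 0xFF;
        return v == 0xC2 || v == 0xC3;
    }

    // Assumption: we inspect the first non-ASCII byte in the range.
    // Returns false when that byte cannot start a Latin-1 code point,
    // in which case a decoder could go straight to the UTF-16 path.
    static boolean worthTryingLatin1(byte[] bytes, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            if (bytes[i] < 0) { // high bit set: first non-ASCII byte
                return isLatin1LeadByte(bytes[i]);
            }
        }
        return true; // all ASCII, so the Latin-1 path is trivially fine
    }
}
```

With this sketch, the bytes C3 A9 (U+00E9) would still take the Latin-1 attempt, while E2 82 AC (U+20AC) would skip it. A true result only means the attempt is worth starting; later bytes may still force a fall back to UTF-16.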