-
Enhancement
-
Resolution: Fixed
-
P4
-
11, 14
-
b19
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8273069 | 11.0.14-oracle | Evan Whelan | P4 | Resolved | Fixed | b02 |
JDK-8269338 | 11.0.13 | Dmitry Chuyko | P4 | Resolved | Fixed | b01 |
With the move from JDK 8 to JDK 9+, the internal representation of String changed from a char[] to a byte[] plus a byte coder field (LATIN1 or UTF16).
At the same time, the StringCoding implementation changed to take advantage of this by optimizing decoding for ASCII-compatible charsets and for the common charsets UTF_8, ISO_8859_1 and US_ASCII, enabling direct byte array copy optimizations.
However, this change hurt performance for charset decoding where no ASCII fast path (byte array copy) was possible, e.g. most EBCDIC charsets. The main reason is that the internal representation is now a coded byte[]: after the decoder decodes to a char[], the result must then go through a char[]->byte[] copy, which did not happen before.
This enhancement improves the performance of charset decoding when COMPACT_STRINGS is enabled. It takes advantage of the fact that if a charset is "always compactable", i.e. every mapping maps to a single value <= 0xff, then SingleByte.decode() can map straight to a LATIN1 byte[] rather than to a char[] (followed by a conversion to a LATIN1 byte[]).
Performance benchmarks show up to a 100% performance improvement for typical decoding with charsets that fall into this category, with no impact on other charsets.
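To illustrate the idea, here is a minimal sketch that contrasts the two decode paths described above. It is not the JDK implementation: the class name, the method names, and the toy 256-entry mapping table are all hypothetical, and the table is a stand-in for a real single-byte charset's byte-to-char mapping.

```java
// Hypothetical sketch; none of these names come from the JDK sources.
final class CompactableDecodeSketch {

    // A charset table is "always compactable" if every mapped char fits in one byte.
    static boolean isAlwaysCompactable(char[] b2c) {
        for (char c : b2c) {
            if (c > 0xff) {
                return false;
            }
        }
        return true;
    }

    // Two-step path: decode to a char[], then narrow to a LATIN1 byte[].
    static byte[] decodeViaChars(byte[] src, char[] b2c) {
        char[] chars = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            chars[i] = b2c[src[i] & 0xff];
        }
        byte[] latin1 = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            latin1[i] = (byte) chars[i]; // the extra char[]->byte[] copy
        }
        return latin1;
    }

    // Direct path: map each input byte straight into the LATIN1 byte[].
    // Only valid when isAlwaysCompactable(b2c) is true.
    static byte[] decodeToLatin1(byte[] src, char[] b2c) {
        byte[] latin1 = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            latin1[i] = (byte) b2c[src[i] & 0xff];
        }
        return latin1;
    }

    // Toy mapping (assumption, not a real charset): byte value i decodes to char 255 - i.
    static char[] toyTable() {
        char[] table = new char[256];
        for (int i = 0; i < table.length; i++) {
            table[i] = (char) (255 - i);
        }
        return table;
    }
}
```

Both methods produce the same LATIN1 bytes for a compactable table; the direct path skips the intermediate char[] allocation and the second pass over the data.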
- backported by
-
JDK-8269338 Improve performance of charset decoding when charset is always compactable
- Resolved
-
JDK-8273069 Improve performance of charset decoding when charset is always compactable
- Resolved
- relates to
-
JDK-8054307 JEP 254: Compact Strings
- Closed