The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence the most bytes per Java char (UTF-16 code unit) is 3, and so the maximum length is 3*Integer.MAX_VALUE. With compact strings this reduces to 2*Integer.MAX_VALUE. The low-level UTF8/UNICODE API should define UTF-8 lengths as size_t to accommodate all possible representations. Higher-level APIs can still use int if they know the strings (e.g. symbols) are sufficiently constrained in length.
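As an illustration of the arithmetic only, here is a minimal Java sketch that counts modified UTF-8 bytes per char and accumulates the total in a long. The class and method names (ModifiedUtf8Length, modifiedUtf8Length) are hypothetical and are not part of the VM's UTF8/UNICODE API; the byte counts follow the modified UTF-8 rules (U+0000 takes two bytes, and each half of a surrogate pair takes three).

```java
public class ModifiedUtf8Length {

    // Hypothetical helper: number of modified UTF-8 bytes needed for s.
    // Returns long because the total can exceed Integer.MAX_VALUE.
    static long modifiedUtf8Length(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;   // ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;   // NUL (encoded as two bytes) and U+0080..U+07FF
            } else {
                bytes += 3;   // U+0800..U+FFFF, including each surrogate half
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("A"));            // 1
        System.out.println(modifiedUtf8Length("\u0000"));       // 2
        System.out.println(modifiedUtf8Length("\u20AC"));       // 3
        System.out.println(modifiedUtf8Length("\uD83D\uDE00")); // 6 (surrogate pair, 3 + 3)

        // Upper bound from the description: 3 bytes per char.
        long worstCase = 3L * Integer.MAX_VALUE;
        System.out.println(worstCase > Integer.MAX_VALUE);      // true
    }
}
```

The supplementary character costs six bytes in total but only three per char, which is why the per-char bound is 3 and why an int cannot hold the worst-case length.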
relates to
- JDK-8339316 Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 (Resolved)

links to
- Commit(master) openjdk/jdk/a4962ace
- Review(master) openjdk/jdk/20560