Summary
Date/Time names with supplementary characters cannot be parsed in a case-insensitive manner.
Problem
JDK 15 added a new locale, "ff-Adlm-LR", whose locale data (such as month and day names) is written in the Adlam script, which is encoded in a supplementary character plane. java.text.DateFormat parses those names in a case-insensitive manner, but it throws an exception because the underlying String.regionMatches() call with ignoreCase == true fails for supplementary characters. For example:
"\ud83a\udd2e".regionMatches(true, 0, "\ud83a\udd0c", 0, 2)
returns false, where:
"\ud83a\udd2e" == 'ADLAM SMALL LETTER O' (U+1E92E)
"\ud83a\udd0c" == 'ADLAM CAPITAL LETTER O' (U+1E90C)
even though each of the following expressions evaluates to true:
"\ud83a\udd2e".toUpperCase(Locale.ROOT).equals("\ud83a\udd0c")
Character.toUpperCase(0x1e92e) == 0x1e90c
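The statements above can be collected into a small runnable check. This is only an illustrative sketch: the class and method names are hypothetical, and it assumes a JDK whose Unicode data includes the Adlam script. The result of the regionMatches() call depends on whether the JDK in use already contains the change proposed here, so the sketch prints it rather than asserting a value.

```java
import java.util.Locale;

public class AdlamCaseDemo {
    // U+1E92E ADLAM SMALL LETTER O and U+1E90C ADLAM CAPITAL LETTER O,
    // each written as a surrogate pair (two char code units).
    static final String SMALL_O = "\ud83a\udd2e";
    static final String CAPITAL_O = "\ud83a\udd0c";

    // String-level case mapping works on whole code points.
    static boolean upperCaseStringsMatch() {
        return SMALL_O.toUpperCase(Locale.ROOT).equals(CAPITAL_O);
    }

    // The int overload of Character.toUpperCase also works on code points.
    static boolean upperCaseCodePointsMatch() {
        return Character.toUpperCase(SMALL_O.codePointAt(0)) == CAPITAL_O.codePointAt(0);
    }

    // The call that this CSR reports as failing before the change.
    static boolean regionMatchesIgnoringCase() {
        return SMALL_O.regionMatches(true, 0, CAPITAL_O, 0, 2);
    }

    public static void main(String[] args) {
        System.out.println("toUpperCase(Locale.ROOT) equal: " + upperCaseStringsMatch());
        System.out.println("Character.toUpperCase(int) equal: " + upperCaseCodePointsMatch());
        // Varies by JDK version, so no expected value is claimed here.
        System.out.println("regionMatches(ignoreCase=true): " + regionMatchesIgnoringCase());
    }
}
```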
Solution
Change the specifications of String.regionMatches(boolean, ...), String.equalsIgnoreCase(), and String.compareToIgnoreCase() to compare code points when supplementary characters are involved. Characters in the Basic Multilingual Plane (<= \uFFFF) continue to be compared as code units obtained from the charAt() method.
Although this change alters how the strings are traversed during comparison, the rationale is that these String methods should behave consistently across characters (code points), whether or not they are in the Basic Multilingual Plane. There is no reason to exclude supplementary characters from case-insensitive string comparison.
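One way to picture the proposed traversal is the sketch below. It is not the JDK implementation: the class, the helper names fold and regionMatchesIgnoreCase, and the choice to treat a surrogate pair as a code point only when it lies entirely inside the compared region are all assumptions made for illustration. It uses only Character and String methods that exist in the public API.

```java
public class CodePointRegionMatch {
    // Case fold used in the String specification:
    // Character.toLowerCase(Character.toUpperCase(codePoint)).
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    // Hypothetical helper: compares len code units of a (from aOff) with
    // len code units of b (from bOff), ignoring case.
    static boolean regionMatchesIgnoreCase(String a, int aOff, String b, int bOff, int len) {
        // Regions that fall outside either string never match.
        if (aOff < 0 || bOff < 0
                || aOff > (long) a.length() - len
                || bOff > (long) b.length() - len) {
            return false;
        }
        int k = 0;
        while (k < len) {
            char ca = a.charAt(aOff + k);
            char cb = b.charAt(bOff + k);
            int cpA = ca;
            int cpB = cb;
            int step = 1;
            // Assumption: use code points only when BOTH positions start a
            // well-formed surrogate pair that fits inside the region.
            if (k + 1 < len
                    && Character.isHighSurrogate(ca)
                    && Character.isLowSurrogate(a.charAt(aOff + k + 1))
                    && Character.isHighSurrogate(cb)
                    && Character.isLowSurrogate(b.charAt(bOff + k + 1))) {
                cpA = a.codePointAt(aOff + k);
                cpB = b.codePointAt(bOff + k);
                step = 2; // the low surrogates at k+1 are consumed here
            }
            // Unpaired surrogates and BMP characters fall through as code units.
            if (cpA != cpB && fold(cpA) != fold(cpB)) {
                return false;
            }
            k += step;
        }
        return true;
    }

    public static void main(String[] args) {
        // Adlam small letter O against Adlam capital letter O.
        System.out.println(regionMatchesIgnoreCase("\ud83a\udd2e", 0, "\ud83a\udd0c", 0, 2));
        // Plain BMP comparison still goes through the code-unit path.
        System.out.println(regionMatchesIgnoreCase("March", 0, "MARCH", 0, 5));
    }
}
```

BMP characters keep the existing code-unit comparison in this sketch. Only positions where both strings hold a surrogate pair switch to codePointAt().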
Specification
Append the following sentence just after the last list item of conditions in the method description of String.regionMatches(boolean, ...):
* In case that both <i>toffset+k</i> and <i>ooffset+k</i> point to
* supplementary characters, that is <i>k</i> point to high surrogates
* and <i>k+1</i> point to low surrogates, {@code codePointAt()} is
* used to retrieve the code points in place for {@code charAt()} method,
* and <i>k+1</i> is excluded from the above condition. If they point
* to an unpaired high or low surrogates, they are compared using
* {@code charAt()} method.
Change the following list item of conditions in the method description of String.equalsIgnoreCase() from:
* <li> Calling {@code Character.toLowerCase(Character.toUpperCase(char))}
* on each character produces the same result
to:
* <li> Calling {@code Character.toLowerCase(Character.toUpperCase(int))}
* on each code point produces the same result
Change the following description in the method description of String.compareToIgnoreCase() from:
* {@code Character.toLowerCase(Character.toUpperCase(character))} on
* each character.
to:
* {@code Character.toLowerCase(Character.toUpperCase(int))} on
* each code point of the character.
- csr of JDK-8248434: some newly added locale cannot parse uppercased date string. (Closed)