The original bug report is at the bottom. The gist of this bug
is that JLS 20.12.36 reads:
"Otherwise, this method creates a new String object
representing a character sequence _identical in
length_ ..."
[emphasis mine].
But if you skim String.java, you will see that when the character
is a "sharp s" (\u00DF), its uppercase translation is the string
"SS". I am told that Character.toUpperCase() returning a single
character for \u00DF is okay: when you call toUpperCase on a
String, the translation is done in terms of a particular locale,
whereas Character.toUpperCase() does a lookup in the Unicode
table (a set of truths for all locales).
So String.toUpperCase might return a longer string. Note that
this is not a problem with toLowerCase.
anand.palaniswamy@Eng 1997-10-14
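To make the length change described above concrete, here is a minimal sketch. The class name SharpSDemo and the helper upper() are made up for illustration, and Locale.GERMAN is an assumption; the report does not say which locale was in use.

```java
import java.util.Locale;

public class SharpSDemo {
    // Hypothetical helper: uppercases with an explicit locale so the
    // result does not depend on the platform default.
    static String upper(String s) {
        return s.toUpperCase(Locale.GERMAN);
    }

    public static void main(String[] args) {
        String word = "Stra\u00DFe"; // six chars, one of them the sharp s
        String up = upper(word);

        // The String-level conversion expands the sharp s to "SS".
        System.out.println(up + " " + word.length() + " -> " + up.length());
        // prints "STRASSE 6 -> 7"

        // The char-level conversion can only return one char, so the
        // sharp s comes back unchanged.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF');
        // prints "true"
    }
}
```

Any code that sizes a buffer or an offset from the original length before converting will be off by one for every sharp s in the input.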
Original report:
Name: joT67522 Date: 08/27/97
There is a problem with String.toUpperCase which does not
always return a String of the same length as the original.
Due to a special case in String.java, the sharp s ('\u00DF')
will always be converted into "SS".
While this is not a bad idea (at least for Germany :-), there
are two problems with that approach:
1. The current documentation says "... a new string is allocated,
whose length is identical to this string, ..."
2. While constructing a filter Writer class which
performs upper case conversion, Strings containing at least
one sharp s were truncated by mistake. This was due to
the fact that write(String, int, int) cuts off the String, which
increased in length after conversion. Unfortunately, write cannot
predict whether it is allowed to increase the length of the
resulting String as needed due to sharp s conversions or not.
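One way such a filter Writer could avoid the truncation in point 2 is sketched below. This is not the reporter's code, which is not shown; the class name UpperCaseWriter, the helper convert() and the choice of Locale.ROOT are all assumptions. The idea is to convert only the requested slice and then write the converted result using its own length, rather than reusing the caller's len.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;

// Hypothetical filter that uppercases text written as Strings.
public class UpperCaseWriter extends FilterWriter {
    public UpperCaseWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        // Passing str.toUpperCase() on with the caller's off/len would drop
        // trailing chars whenever the conversion made the text longer.
        String upper = str.substring(off, off + len).toUpperCase(Locale.ROOT);
        out.write(upper, 0, upper.length());
    }

    // Only the String overload is handled here; write(int) and
    // write(char[], int, int) pass through unconverted in this sketch.

    // Hypothetical convenience method for trying the filter out.
    static String convert(String s) {
        StringWriter sink = new StringWriter();
        try (UpperCaseWriter w = new UpperCaseWriter(sink)) {
            w.write(s, 0, s.length());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

With this approach the underlying writer may receive more chars than the caller asked to write, which is exactly the question the report raises about what write is allowed to do.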
company - Self-employed author , email - ###@###.###
======================================================================
- duplicates: JDK-4304573 RFE: Add case mappings for new characters in Unicode 3.0 spec (Resolved)
- relates to: JDK-4120540 string1.equalsIgnoreCase(string2) differs from string1.toUpperCase().equals(stri (Closed)