Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 9
Affects Version/s: 8
Component/s: core-libs
Labels:

Subcomponent: java.lang
Resolved In Build: b14
CPU: generic
OS: generic
Verification: Verified

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8045661	8u25	Naoto Sato	P3	Resolved	Fixed	b01
JDK-8042814	8u20	Naoto Sato	P3	Resolved	Fixed	b17
JDK-8052712	emb-8u26	Naoto Sato	P3	Resolved	Fixed	b17
JDK-8072250	7u85	Naoto Sato	P3	Resolved	Fixed	b01
JDK-8043910	7u80	Naoto Sato	P3	Resolved	Fixed	b03
JDK-8065284	7u79	Naoto Sato	P3	Resolved	Fixed	b01
JDK-8065141	7u76	Naoto Sato	P3	Closed	Fixed	b09

The change for JDK-8020037, "String.toLowerCase incorrectly increases length, if string contains \u0130 char", seems to be wrong according to my reading of the Unicode standard.

The title "String.toLowerCase incorrectly increases length" assumes that the length increase is a problem, but it isn't: the documentation specifically says "Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String."

Looking at http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt, I see:

# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

My understanding of this is that in all locales *except* the ones handled specially ('az', 'lt', and 'tr') we should convert bidirectionally: "\u0130" <-> "\u0069\u0307".
That is, lowercasing "\u0130" should result in "\u0069\u0307", and converting "\u0069\u0307" to uppercase or titlecase should yield "\u0130".

Note this allows round-trip conversions, which is why it is specified this way.

Java 7 correctly performs the former conversion (lowercasing "\u0130"), but not the latter.
Java 8 performs neither.
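
A minimal sketch for reproducing the behaviour described above. The class and method names (`DottedCapitalI`, `hex`, `lowerRoot`, `lowerTurkish`) are hypothetical and chosen only for illustration; the only library calls used are `String.toLowerCase(Locale)`, `String.toUpperCase(Locale)`, and `Locale.forLanguageTag`. What it prints depends on the JDK version it runs on, which is the point of this report.

```java
import java.util.Locale;

// Hypothetical reproducer; class and method names are illustrative only.
public class DottedCapitalI {

    // Render each UTF-16 code unit as four hex digits, e.g. "0069 0307".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%04X", (int) c));
        }
        return sb.toString();
    }

    // Lowercase with the locale-neutral rules (no Turkic special-casing).
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercase with the Turkish rules, one of the specially handled locales.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String capitalIDot = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        System.out.println("root lower:    " + hex(lowerRoot(capitalIDot)));
        System.out.println("turkish lower: " + hex(lowerTurkish(capitalIDot)));

        // The reverse direction discussed above; printed for inspection only.
        System.out.println("root upper of 0069 0307: "
                + hex("i\u0307".toUpperCase(Locale.ROOT)));
    }
}
```

On a JDK that follows the SpecialCasing.txt entry quoted above, the first line should show `0069 0307` and the second `0069`. A JDK with the regression reported here would show `0069` for both. The third line is left without an expected value so that the reverse mapping can be checked on whichever JDK is at hand.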

backported by

JDK-8042814 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8043910 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8045661 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8052712 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8065284 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8072250 String.toLowerCase regression - violates Unicode standard (Resolved)
JDK-8065141 String.toLowerCase regression - violates Unicode standard (Closed)

blocks

JDK-8030201 Nashorn: String.prototype.toLowerCase() requires SpecialCasing support (Closed)

duplicates

JDK-8041387 Applets not working when the preffered language is Turkish (Closed)

relates to

JDK-8043186 javac test langtools/tools/javac/util/StringUtilsTest.java fails (Closed)
JDK-8049038 In turkish locale, String.equalsIgnoreCase() returns "true" for character \u0130 and \u0131. (Closed)
JDK-6404304 RFE: Unicode 5.1 support (Closed)
JDK-8020037 String.toLowerCase incorrectly increases length, if string contains \u0130 char (Closed)


Assignee: Naoto Sato
Reporter: Per Bothner (Inactive)
Votes: 0
Watchers: 7
Created: 2014-04-24 17:25
Updated: 2022-08-08 12:04
Resolved: 2014-05-14 10:56
