Details
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8027426 | 7u60 | Yuka Kamiya | P3 | Closed | Fixed | b01 |
Description
FULL PRODUCT VERSION :
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
The problem does not occur when the test is run in a Turkish locale.
To reproduce it, set the default locale to English (probably any non-Turkish locale will do).
In an English locale, when a string containing the dotted capital I (Turkish I, \u0130) is converted to lower case with the toLowerCase method, an extra (and invalid) character is added to the result, immediately after the position of the Turkish I.
REGRESSION. Last worked in version 6u45
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
String stringWithDottedI = "\u0130";
Locale.setDefault(new Locale("en", "US"));
String lowerCasedString = stringWithDottedI.toLowerCase();
assertEquals(stringWithDottedI.length(), lowerCasedString.length());
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
lowerCasedString.length() == 1
ACTUAL -
lowerCasedString.length() == 2
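To see which character is actually appended, it can help to print the code points of the result instead of only its length. The sketch below is not part of the original report: the class name `DottedILowerCase` and the helpers `codePoints` and `lower` are hypothetical, and it passes the locale explicitly rather than changing the default. For reference, Unicode's SpecialCasing.txt gives the unconditional full lowercase mapping of U+0130 as U+0069 followed by U+0307 (COMBINING DOT ABOVE), which may be the extra character observed here; what a particular JDK build prints for en-US is left for the reader to check.

```java
import java.util.Locale;

// Hypothetical diagnostic helper; names are illustrative, not from the report.
final class DottedILowerCase {

    // Renders a string as space-separated code points, e.g. "U+0069 U+0307".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Lower-cases with an explicit locale so the default locale is not touched.
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        String dottedI = "\u0130";
        for (String tag : new String[] {"en-US", "tr-TR"}) {
            String lowered = lower(dottedI, tag);
            System.out.println(tag + ": length=" + lowered.length()
                    + " codePoints=" + codePoints(lowered));
        }
    }
}
```

Running `main` on the affected and the fixed builds side by side should show whether the second character is the combining dot or something else.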
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
package test;

import org.junit.Before;
import org.junit.Test;

import java.util.Locale;

import static org.junit.Assert.assertEquals;

public class StringTest {

    // LATIN CAPITAL LETTER I WITH DOT ABOVE (Turkish dotted capital I)
    private final String stringWithDottedI = "\u0130";

    @Before
    public void setup() {
        // No shared setup; each test sets the default locale itself.
    }

    @Test
    public void testWhenLocaleIsTurkish_lowerCasedStringShouldHaveSameLength() {
        Locale.setDefault(new Locale("tr", "TR"));
        String lowerCasedString = stringWithDottedI.toLowerCase();
        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }

    @Test
    public void testWhenLocaleIsEnglish_lowerCasedStringShouldHaveSameLength() {
        // This is the failing case: the lower-cased string has length 2.
        Locale.setDefault(new Locale("en", "US"));
        String lowerCasedString = stringWithDottedI.toLowerCase();
        assertEquals(stringWithDottedI.length(), lowerCasedString.length());
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Set the default locale to Turkish (new Locale("tr", "TR")),
or call the toLowerCase overload that accepts a Locale parameter and pass it a Turkish locale.
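The second workaround can be sketched as follows. This is a minimal illustration, not code from the report: the class `TurkishLowerCaseWorkaround` and the method `toLowerCaseTurkish` are hypothetical names, and `Locale.forLanguageTag("tr-TR")` is used here as an equivalent of `new Locale("tr", "TR")`.

```java
import java.util.Locale;

// Hypothetical wrapper for the locale-parameter workaround; names are illustrative.
final class TurkishLowerCaseWorkaround {

    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Lower-cases using Turkish casing rules, independent of the default locale.
    static String toLowerCaseTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Dotted capital I becomes a plain "i", so the length stays 1.
        System.out.println(toLowerCaseTurkish("\u0130").length()); // prints 1

        // Side effect of Turkish rules: plain "I" becomes dotless i (U+0131).
        System.out.println(toLowerCaseTurkish("I").equals("\u0131")); // prints true
    }
}
```

Note the trade-off: Turkish casing rules also map the ASCII letter I to the dotless ı (U+0131), so either workaround changes the result for ordinary English text and is probably only suitable when the input is known to be Turkish.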
Attachments
Issue Links
- backported by
  - JDK-8027426 String.toLowerCase incorrectly increases length, if string contains \u0130 char (Closed)
- relates to
  - JDK-8041791 String.toLowerCase regression - violates Unicode standard (Closed)
  - JDK-6404304 RFE: Unicode 5.1 support (Closed)