-
Bug
-
Resolution: Fixed
-
P4
-
1.3.0, 1.4.1
-
beta2
-
x86
-
linux, windows_98, windows_2000
Name: rmT116609 Date: 01/16/2003
FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
and:
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)
FULL OPERATING SYSTEM VERSION :
Windows 2000 Version 5.0 (Build 2195: Service Pack 3)
A DESCRIPTION OF THE PROBLEM :
When using the Swedish Locale, the letter combination AA is not sorted correctly. It should be sorted without any special rules, as it is in the Finnish Locale.
(I believe this bug stems from the Danish/Norwegian practice of using AA as a substitute for A_WITH_RING. They did not introduce A_WITH_RING into their
alphabets until the 20th century. Swedes and Finns, however, have never used AA instead of A_WITH_RING.)
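The difference described above can also be checked pairwise, without sorting a whole array. The following is a minimal sketch, not part of the original submission: the class name AaPairCheck and the helper compareIn are hypothetical, and the locales are built with Locale.forLanguageTag rather than the Locale constructor used elsewhere in this report. The Swedish result depends on the JDK build in use, so no expected output is given for it.

```java
import java.text.Collator;
import java.util.Locale;

public class AaPairCheck {
    /** Compares two strings with the collator of the given locale. */
    static int compareIn(Locale locale, String left, String right) {
        return Collator.getInstance(locale).compare(left, right);
    }

    public static void main(String[] args) {
        Locale swedish = Locale.forLanguageTag("sv-SE");
        Locale finnish = Locale.forLanguageTag("fi-FI");
        // A negative value means "aardvark" sorts before "antilope".
        System.out.println("sv: " + compareIn(swedish, "aardvark", "antilope"));
        System.out.println("fi: " + compareIn(finnish, "aardvark", "antilope"));
    }
}
```

On an affected build, the two lines would be expected to differ in sign, since only the Swedish collator treats AA specially.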
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Compile and run the supplied program.
2. Examine the output.
EXPECTED VERSUS ACTUAL BEHAVIOR :
Notice that "aardvark" will be sorted last, instead of
first. This is certainly not correct for Swedish.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;
/**********************************************************
* This program demonstrates that a special sorting rule is
* applied to the letter combination AA, when using the
* Swedish Locale ("sv", "SE").
***********************************************************/
public class CollatorTest {
    public static void main (String[] args) {
        Locale loc = new Locale ("sv", "SE"); // Swedish
        Locale.setDefault (loc);
        Collator col = Collator.getInstance ();
        String[] data = {"aardvark",
                         "antilope",
                         "baboon",
                         "crocodile"};
        Arrays.sort (data, col);
        System.out.println ("Using " + loc.getDisplayName());
        for (int i = 0; i < data.length; i++) {
            System.out.println (data[i]);
        }//end for
    }//end main
}//end class CollatorTest
---------- END SOURCE ----------
CUSTOMER WORKAROUND :
When sorting in the Swedish Locale, use a Finnish Collator:
public Collator getCollator () {
    if (Locale.getDefault().getLanguage().equals("sv")) {
        return Collator.getInstance(new Locale("fi", "FI"));
    }//end if
    return Collator.getInstance();
}//end getCollator
(Review ID: 179153)
======================================================================
Name: rl43681 Date: 03/05/2003
FULL PRODUCT VERSION :
On the machine with 2.2.19 kernel:
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
On the 2.4.20 kernel machine:
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)
FULL OS VERSION :
Tested on these platforms:
Linux l82 2.4.20 #1 sön dec 8 00:17:15 CET 2002 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux
Linux xxxx.unit.liu.se 2.2.19-6.2.15smp #1 SMP Wed Feb 27 10:44:30 EST 2002 i686 unknown
A DESCRIPTION OF THE PROBLEM :
When asking Java to sort text for the Swedish locale, the tokens 'aa', 'aA', 'Aa' and 'AA' are collated together with 'Å' (A-ring), which is not correct.
Also, 'Æ' (the ae ligature), the Norwegian/Danish letter representing the same sound as 'Ä' (a-umlaut), is not collated correctly.
The problem seems to be CollationElements in LocaleElements_sv.java.
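To see which rules a given JDK build actually applies for Swedish, the rule string can be read back from the collator. This is a minimal sketch under some assumptions: the class name SwedishRulesPeek and the helper rulesFor are hypothetical, the collator returned for the locale is assumed to be a RuleBasedCollator (the helper returns an empty string otherwise), and the "& Z" marker is the same one the submitted program below relies on to find the start of the locale-specific rules.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class SwedishRulesPeek {
    /** Returns the collation rules for the locale, or "" if its collator is not rule based. */
    static String rulesFor(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        if (collator instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) collator).getRules();
        }
        return "";
    }

    public static void main(String[] args) {
        String rules = rulesFor(Locale.forLanguageTag("sv-SE"));
        // Assumption taken from the report's program: the Swedish-specific
        // rules begin at the first "& Z" reset in the rule string.
        int start = rules.indexOf("& Z");
        System.out.println(start >= 0
                ? rules.substring(start)
                : "no \"& Z\" marker found");
    }
}
```

If the printed tail lists aa, aA, Aa or AA next to a-ring, the build still carries the behavior described in this report.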
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the source code. I used the \u notation only for the rules that I hope will go back into LocaleElements_sv.java; hopefully that will not be a problem.
EXPECTED VERSUS ACTUAL BEHAVIOR :
I expected the collator to get it right with the default locale as well, not just the one I patched.
default
a
A
ae
Ae
b
B
y
Y
ü
Ü
z
Z
æ
Æ
å
Å
aa
Aa
ä
Ä
ö
Ö
ø
Ø
patched
a
A
aa
Aa
ae
Ae
b
B
y
Y
ü
Ü
z
Z
å
Å
ä
Ä
æ
Æ
ö
Ö
ø
Ø
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;
/**
* A test of Java's opinion on the Swedish alphabet.
*/
public class CollationBugDemo {
String name;
/** This should be a Collator that knows how to order AA, Å, Ä, Ö,
* etc. for Swedish. */
Collator swedishCollator;
/** This will be a set of sorted strings eventually. */
SortedSet resultSet;
public CollationBugDemo(String name, Collator collator) {
this.name = name;
resultSet = new TreeSet();
swedishCollator = collator;
}
public void add(String element) {
resultSet.add(swedishCollator.getCollationKey(element));
}
public void printElements() {
System.out.println(name);
Iterator elements = resultSet.iterator();
while (elements.hasNext()) {
String element =
(String)((CollationKey)elements.next()).getSourceString();
System.out.println(element);
}
System.out.println();
}
public static void exerciseDemo(CollationBugDemo demo) {
demo.add("A");
demo.add("Aa");
demo.add("Ae");
demo.add("B");
demo.add("Y");
demo.add("Ü"); // U-umlaut
demo.add("Z");
demo.add("Å"); // A-ring
demo.add("Ä"); // A-umlaut
demo.add("Æ"); // AE ligature
demo.add("Ö"); // O-umlaut
demo.add("Ø"); // O-stroke
demo.add("a");
demo.add("aa");
demo.add("ae");
demo.add("b");
demo.add("y");
demo.add("ü"); // u-umlaut
demo.add("z");
demo.add("å"); // a-ring
demo.add("ä"); // a-umlaut
demo.add("æ"); // ae ligature
demo.add("ö"); // o-umlaut
demo.add("ø"); // o-stroke
demo.printElements();
}
public static void main(String[] args) throws Exception {
CollationBugDemo demo;
Locale swedishLocale = new Locale("sv", "SE");
Collator defaultCollator = Collator.getInstance(swedishLocale);
demo = new CollationBugDemo("default", defaultCollator);
exerciseDemo(demo);
String defaultRules =
((RuleBasedCollator)defaultCollator).getRules();
int beginningOfSpecificRules =
defaultRules.indexOf("& Z");
String genericCollationRules =
defaultRules.substring(0, beginningOfSpecificRules);
String patchedRules =
// (I'm a bit torn between either "tricking" people into
// using the double-acute variants, which are not used in
// Swedish, or making someone nearly as upset about the
// collating as I was when I submitted this.)
genericCollationRules +
"< a\u030a , A\u030a " + // a-ring
"< a\u0308 , A\u0308 " + // a-umlaut
// Someone writing a-double-acute, o-double-acute or
// u-double-acute is most likely to expect them collated
// along with the respective umlauts.
"; a\u030b , A\u030b " + // a-double-acute
// The same applies to the ae ligature, which is the
// Norwegian and Danish representation of the same sound
// as a-umlaut in Swedish.
"; \u00e6 , \u00c6 " + // ae ligature
"< o\u0308 , O\u0308 " + // o-umlaut
"; o\u030b , O\u030b " + // o-double-acute
// And o-stroke is the Norwegian and Danish representation
// of the same sound as o-umlaut in Swedish.
"; \u00f8 , \u00d8 " + // o-stroke
"& V ; w , W" +
"& Y, u\u0308 , U\u0308" + // u-umlaut
"; u\u030b , U\u030b "; // u-double-acute
Collator patchedCollator = new RuleBasedCollator(
patchedRules);
demo = new CollationBugDemo("patched", patchedCollator);
exerciseDemo(demo);
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
See source code.
(Review ID: 182172)
======================================================================
duplicates: JDK-4348571 Bug in class Collator with swedish locale (Closed)