Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 6
Affects Version/s: 1.4.0, 6
Component/s: globalization
Labels:

Subcomponent: translation
Resolved In Build: b96
CPU: x86
OS: windows_2000, windows_xp

Name: nt126004 Date: 05/21/2002

FULL PRODUCT VERSION :
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

FULL OPERATING SYSTEM VERSION :
Microsoft Windows 2000 [Version 5.00.2195]

A DESCRIPTION OF THE PROBLEM :
Turkish has 2 unique letter pairs:
'\u0130' & 'i' ('İ' & 'i'), which correspond to
English 'I' & 'i',
and
'I' & '\u0131' ('I' & 'ı'), which don't exist as
a letter pair in English and are the back-vowel counterparts of
English 'I' & 'i'.

If you didn't get them above, you can check them out at:
http://www.prustinteractive.com/toolbox/font/

In other words, the Turkish counterparts of English I and i are both dotted,
and their back-vowel versions are both dotless.
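
The pairing described above can also be seen in the JDK's locale-sensitive case mapping. The following is a minimal sketch, not part of the original report; the class and helper names are made up for illustration, and only String.toLowerCase(Locale) and String.toUpperCase(Locale) are JDK API.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Turkish locale; equivalent to new Locale("tr", "TR") used in the report.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Hypothetical helpers wrapping the locale-sensitive case conversions.
    static String lower(String s) { return s.toLowerCase(TURKISH); }
    static String upper(String s) { return s.toUpperCase(TURKISH); }

    public static void main(String[] args) {
        // Dotless pair: 'I' (U+0049) <-> '\u0131'
        System.out.println(lower("I").equals("\u0131"));
        System.out.println(upper("\u0131").equals("I"));
        // Dotted pair: '\u0130' <-> 'i' (U+0069)
        System.out.println(lower("\u0130").equals("i"));
        System.out.println(upper("i").equals("\u0130"));
    }
}
```

Each line prints true when the Turkish mapping is applied, which is the pairing a Turkish collator would be expected to respect when ignoring case.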

From the API it appears that either:
langCollator.setStrength(Collator.PRIMARY);
or
langCollator.setStrength(Collator.SECONDARY|Collator.CANONICAL_DECOMPOSITION);
or
langCollator.setStrength(Collator.SECONDARY);

should capture the difference between the 2 pairs, but
none does.

Every combination of PRIMARY & SECONDARY fails to
distinguish between the dotted and the dotless letters. The only
thing that makes both of them compare != 0 is TERTIARY or
(a logical | with) Collator.FULL_DECOMPOSITION. But the
moment I do that, I am no longer able to ignore case.
Besides, the Collator still treats the 2 pairs as the same
letter and, when sorting, intermingles the words that start with
any of them.
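
A possible explanation for the effect of the logical | is that strength and decomposition are separate settings in java.text.Collator, each with its own setter, and their constants share the same small integer range. The sketch below is not from the original report and the class name is made up; it assumes the documented constant values (SECONDARY = 1, IDENTICAL = 3, CANONICAL_DECOMPOSITION = 1, FULL_DECOMPOSITION = 2).

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthFlagsDemo {
    // OR-ing a decomposition constant into a strength just yields another strength value.
    static int secondaryOrFull() {
        return Collator.SECONDARY | Collator.FULL_DECOMPOSITION;
    }

    static int secondaryOrCanonical() {
        return Collator.SECONDARY | Collator.CANONICAL_DECOMPOSITION;
    }

    public static void main(String[] args) {
        // Under the assumed constant values this is the same as IDENTICAL.
        System.out.println(secondaryOrFull() == Collator.IDENTICAL);
        // And this is just SECONDARY again.
        System.out.println(secondaryOrCanonical() == Collator.SECONDARY);

        // The two settings are normally configured through separate calls.
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(Collator.SECONDARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        System.out.println(c.getStrength() == Collator.SECONDARY);
        System.out.println(c.getDecomposition() == Collator.CANONICAL_DECOMPOSITION);
    }
}
```

If that reading is right, the | with FULL_DECOMPOSITION only raised the strength, which would be consistent with case no longer being ignored.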

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Use the source code below
2. Compare results

EXPECTED VERSUS ACTUAL BEHAVIOR :
In the source code below:
1) should be != 0
2) should be == 0

Actual:
1) == 0
It can be made != 0 with TERTIARY or FULL_DECOMPOSITION,
but then 2) becomes != 0.
And the 2 letter pairs are treated as 1 pair in sorting.

getRules() returns a string identical to that for the US
Locale, which might be the root of the problem.
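
The getRules() observation can be checked with a few lines. This is a hedged sketch, not part of the original report: the class and method names are made up, and it assumes the runtime returns RuleBasedCollator instances for both locales (it reports false otherwise). The result depends on the JDK version in use.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class RulesCheck {
    // Hypothetical helper: true only when both collators are rule-based
    // and expose exactly the same rule string.
    static boolean sameRules(Locale a, Locale b) {
        Collator ca = Collator.getInstance(a);
        Collator cb = Collator.getInstance(b);
        if (!(ca instanceof RuleBasedCollator) || !(cb instanceof RuleBasedCollator)) {
            return false;
        }
        String rulesA = ((RuleBasedCollator) ca).getRules();
        String rulesB = ((RuleBasedCollator) cb).getRules();
        return rulesA.equals(rulesB);
    }

    public static void main(String[] args) {
        // On the affected builds described in this report, this would print true.
        System.out.println(sameRules(Locale.forLanguageTag("tr-TR"), Locale.US));
    }
}
```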

This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.text.*;
import java.util.*;

public class collate {
  public static void main(String[] args)
  {
    Collator coll = Collator.getInstance(new Locale("tr", "TR"));
//workaround place

    // compare() returns an int: 0 means equal at the current strength
    coll.setStrength(Collator.TERTIARY);
    System.out.println(coll.compare("a","A"));// != 0: case matters at TERTIARY
    coll.setStrength(Collator.SECONDARY);
    System.out.println(coll.compare("a","A"));// == 0: case ignored at SECONDARY

    coll.setStrength(Collator.SECONDARY);
    System.out.println(coll.compare("\u0131","i"));//1) should be != 0
    System.out.println(coll.compare("\u0130","i"));//2) should be == 0

    coll.setStrength(Collator.PRIMARY);
    System.out.println(coll.compare("a","\u00e0"));// == 0: accents ignored at PRIMARY

    coll.setStrength(Collator.IDENTICAL);
    System.out.println(coll.compare("a","b"));// != 0: different letters

    CollationKey key1 = coll.getCollationKey("abc");
    CollationKey key2 = coll.getCollationKey("def");
    System.out.println(key1.compareTo(key2));// != 0: different strings
  }
}

---------- END SOURCE ----------
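
The strength levels exercised in the source above can also be checked in isolation. The following is a minimal sketch, not part of the original report; it uses Locale.US instead of the Turkish locale so that the outcome does not depend on the bug, and the class and helper names are made up.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthLevels {
    // Hypothetical helper: compare two strings with a US collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(strength);
        return c.compare(a, b);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: visible at TERTIARY, ignored at SECONDARY.
        System.out.println(compareAt(Collator.TERTIARY, "a", "A") != 0);
        System.out.println(compareAt(Collator.SECONDARY, "a", "A") == 0);
        // Accents are a secondary difference: visible at SECONDARY, ignored at PRIMARY.
        System.out.println(compareAt(Collator.SECONDARY, "a", "\u00e0") != 0);
        System.out.println(compareAt(Collator.PRIMARY, "a", "\u00e0") == 0);
    }
}
```

The report's expectation amounts to '\u0131' versus 'i' behaving like a primary (letter) difference and '\u0130' versus 'i' like a tertiary (case) difference.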

CUSTOMER WORKAROUND :
The line indicated above as workaround place should be
replaced with:

RuleBasedCollator tr_Collator;
try {
  tr_Collator = new RuleBasedCollator(
    "<a,A<b,B<c,C<?,?<d,D<e,E<f,F<g,G<\u011f,\u011e<?,?<h,H<?;\u0131,I<i,\u0130;?<j,J" +
    "<k,K<l,L<m,M<n,N<o,O<?,?<p,P<r,R<s,S<\u015f,\u015e<?,?<t,T<u,U<?,?<v,V<y,Y<z,Z<'-'<' '<q,Q<w,W<x,X");
  // this line is optional, as the rule ensures a letter-grade difference
  tr_Collator.setStrength(Collator.SECONDARY|Collator.CANONICAL_DECOMPOSITION);
} catch (ParseException ex) {
  ex.printStackTrace();
}
/*
letters ?,?, ?, ?, ?,? are not part of Turkish alphabet,
but are ASCII correspondences, and are included with an
attempt to provide for their ordering as well under
CANONICAL_DECOMPOSITION. Letters q,Q, w,W, x,X are not part
of Turkish alphabet, so they follow Z.
Note: while spec says "All non-mentioned Unicode characters
are at the end of the collation order. ", my '?' characters
(included only for testing) got ranked at the end of a-
words, not after 'Z', or 'X'. That might be another bug,
but one that won't concern most users of Turkish version of
Collator.
*/
(Review ID: 146774)
======================================================================
###@###.### 10/14/04 00:39 GMT

Relates to: JDK-6328620 Collation of characters non-existing in Turkish locale (Closed)

Assignee: Jiri Tusla (Inactive)
Reporter: Nathanael Thompson (Inactive)
Votes: 0
Watchers: 0
Created: 2002-05-21 11:07
Updated: 2006-08-17 17:24
Resolved: 2006-08-17 17:24
Imported: 16/Sep/12 4:35 PM
Indexed: 18/Jul/12 11:11 AM
