JDK-4546576

[Col] Incorrect Collator sorting with spaces.

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • None
    • 1.4.0, 6
    • core-libs
    • Fix Understood
    • generic, x86
    • generic, windows_2000

      Name: gm110360 Date: 12/04/2001


      java version "1.3.1"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1-b24)
      Java HotSpot(TM) Client VM (build 1.3.1-b24, mixed mode)

      I've written a simple application to sort the following strings under
      the "en", "GB" locale:

      "Attire", "A new dog", "A tire", "Anew dog", "Andrew", "And you".

      Without running the program, I'd sort these as:

      "A new dog", "A tire", "And you", "Andrew", "Anew dog", "Attire".

      The following program is used to sort the values:

      import java.text.Collator;
      import java.util.Arrays;
      import java.util.Locale;

      public class test {
          public static void main(String[] args) {
              String[] testData = {"Attire", "A new dog", "A tire", "Anew dog", "Andrew", "And you"};

              System.out.println("Unsorted:");
              for (int i = 0; i < testData.length; i++) {
                  System.out.println(testData[i]);
              }

              // Sort with the default collation rules for the en_GB locale.
              Collator collator = Collator.getInstance(new Locale("en", "GB"));
              Arrays.sort(testData, collator);

              System.out.println("Sorted:");
              for (int i = 0; i < testData.length; i++) {
                  System.out.println(testData[i]);
              }
          }
      }
        
      When run, it prints them in this order instead:

      "Andrew", "And you", "Anew dog", "A new dog", "A tire", "Attire".

      It seems as though the space is completely ignored in the sorting algorithm,
      placing "A new dog" between "Anew dog" and "Attire".
      (Review ID: 136731)
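
One possible workaround, sketched here under assumptions and not taken from the JDK or from any fix for this issue, is to split each string on whitespace and let the Collator compare the strings word by word. A space then always acts as a word boundary instead of being ignored. The class name WordwiseCollation and its methods are hypothetical, and Locale.UK stands in for the "en", "GB" locale used in the report.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical helper, not part of the JDK: compares strings word by word,
// so a string whose first word is "A" sorts before one whose first word is "Anew".
class WordwiseCollation {

    // Wraps a Collator in a Comparator that splits on whitespace first.
    static Comparator<String> wordwise(final Collator collator) {
        return (a, b) -> {
            String[] wordsA = a.trim().split("\\s+");
            String[] wordsB = b.trim().split("\\s+");
            int shared = Math.min(wordsA.length, wordsB.length);
            for (int i = 0; i < shared; i++) {
                int c = collator.compare(wordsA[i], wordsB[i]);
                if (c != 0) {
                    return c;
                }
            }
            // All shared words are equal: the string with fewer words sorts first.
            return Integer.compare(wordsA.length, wordsB.length);
        };
    }

    // Returns a sorted copy and leaves the input array untouched.
    static String[] sortWordwise(String[] input, Collator collator) {
        String[] copy = input.clone();
        Arrays.sort(copy, wordwise(collator));
        return copy;
    }

    public static void main(String[] args) {
        String[] testData = {"Attire", "A new dog", "A tire", "Anew dog", "Andrew", "And you"};
        Collator collator = Collator.getInstance(Locale.UK); // en_GB
        System.out.println(Arrays.toString(sortWordwise(testData, collator)));
    }
}
```

For the six strings in this report, the wrapper yields the order the submitter expected. It changes only how spaces are treated. Other characters that the default rules ignore, such as punctuation, are still ignored inside each word.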
      ======================================================================
      ###@###.### 11/2/04 18:33 GMT
      Same problem reported by a CAP member:
      ======================================
       
      We want to sort text using natural-language ordering, so we use the class
      java.text.Collator. Its documentation says:
      "You use this class to build searching and sorting routines for natural
      language text."
       
      After some experiments, it looks as though it ignores spaces and other
      characters. This is independent of the strength: even at IDENTICAL, spaces
      are ignored. But spaces are part of the sort order in natural languages. I
      think this is either a documentation problem or a serious bug.
       
      Sample:
          String first = "ABEL PATRICIA";
          String second = "ABELN MICHAEL";
          Collator collator = Collator.getInstance(java.util.Locale.US);
          int result = collator.compare(first, second);
       
      The string "second" compares as smaller than the string "first". In every
      language that I know, the order is the reverse.
       
      Is this a bug? Is this a documentation error? Or am I misunderstanding it?
      I do not see any case where I could use this type of sorting, so I think it
      is a bug.
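
The claim that the result does not depend on the strength can be checked with a short program. This is a minimal sketch: the class name StrengthCheck and the helper compareAt are made up for illustration, and only Collator.getInstance, setStrength, and compare are JDK calls.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical test harness: runs the sample comparison at each strength.
class StrengthCheck {

    // Returns the sign of compare(first, second) at the given strength.
    static int compareAt(int strength, String first, String second) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(first, second));
    }

    public static void main(String[] args) {
        String first = "ABEL PATRICIA";
        String second = "ABELN MICHAEL";
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
        };
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL"};
        for (int i = 0; i < strengths.length; i++) {
            // A positive value means "first" sorts after "second".
            System.out.println(names[i] + ": " + compareAt(strengths[i], first, second));
        }
    }
}
```

With the space ignored, the first difference the collator sees is "P" against "N" at the fifth letter, so a positive result at every strength matches the behaviour described above.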

            naoto Naoto Sato
            gmanwanisunw Girish Manwani (Inactive)