JDK-6664636

[Col] API improvements to minimize memory allocations during unicode processing


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 6
    • Component/s: core-libs
    • CPU: x86
    • OS: solaris_8

      A DESCRIPTION OF THE REQUEST :
      While writing some code to get the maximum common prefix of two Unicode CharSequences, it became apparent that the current API is not sufficient for an efficient implementation. Suggested changes:

      Collator.compare(String, String) -> Collator.compare(CharSequence, CharSequence)

      Suggested additions:

      Collator.compare(int codepoint1, int codepoint2)
      Character.toString(int codepoint)
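      To make the intent of the suggested additions concrete, here is a rough sketch of their semantics built on the existing String-based API. The class CodePointCollation and its static methods are hypothetical stand-ins for the proposed Collator methods, not JDK API; they still allocate Strings internally, which is exactly the cost the proposal aims to remove.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper (not JDK API): approximates the semantics of the proposed
// Collator.compare(int, int) and Collator.compare(CharSequence, CharSequence)
// using the existing String-based Collator.compare(String, String).
class CodePointCollation {

    // Compare two single code points under the collator's rules.
    // Allocates two Strings per call; the proposed API would avoid this.
    static int compare(Collator collator, int codePoint1, int codePoint2) {
        String s1 = new String(Character.toChars(codePoint1));
        String s2 = new String(Character.toChars(codePoint2));
        return collator.compare(s1, s2);
    }

    // Compare two CharSequences. toString() returns the same object for a String,
    // but typically allocates a new String for other implementations (e.g. StringBuilder).
    static int compare(Collator collator, CharSequence cs1, CharSequence cs2) {
        return collator.compare(cs1.toString(), cs2.toString());
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);  // ignore case and accent differences
        System.out.println(compare(collator, 'a', 'A'));                        // 0
        System.out.println(compare(collator, "abc", new StringBuilder("ABC")));  // 0
    }
}
```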

      JUSTIFICATION :
      While writing some code to get the maximum common prefix of two Unicode CharSequences, it became apparent that the current API is not sufficient for an efficient implementation. See the attached source code for an example. Strings are immutable, and the only comparison the Collator provides takes Strings rather than the more general CharSequence. If the data being processed is not stored as Strings, you are forced to allocate Strings just to do basic processing. Also, since there is no API for comparing single code points, an operation such as finding the maximum common prefix requires up to two memory allocations per code point of the shorter sequence.
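      Until such an API exists, one possible workaround (not part of the original report, and not JDK API) is to cache one String per BMP code point so that repeated characters reuse an existing String instead of allocating a new one for every comparison. This is a minimal sketch assuming valid, non-negative code points; the class name CodePointStringCache is hypothetical. The trade-off is a one-time array of 65,536 references, and supplementary code points still allocate on every call.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical workaround (not JDK API): lazily caches one String per BMP code
// point so repeated code points reuse the same String instance.
final class CodePointStringCache {
    // One slot per BMP code point (U+0000..U+FFFF), filled on first use.
    private final String[] bmpCache = new String[Character.MIN_SUPPLEMENTARY_CODE_POINT];

    // Assumes codePoint is a valid, non-negative Unicode code point.
    String get(int codePoint) {
        if (codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            String s = bmpCache[codePoint];
            if (s == null) {
                s = String.valueOf((char) codePoint);  // allocated once per distinct BMP code point
                bmpCache[codePoint] = s;
            }
            return s;
        }
        // Supplementary code points are not cached and allocate a new String each time.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        CodePointStringCache cache = new CodePointStringCache();
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // The second lookup of 'x' returns the instance cached by the first.
        System.out.println(cache.get('x') == cache.get('x'));                     // true
        System.out.println(collator.compare(cache.get('a'), cache.get('b')) < 0);  // true
    }
}
```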

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Fewer memory allocations should be required when doing Unicode processing of CharSequences.
      ACTUAL -
      For example, finding the maximum common prefix of two Unicode CharSequences currently requires up to two memory allocations per code point of the shorter sequence.

      ---------- BEGIN SOURCE ----------
      // Requires: import java.text.Collator;
      // Returns the number of leading code points that the collator considers equal.
      private static int getLengthOfMaxCommonPrefix(CharSequence str1, CharSequence str2, Collator collator) {
          if ((str1 == null) || (str2 == null)) { return 0; }
          // @todo get rid of memory allocation
          char[] charArray = new char[4];   // 2 slots per code point: enough for a surrogate pair
          int count = 0;                    // code points matched so far
          int idx1 = 0;                     // char offset into str1
          int idx2 = 0;                     // char offset into str2
          while ((idx1 < str1.length()) && (idx2 < str2.length())) {
              // codePointAt takes a char index, not a code point index
              int len1 = Character.toChars(Character.codePointAt(str1, idx1), charArray, 0);
              int len2 = Character.toChars(Character.codePointAt(str2, idx2), charArray, 2);
              // @todo get rid of memory allocation
              String char1Str = new String(charArray, 0, len1);  // len1 is 1 (BMP) or 2 (surrogate pair)
              // @todo get rid of memory allocation
              String char2Str = new String(charArray, 2, len2);
              if (collator.compare(char1Str, char2Str) != 0) {
                  return count;
              }
              idx1 += len1;
              idx2 += len2;
              count++;
          }
          return count;
      }

      ---------- END SOURCE ----------
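      With the existing API, the allocation count can also be reduced by skipping the collator when the two code points are identical, since identical text always collates as equal; Strings are then allocated only for code points that differ. This is a minimal sketch that keeps the same code-point-by-code-point semantics as the source above, so contractions and expansions spanning several code points are not taken into account. PrefixSketch and commonPrefixCodePoints are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical variant of getLengthOfMaxCommonPrefix with a fast path:
// identical code points are accepted without calling the collator.
final class PrefixSketch {

    static int commonPrefixCodePoints(CharSequence s1, CharSequence s2, Collator collator) {
        if (s1 == null || s2 == null) { return 0; }
        int count = 0;       // code points matched so far
        int i1 = 0, i2 = 0;  // char offsets into s1 and s2
        while (i1 < s1.length() && i2 < s2.length()) {
            int cp1 = Character.codePointAt(s1, i1);
            int cp2 = Character.codePointAt(s2, i2);
            if (cp1 != cp2) {
                // Only differing code points need the collator, and therefore Strings.
                String str1 = new String(Character.toChars(cp1));
                String str2 = new String(Character.toChars(cp2));
                if (collator.compare(str1, str2) != 0) {
                    return count;
                }
            }
            i1 += Character.charCount(cp1);
            i2 += Character.charCount(cp2);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        System.out.println(commonPrefixCodePoints("Apple", new StringBuilder("apricot"), collator));  // 2
    }
}
```

      At PRIMARY strength, "Apple" and "apricot" share a two-code-point prefix: 'A' and 'a' differ as code points but collate as equal, the first 'p' matches directly, and 'p' versus 'r' ends the prefix.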

            Assignee: Naoto Sato
            Reporter: Nelson Dcosta (Inactive)
            Votes: 2
            Watchers: 2
