A DESCRIPTION OF THE REQUEST :
While writing some code to get the maximum common prefix of two Unicode CharSequences, it became apparent that the current API was not sufficient for an efficient implementation.
Suggested changes:
Collator.compare(String, String) -> Collator.compare(CharSequence, CharSequence)
Suggested additions:
Collator.compare(int codepoint1, int codepoint2)
Character.toString(int codepoint)
JUSTIFICATION :
See the attached source code for an example. Strings are immutable, and the only comparison the Collator provides takes Strings rather than the more generic CharSequence. If the data being processed is not stored as Strings, Strings must be allocated just to do basic processing. Also, since there is no API for comparing single code points, finding the maximum common prefix requires up to (# of code points in smaller sequence * 2) memory allocations: one String per code point from each sequence.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Unicode processing of CharSequences should require fewer memory allocations.
ACTUAL -
Finding the maximum common prefix of two Unicode CharSequences currently requires up to (# of code points in smaller sequence * 2) memory allocations.
---------- BEGIN SOURCE ----------
// Requires: import java.text.Collator;
// Returns the number of leading code points that the collator treats as equal.
private static int getLengthOfMaxCommonPrefix(CharSequence str1, CharSequence str2, Collator collator) {
    if ((str1 == null) || (str2 == null)) { return 0; }
    // @todo get rid of memory allocation
    // Scratch buffer: slots 0-1 hold the code point from str1, slots 2-3 the one from str2.
    char[] charArray = new char[4];
    int count = 0; // matching code points so far
    int i1 = 0;    // char index into str1
    int i2 = 0;    // char index into str2
    while ((i1 < str1.length()) && (i2 < str2.length())) {
        // toChars returns the number of chars written (1, or 2 for a surrogate pair).
        int len1 = Character.toChars(Character.codePointAt(str1, i1), charArray, 0);
        int len2 = Character.toChars(Character.codePointAt(str2, i2), charArray, 2);
        // @todo get rid of memory allocation
        String char1Str = new String(charArray, 0, len1);
        // @todo get rid of memory allocation
        String char2Str = new String(charArray, 2, len2);
        if (collator.compare(char1Str, char2Str) != 0) {
            return count;
        }
        // Advance by chars consumed, so surrogate pairs are stepped over as one unit.
        i1 += len1;
        i2 += len2;
        count++;
    }
    return count;
}
---------- END SOURCE ----------
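To illustrate the request, here is a minimal sketch of how the prefix loop could look if a code point comparison were available. `CodePointComparator` is a hypothetical interface that stands in for the suggested Collator.compare(int codepoint1, int codepoint2); it is not a JDK type, and the class and method names are illustrative only. The `viaCollator` adapter shows what such a comparison has to do with the existing API, which is where the two String allocations per step come from.

```java
import java.text.Collator;
import java.util.Locale;

public class CommonPrefixSketch {

    /** Hypothetical stand-in for the requested Collator.compare(int, int); not a JDK type. */
    public interface CodePointComparator {
        int compare(int codePoint1, int codePoint2);
    }

    /** Adapter over the existing API: still allocates two Strings per comparison. */
    public static CodePointComparator viaCollator(Collator collator) {
        return (cp1, cp2) -> collator.compare(
                new String(Character.toChars(cp1)),
                new String(Character.toChars(cp2)));
    }

    /** Returns the number of leading code points that compare as equal. */
    public static int commonPrefixLength(CharSequence a, CharSequence b, CodePointComparator cmp) {
        if (a == null || b == null) { return 0; }
        int count = 0;
        int i = 0; // char index into a
        int j = 0; // char index into b
        while (i < a.length() && j < b.length()) {
            int cp1 = Character.codePointAt(a, i);
            int cp2 = Character.codePointAt(b, j);
            if (cmp.compare(cp1, cp2) != 0) { break; }
            // Step over one code point, which may occupy one or two chars.
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.IDENTICAL);
        // StringBuilder inputs: no conversion of the whole sequence to String is needed.
        StringBuilder a = new StringBuilder("ab").appendCodePoint(0x1F600).append("x");
        StringBuilder b = new StringBuilder("ab").appendCodePoint(0x1F600).append("y");
        System.out.println(commonPrefixLength(a, b, viaCollator(collator)));
    }
}
```

With a comparator that does not allocate (for example `Integer::compare`, which compares raw code point values and ignores collation rules), the loop itself allocates nothing. A Collator-backed comparison without allocation is exactly what the existing API cannot express.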
RELATES TO :
JDK-8035473 [javadoc] Revamp the existing Doclet APIs (Closed)
JDK-8137326 Methods for comparing CharSequence, StringBuilder, and StringBuffer (Resolved)