-
Enhancement
-
Resolution: Unresolved
-
P4
-
None
-
17, 18
-
generic
-
generic
A DESCRIPTION OF THE PROBLEM :
There are two issues with the method `lookUpTable` of the internal class java.lang.ConditionalSpecialCasing, which is used for special case conversion:
- It uses the int code point as the key for a Map<Integer, ...> to look up the case conversion; this boxes the int as an Integer on every lookup
- The special case conversion entries are stored in a HashSet<Entry>
  - First of all, using a Set seems redundant because Entry does not even override `equals`, and it looks like only distinct Entry instances are ever added to the Set
  - Using a Set means a new Iterator object is created whenever case conversion entries are found for a code point
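To illustrate the two allocation sources, here is a minimal sketch. It does not use the real internal table; `LookupAllocationSketch`, `TABLE` and both helper methods are hypothetical names, and the sketch assumes the default Integer cache range (-128 to 127), which can be changed with JVM options.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class LookupAllocationSketch {
    // Hypothetical stand-in for the internal lookup table; the values are
    // plain strings here instead of Entry objects.
    static final Map<Integer, Set<String>> TABLE = new HashMap<>();
    static {
        Set<String> entries = new HashSet<>();
        entries.add("tr");
        entries.add("az");
        TABLE.put(0x0130, entries);
    }

    // Returns true if boxing the same code point twice yields two distinct
    // Integer objects, i.e. each boxing allocated a new object.
    static boolean boxingAllocates(int codePoint) {
        Integer first = codePoint;   // implicit Integer.valueOf(codePoint)
        Integer second = codePoint;
        return first != second;      // identity comparison is intentional
    }

    // Returns true if two iterations over the same Set use two distinct
    // Iterator objects, i.e. each for-each loop would allocate one.
    static boolean iterationAllocates(int codePoint) {
        Set<String> entries = TABLE.get(codePoint); // boxes codePoint for the lookup
        Iterator<String> first = entries.iterator();
        Iterator<String> second = entries.iterator();
        return first != second;
    }

    public static void main(String[] args) {
        System.out.println(boxingAllocates(0x0130));
        System.out.println(iterationAllocates(0x0130));
    }
}
```

Code points inside the Integer cache range are served from the cache, so the boxing cost applies to code points above 127, which covers most of the special casing entries.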
It looks like both of these can be fixed, for example in the following way:
1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
2. Remove the static field ConditionalSpecialCasing.entry
3. For every existing entry add a static final field `entry<codepoint>` storing an Entry[]
4. In ConditionalSpecialCasing.lookUpTable use a `switch` to access the corresponding `entry...`
Here is a short example snippet showing that:
```
private static final Entry[] entry0069 = {
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0)  // # LATIN SMALL LETTER I
};
...
private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
    Entry[] entries = switch (src.codePointAt(index)) {
        case 0x0069 -> entry0069;
        ...
        default -> null;
    };
    char[] ret = null;
    if (entries != null) {
        String currentLang = locale.getLanguage();
        for (Entry entry : entries) {
            String conditionLang = entry.getLanguage();
            ...
        }
    }
    return ret;
}
```
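For reference, a self-contained variant of the idea above that compiles on its own might look like the following. It is only a sketch: `SwitchLookupSketch`, the simplified `Entry` record and `lookUp` are hypothetical, only the single 0x0069 entry from the snippet is included, and the context conditions and language-independent entries of the real class are left out.

```java
import java.util.Locale;

public class SwitchLookupSketch {
    // Simplified, hypothetical stand-in for ConditionalSpecialCasing.Entry
    // (no code point field and no condition field).
    record Entry(char[] lower, char[] upper, String language) {}

    private static final Entry[] ENTRY_0069 = {
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr"), // LATIN SMALL LETTER I
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "az")  // LATIN SMALL LETTER I
    };

    // Looks up the special mapping for the code point at the given index.
    // Returns null when no entry matches the code point and language.
    static char[] lookUp(String src, int index, Locale locale, boolean lowerCasing) {
        // Switching on the primitive int avoids boxing it as an Integer.
        Entry[] entries = switch (src.codePointAt(index)) {
            case 0x0069 -> ENTRY_0069;
            default -> null;
        };
        if (entries == null) {
            return null;
        }
        String currentLang = locale.getLanguage();
        // Iterating over an array does not create an Iterator object.
        for (Entry entry : entries) {
            if (entry.language().equals(currentLang)) {
                return lowerCasing ? entry.lower() : entry.upper();
            }
        }
        return null;
    }
}
```

With this sketch, `lookUp("i", 0, Locale.forLanguageTag("tr"), false)` returns the upper case mapping from the Turkish entry, while any other code point or language falls through to `null`.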
Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite problematic because it creates a new StringCharacterIterator and a RuleBasedBreakIterator for each call.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Profile the object allocations of the `toLowerCase` calls of the following code snippets, for example with VisualVM:
1. Snippet:
```
String s = "\u0130".repeat(1000);
s.toLowerCase(Locale.ROOT);
```
2. Snippet:
```
String s = "\u03A3".repeat(1000);
s.toLowerCase(Locale.ROOT);
```
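If no profiler is at hand, a rough programmatic check is also possible. The sketch below is an assumption-laden alternative, not the method used for the numbers in this report: it relies on the HotSpot-specific `com.sun.management.ThreadMXBean` and reports allocated bytes for the current thread rather than object counts per class. `AllocationProbe` and `allocatedBytes` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

public class AllocationProbe {
    // Returns the number of bytes allocated by the current thread while
    // running the action, or -1 if the JVM does not support the measurement.
    static long allocatedBytes(Runnable action) {
        if (!(ManagementFactory.getThreadMXBean()
                instanceof com.sun.management.ThreadMXBean bean)
                || !bean.isThreadAllocatedMemorySupported()
                || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        action.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        String dotted = "\u0130".repeat(1000);
        String sigma = "\u03A3".repeat(1000);
        System.out.println("U+0130: " + allocatedBytes(() -> dotted.toLowerCase(Locale.ROOT)));
        System.out.println("U+03A3: " + allocatedBytes(() -> sigma.toLowerCase(Locale.ROOT)));
    }
}
```

The byte totals include the result String itself, so they are only useful for comparing before and after a change; a profiler is still needed to attribute the allocations to individual classes.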
ACTUAL -
1. Snippet:
2000 Integer objects created
2000 HashMap$KeyIterator objects created
2. Snippet:
1000 Integer objects created
1000 HashMap$KeyIterator objects created
1000 StringCharacterIterator objects created
1000 RuleBasedBreakIterator objects created