Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: None
Affects Version/s: 17, 18
Component/s: core-libs
Labels:
Subcomponent: java.lang
CPU: generic
OS: generic

A DESCRIPTION OF THE PROBLEM :
There are two issues with the method `lookUpTable` of the internal class `java.lang.ConditionalSpecialCasing`, which is used for special case conversion:
- It uses the int code point as the key of a Map<Integer, ...> to look up the case conversion, so every lookup boxes the int as an Integer.
- The special case conversion entries are stored in a HashSet<Entry>.
  - Using a Set seems redundant, because Entry does not override `equals` and it looks like only distinct Entry instances are ever added to the Set.
  - Using a Set also means a new Iterator object is created whenever case conversion entries are found for a code point.
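
To make the two allocation sites concrete, here is a minimal, self-contained sketch of the lookup pattern described above. It is not the JDK source: the class name `BoxedLookupSketch`, the simplified `Entry`, and the helper `countEntries` are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;

public class BoxedLookupSketch {
    // Hypothetical stand-in for ConditionalSpecialCasing.Entry.
    // Like the class described in this report, it does not override equals/hashCode.
    static final class Entry {
        final char[] lower;
        final char[] upper;

        Entry(char[] lower, char[] upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    static final Map<Integer, HashSet<Entry>> TABLE = new HashMap<>();

    static {
        HashSet<Entry> set = new HashSet<>();
        set.add(new Entry(new char[]{0x0069, 0x0307}, new char[]{0x0130}));
        TABLE.put(0x0130, set);
    }

    // Counts the entries registered for a code point.
    static int countEntries(int codePoint) {
        // Allocation site 1: the int key is autoboxed via Integer.valueOf(codePoint).
        // Values outside the default Integer cache (-128..127) normally yield a new
        // object unless the JIT manages to eliminate the allocation.
        HashSet<Entry> set = TABLE.get(codePoint);
        if (set == null) {
            return 0;
        }
        int n = 0;
        // Allocation site 2: every iteration over the set creates a new Iterator.
        Iterator<Entry> it = set.iterator();
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countEntries(0x0130)); // prints 1
        System.out.println(countEntries('a'));    // prints 0
    }
}
```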

It looks like both of these can be fixed, for example in the following way:
1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
2. Remove the static field ConditionalSpecialCasing.entry
3. For every existing entry, add a static final field `entry<codepoint>` storing an Entry[]
4. In ConditionalSpecialCasing.lookUpTable, use a `switch` to access the corresponding `entry...` field

Here is a short example snippet showing that:
```
private static final Entry[] entry0069 = {
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
    new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0) // # LATIN SMALL LETTER I
};
...

private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
    Entry[] entries = switch (src.codePointAt(index)) {
        case 0x0069 -> entry0069;
        ...
        default -> null;
    };
    char[] ret = null;

    if (entries != null) {
        String currentLang = locale.getLanguage();
        for (Entry entry : entries) {
            String conditionLang = entry.getLanguage();
            ...
        }
    }

    return ret;
}
```
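
For experimenting with the idea outside the JDK, the snippet above can be reduced to a self-contained sketch. The class name `SwitchLookupSketch` and the simplified `Entry` record are hypothetical. The condition code from the snippet is left out, only the one table from the snippet is included, and the language match inside the loop is an assumption about how the elided body behaves.

```java
import java.util.Locale;

public class SwitchLookupSketch {
    // Hypothetical, simplified Entry: the condition code is omitted here.
    record Entry(char[] lower, char[] upper, String lang) {}

    // Values copied from the snippet in this report.
    private static final Entry[] entry0069 = {
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr"), // LATIN SMALL LETTER I
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "az"), // LATIN SMALL LETTER I
    };

    static char[] lookUpTable(String src, int index, Locale locale, boolean lowerCasing) {
        // The switch works on the primitive int, so no Integer is boxed,
        // and iterating an array does not need an Iterator object.
        Entry[] entries = switch (src.codePointAt(index)) {
            case 0x0069 -> entry0069;
            default -> null;
        };
        if (entries == null) {
            return null;
        }
        String currentLang = locale.getLanguage();
        for (Entry entry : entries) {
            // Assumed matching rule: plain language equality, no condition check.
            if (entry.lang().equals(currentLang)) {
                return lowerCasing ? entry.lower() : entry.upper();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        char[] upper = lookUpTable("i", 0, Locale.forLanguageTag("tr"), false);
        System.out.println(Integer.toHexString(upper[0])); // prints 130
    }
}
```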

Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite problematic because it creates a new StringCharacterIterator and a RuleBasedBreakIterator for each call.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Profile the object allocations of the `toLowerCase` calls of the following code snippets, for example with VisualVM:

1. Snippet:
```
String s = "\u0130".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

2. Snippet:
```
String s = "\u03A3".repeat(1000);
s.toLowerCase(Locale.ROOT);
```

ACTUAL -
1. Snippet:
2000 Integer objects created
2000 HashMap$KeyIterator objects created

2. Snippet:
1000 Integer objects created
1000 HashMap$KeyIterator objects created
1000 StringCharacterIterator objects created
1000 RuleBasedBreakIterator objects created
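
The counts above come from a profiler. As a rough cross-check without one, the bytes allocated by the calling thread can be read through `com.sun.management.ThreadMXBean`. This is only a sketch: the class name `AllocationProbe` and the helper `allocatedBytesForLowerCase` are hypothetical, the cast assumes a HotSpot-based JDK, and the result is a byte total (including the result string itself), not a per-class object count.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

public class AllocationProbe {
    // Keeps the result reachable so the conversion cannot be optimized away.
    static volatile int sink;

    // Returns the number of bytes the current thread allocated while
    // lower-casing s with Locale.ROOT (0 if the JVM does not support the counter).
    static long allocatedBytesForLowerCase(String s) {
        // Assumption: on HotSpot the platform ThreadMXBean implements the
        // com.sun.management extension interface.
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return 0;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        String result = s.toLowerCase(Locale.ROOT);
        long after = bean.getThreadAllocatedBytes(id);
        sink = result.length();
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("U+0130 x1000: " + allocatedBytesForLowerCase("\u0130".repeat(1000)) + " bytes");
        System.out.println("U+03A3 x1000: " + allocatedBytesForLowerCase("\u03A3".repeat(1000)) + " bytes");
    }
}
```

Comparing the numbers against a string of the same length that needs no special casing gives a feel for the extra allocations per character. JIT warm-up and escape analysis can change the figures between runs.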


Attachments:
- Profiling results.png (93 kB, 2022-08-12 04:08)
- Test.java (0.3 kB, 2022-08-12 04:08)

Assignee: Claes Redestad
Reporter: Webbug Group
Votes: 0
Watchers: 5
Created: 2022-08-07 13:50
Updated: 2023-02-13 05:33
