JDK-8292573

ConditionalSpecialCasing.lookUpTable is wasting memory


      A DESCRIPTION OF THE PROBLEM :
      There are two issues with the method `lookUpTable` of the internal class `java.lang.ConditionalSpecialCasing`, which is used for special-case conversion:
      - It uses the int code point as the key of a Map<Integer, ...> to look up the case conversion, so every lookup boxes the int into an Integer (code points above 127 fall outside the default Integer cache, so each such lookup allocates a new object).
      - The special-case conversion entries are stored in a HashSet<Entry>:
        - Using a Set at all seems redundant: Entry does not override `equals`, and it looks like only distinct Entry instances are ever added to the Set.
        - Iterating over a Set creates a new Iterator object every time case conversion entries are found for a code point.
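
      To make the two allocation sources concrete, here is a minimal, hypothetical model of the lookup pattern described above. The class `BoxedLookupSketch`, its simplified `Entry` (only a code point, a lowercase mapping and an optional language) and the sample data are assumptions for illustration, not the actual JDK source; the real method also evaluates context conditions, which are omitted here.

      ```java
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.Map;
      import java.util.Set;

      // Hypothetical, simplified model of a Map<Integer, HashSet<Entry>> lookup; not JDK code.
      public class BoxedLookupSketch {
          // Simplified stand-in for ConditionalSpecialCasing.Entry (assumed fields only).
          static final class Entry {
              final int ch;          // code point this entry applies to
              final char[] lower;    // lowercase mapping
              final String language; // null means "any language"

              Entry(int ch, char[] lower, String language) {
                  this.ch = ch;
                  this.lower = lower;
                  this.language = language;
              }
              // equals/hashCode are not overridden, so the HashSet compares entries by identity.
          }

          static final Map<Integer, HashSet<Entry>> TABLE = new HashMap<>();

          static {
              // Illustrative data for U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.
              add(new Entry(0x0130, new char[]{0x0069, 0x0307}, null));
              add(new Entry(0x0130, new char[]{0x0069}, "tr"));
              add(new Entry(0x0130, new char[]{0x0069}, "az"));
          }

          static void add(Entry e) {
              // e.ch is autoboxed here too, but only once, during class initialization.
              TABLE.computeIfAbsent(e.ch, k -> new HashSet<>()).add(e);
          }

          static char[] lookUp(int codePoint, String currentLang) {
              // Allocation 1: get(Object) autoboxes codePoint via Integer.valueOf, which
              // returns a new Integer for values outside the default cache (-128..127).
              Set<Entry> entries = TABLE.get(codePoint);
              if (entries == null) {
                  return null;
              }
              char[] ret = null;
              // Allocation 2: the enhanced for loop calls HashSet.iterator(), which creates
              // a new HashMap$KeyIterator on every lookup that finds entries.
              for (Entry e : entries) {
                  if (e.language != null && !e.language.equals(currentLang)) {
                      continue; // entry is for another language
                  }
                  ret = e.lower;
                  if (e.language != null) {
                      break; // a language-specific match takes precedence
                  }
              }
              return ret;
          }
      }
      ```

      Repeatedly calling `lookUp(0x0130, "")` on this model should show the same two object kinds in an allocation profile as the ones listed under ACTUAL below (Integer and HashMap$KeyIterator).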

      It looks like both issues can be fixed, for example as follows:
      1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
      2. Remove the static field ConditionalSpecialCasing.entry
      3. For every code point that currently has entries, add a static final field `entry<codepoint>` storing an Entry[]
      4. In ConditionalSpecialCasing.lookUpTable, use a `switch` on the code point to select the corresponding `entry...` field

      Here is a short snippet illustrating this:
      ```
      private static final Entry[] entry0069 = {
          new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
          new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0) // # LATIN SMALL LETTER I
      };
      ...

      private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
          Entry[] entries = switch (src.codePointAt(index)) {
              case 0x0069 -> entry0069;
              ...
              default -> null;
          };
          char[] ret = null;

          if (entries != null) {
              String currentLang = locale.getLanguage();
              for (Entry entry : entries) {
                  String conditionLang = entry.getLanguage();
                  ...
              }
          }

          return ret;
      }
      ```
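
      For comparison, here is a self-contained, runnable version of the switch-based snippet above. It is a sketch under simplifying assumptions: `SwitchLookupSketch`, the record-based `Entry` and the sample data for U+0130 are hypothetical, only lowercase mappings are modeled, and the context-condition check of the real method is left out.

      ```java
      import java.util.Locale;

      // Hypothetical sketch of the proposed switch-based lookup; not JDK code.
      public class SwitchLookupSketch {
          // Step 1: the entry no longer stores its own code point.
          record Entry(char[] lower, String language) {}

          // Step 3: one static final array per code point (illustrative data only).
          private static final Entry[] entry0130 = {
              new Entry(new char[]{0x0069, 0x0307}, null), // LATIN CAPITAL LETTER I WITH DOT ABOVE
              new Entry(new char[]{0x0069}, "tr"),
              new Entry(new char[]{0x0069}, "az"),
          };

          static char[] lookUpTable(String src, int index, Locale locale) {
              // Step 4: switching on the primitive int needs no boxing and no map lookup.
              Entry[] entries = switch (src.codePointAt(index)) {
                  case 0x0130 -> entry0130;
                  default -> null;
              };
              char[] ret = null;

              if (entries != null) {
                  String currentLang = locale.getLanguage();
                  // An enhanced for loop over an array allocates no Iterator.
                  for (Entry entry : entries) {
                      String conditionLang = entry.language();
                      if (conditionLang != null && !conditionLang.equals(currentLang)) {
                          continue; // entry applies to a different language
                      }
                      ret = entry.lower();
                      if (conditionLang != null) {
                          break; // a language-specific match takes precedence
                      }
                  }
              }
              return ret;
          }
      }
      ```

      This needs Java 16 or later (switch expression and record). Because the switch is on an int with constant case labels, javac typically compiles it to a `tableswitch` or `lookupswitch` instruction, so the per-lookup allocations of the Map/Set version disappear.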


      Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also wasteful, because it creates a new StringCharacterIterator and a new RuleBasedBreakIterator on every call.
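
      The per-call allocation pattern in that note can be illustrated with the public `java.text.BreakIterator` API. `BreakIterator.getWordInstance` returns a new instance on each call, and `setText(String)` wraps its argument in a new `StringCharacterIterator`. In the hypothetical helpers below (`WordBoundarySketch` and its methods are made up for illustration and only test word boundaries, not the cased-letter scan of the real method), the first variant creates both objects for every offset, while the second creates them once for the whole string.

      ```java
      import java.text.BreakIterator;
      import java.util.Locale;

      // Hypothetical illustration of per-call versus reused word-boundary iterators; not JDK code.
      public class WordBoundarySketch {
          // Per-call pattern: every invocation creates a new BreakIterator, and
          // setText(String) wraps src in a new StringCharacterIterator.
          static boolean isWordBoundary(String src, int index, Locale locale) {
              BreakIterator wordBoundary = BreakIterator.getWordInstance(locale);
              wordBoundary.setText(src);
              return wordBoundary.isBoundary(index);
          }

          // Calls the per-call helper once per offset: two extra objects per offset.
          static int countBoundariesPerCall(String src, Locale locale) {
              int count = 0;
              for (int i = 0; i <= src.length(); i++) {
                  if (isWordBoundary(src, i, locale)) {
                      count++;
                  }
              }
              return count;
          }

          // Same result with a single iterator created for the whole string.
          static int countBoundariesReused(String src, Locale locale) {
              BreakIterator wordBoundary = BreakIterator.getWordInstance(locale);
              wordBoundary.setText(src);
              int count = 0;
              for (int i = 0; i <= src.length(); i++) {
                  if (wordBoundary.isBoundary(i)) {
                      count++;
                  }
              }
              return count;
          }
      }
      ```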

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Profile the object allocations of the `toLowerCase` calls in the following code snippets, for example with VisualVM:

      1. Snippet:
      ```
      String s = "\u0130".repeat(1000);
      s.toLowerCase(Locale.ROOT);
      ```

      2. Snippet:
      ```
      String s = "\u03A3".repeat(1000);
      s.toLowerCase(Locale.ROOT);
      ```
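
      To give a profiler such as VisualVM enough work to sample, both snippets can be wrapped in a small program that repeats them. The class name `ToLowerCaseRepro`, the method names and the default iteration count are arbitrary choices for this sketch.

      ```java
      import java.util.Locale;

      // Hypothetical harness that repeats the two reproduction snippets for profiling.
      public class ToLowerCaseRepro {
          // Snippet 1: U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, repeated.
          static String snippet1() {
              String s = "\u0130".repeat(1000);
              return s.toLowerCase(Locale.ROOT);
          }

          // Snippet 2: U+03A3 GREEK CAPITAL LETTER SIGMA, repeated (exercises the final-sigma check).
          static String snippet2() {
              String s = "\u03A3".repeat(1000);
              return s.toLowerCase(Locale.ROOT);
          }

          public static void main(String[] args) {
              int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
              long totalLength = 0;
              for (int i = 0; i < iterations; i++) {
                  // Accumulate the lengths so the results are actually used.
                  totalLength += snippet1().length() + snippet2().length();
              }
              System.out.println("total length: " + totalLength);
          }
      }
      ```

      Under `Locale.ROOT`, each U+0130 lowercases to the two chars U+0069 U+0307, so snippet 1 yields a 2000-char string. In snippet 2 only the last sigma is word-final, so the result is 999 copies of U+03C3 followed by U+03C2.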


      ACTUAL -
      1. Snippet:
      2000 Integer objects created
      2000 HashMap$KeyIterator objects created

      2. Snippet:
      1000 Integer objects created
      1000 HashMap$KeyIterator objects created
      1000 StringCharacterIterator objects created
      1000 RuleBasedBreakIterator objects created

            redestad Claes Redestad
            webbuggrp Webbug Group