JDK-8292573

ConditionalSpecialCasing.lookUpTable is wasting memory

    • generic
    • generic

      A DESCRIPTION OF THE PROBLEM :
      There are two issues with the method `lookUpTable` of the internal class java.lang.ConditionalSpecialCasing, which is used for special case conversion:
      - It uses the int code point as the key of a Map<Integer, ...> to look up the case conversion, so each lookup boxes the int into an Integer.
      - The special case conversion entries are stored in a HashSet<Entry>:
        - Using a Set seems redundant, because Entry does not override `equals` and it looks like only distinct Entry instances are ever added to the Set.
        - Using a Set means a new Iterator object is created whenever case conversion entries are found for a code point.
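
The two allocation sources described above can be reproduced in isolation. The following is a minimal sketch, not the actual JDK source: the class `BoxedLookupSketch`, its simplified `Entry`, and the helper methods are hypothetical stand-ins that only mirror the `Map<Integer, HashSet<Entry>>` shape.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical, simplified model of the lookup pattern; not the JDK implementation.
final class BoxedLookupSketch {
    // Stand-in for ConditionalSpecialCasing.Entry; it does not override equals/hashCode.
    static final class Entry {
        final int ch;
        final String lang;

        Entry(int ch, String lang) {
            this.ch = ch;
            this.lang = lang;
        }
    }

    static final Map<Integer, HashSet<Entry>> TABLE = new HashMap<>();

    static void add(Entry e) {
        TABLE.computeIfAbsent(e.ch, k -> new HashSet<>()).add(e);
    }

    static {
        add(new Entry(0x0069, "tr"));
        add(new Entry(0x0069, "az"));
    }

    static int countEntries(int codePoint) {
        // Autoboxing: the int key is converted via Integer.valueOf, which
        // allocates a new Integer for values outside the small-value cache.
        HashSet<Entry> set = TABLE.get(codePoint);
        if (set == null) {
            return 0;
        }
        int n = 0;
        // The for-each loop calls set.iterator(), creating a new iterator object.
        for (Entry e : set) {
            n++;
        }
        return n;
    }

    // Two field-equal entries are both kept, because Entry uses identity equality.
    static int sizeWithFieldEqualEntries() {
        HashSet<Entry> set = new HashSet<>();
        set.add(new Entry(0x0069, "tr"));
        set.add(new Entry(0x0069, "tr"));
        return set.size();
    }
}
```

Because the Set never deduplicates anything here, a plain array would hold the same data without the per-lookup iterator.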

      It looks like both of these can be fixed, for example in the following way:
      1. Remove ConditionalSpecialCasing.Entry.ch (and the corresponding getter)
      2. Remove the static field ConditionalSpecialCasing.entry
      3. For every existing entry add a static final field `entry<codepoint>` storing an Entry[]
      4. In ConditionalSpecialCasing.lookUpTable use a `switch` to access the corresponding `entry...`

      Here is a short example snippet showing that:
      ```
      private static final Entry[] entry0069 = {
          new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr", 0), // # LATIN SMALL LETTER I
          new Entry(new char[]{0x0069}, new char[]{0x0130}, "az", 0) // # LATIN SMALL LETTER I
      };
      ...

      private static char[] lookUpTable(String src, int index, Locale locale, boolean bLowerCasing) {
          Entry[] entries = switch (src.codePointAt(index)) {
              case 0x0069 -> entry0069;
              ...
              default -> null;
          };
          char[] ret = null;

          if (entries != null) {
              String currentLang = locale.getLanguage();
              for (Entry entry : entries) {
                  String conditionLang = entry.getLanguage();
                  ...
              }
          }

          return ret;
      }
      ```
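
For experimenting outside the JDK, the same idea can be written as a self-contained class. This is only a sketch under simplifying assumptions: `SwitchLookupSketch` and its reduced `Entry` (no condition field) are hypothetical, the method matches on language only, and just the two entries from the snippet above are included.

```java
import java.util.Locale;

// Hypothetical, reduced version of the proposed switch-based lookup.
final class SwitchLookupSketch {
    // Simplified stand-in for Entry; the condition argument is omitted here.
    static final class Entry {
        final char[] lower;
        final char[] upper;
        final String lang;

        Entry(char[] lower, char[] upper, String lang) {
            this.lower = lower;
            this.upper = upper;
            this.lang = lang;
        }
    }

    private static final Entry[] entry0069 = {
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "tr"), // LATIN SMALL LETTER I
        new Entry(new char[]{0x0069}, new char[]{0x0130}, "az")  // LATIN SMALL LETTER I
    };

    static char[] lookUp(String src, int index, Locale locale, boolean lowerCasing) {
        // Switching on the primitive int avoids boxing the code point.
        Entry[] entries = switch (src.codePointAt(index)) {
            case 0x0069 -> entry0069;
            default -> null;
        };
        if (entries == null) {
            return null;
        }
        String currentLang = locale.getLanguage();
        // Iterating over an array does not create an Iterator object.
        for (Entry entry : entries) {
            if (entry.lang.equals(currentLang)) {
                return lowerCasing ? entry.lower : entry.upper;
            }
        }
        return null;
    }
}
```

With these assumptions, `lookUp("i", 0, Locale.forLanguageTag("tr"), false)` returns the upper-case mapping stored in `entry0069`, while code points without an entry return `null`.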


      Note: `java.lang.ConditionalSpecialCasing.isFinalCased` is also quite problematic because it creates a new StringCharacterIterator and a RuleBasedBreakIterator for each call.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Profile the object allocations of the `toLowerCase` calls of the following code snippets, for example with VisualVM:

      1. Snippet:
      ```
      String s = "\u0130".repeat(1000);
      s.toLowerCase(Locale.ROOT);
      ```

      2. Snippet:
      ```
      String s = "\u03A3".repeat(1000);
      s.toLowerCase(Locale.ROOT);
      ```
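
As an alternative to a profiler, the allocated bytes of both snippets can be measured programmatically. The sketch below assumes a HotSpot-based JDK where the platform thread bean implements `com.sun.management.ThreadMXBean`. The class `AllocationProbe` and its helper are hypothetical names. It reports bytes rather than object counts, so it complements the profiler output rather than replacing it.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Hypothetical helper for measuring per-thread allocation; assumes a HotSpot JVM.
final class AllocationProbe {
    // Returns the bytes allocated by the current thread while running the task.
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        String dottedI = "\u0130".repeat(1000); // input of the first snippet
        String sigma = "\u03A3".repeat(1000);   // input of the second snippet

        // Warm up once so class loading does not dominate the measurement.
        dottedI.toLowerCase(Locale.ROOT);
        sigma.toLowerCase(Locale.ROOT);

        System.out.println("U+0130 x 1000: "
                + allocatedBytes(() -> dottedI.toLowerCase(Locale.ROOT)) + " bytes");
        System.out.println("U+03A3 x 1000: "
                + allocatedBytes(() -> sigma.toLowerCase(Locale.ROOT)) + " bytes");
    }
}
```

The reported numbers include the result String itself, so they are mainly useful for comparing JDK builds with and without a fix.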


      ACTUAL -
      1. Snippet:
      2000 Integer objects created
      2000 HashMap$KeyIterator objects created

      2. Snippet:
      1000 Integer objects created
      1000 HashMap$KeyIterator objects created
      1000 StringCharacterIterator objects created
      1000 RuleBasedBreakIterator objects created

        1. Profiling results.png (93 kB, Andrew Wang)
        2. Test.java (0.3 kB, Andrew Wang)

            redestad Claes Redestad
            webbuggrp Webbug Group