Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6790402

Speed-up FastCharsetProvider

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: P4 P4
    • None
    • 6u10
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.6.0_11"
      Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
      Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Windows XP SR-2

      A DESCRIPTION OF THE PROBLEM :
      This is a MIXED list of BUGs and RFEs regarding performance of sun.nio.cs.FastCharsetProvider !

        Bug 1:
      There is a shortcut to enhance VM startup in line 94:
      if (cln.equals("US_ASCII")) {
      cs = new US_ASCII();
                  ...
      As it seems that current charset for startup is UTF-8, the shortcut should be updated to UTF-8.
      Also check this in sun.io.Converters, where ISO-8859-1 is yet defined for defaultEncoding.

        Bug 2:
      Method canonicalize() is invoked 2 times in sequence after invoking charsetForName() (lines 119 -> 82).
      This unnecessarily consumes performance.

        Bug 3:
      Method lookup() should be synchronized to insure lock also for invokation from charsets() iteration; synchronization could be saved for method charsetForName().

      RFE 4:
      After unsuccessful lookup in cache, there is a 2nd lookup in classMap to check support for given charset name:
      // Check cache first
      Charset cs = cache.get(csn);
      if (cs != null)
      return cs;
      // Do we even support this charset?
      String cln = classMap.get(csn);
      if (cln == null)
      return null;
      This could be enhanced by initializing the cache with a NULL Charset object. So there is only need for 1 single map lookup:
              Charset cs = cache.get(csn);
              if (cs == null) // Don't we even support this charset?
                  return null;
              if (cs != NULL)
                  return cs;
      so ...

      RFE 5:
      Class name lookup can be done after startup charset check, if latter is done by canonical name comparison:
              if (csn.equals("us-ascii"))

      RFE 6:
      In method toLower() s shouldn't be looped twice after 'toLower' becomes false, and copying of lower chars could be saved:
      (copying String's internal char[] shouldn't be slower than new char[], as it also needs to fill initial content with 0's)
          private static String toLower(final String s) {
              boolean allLower = true;
              char[] ca = null;
              for (int i=0; i<s.length(); i++) {
                  int c = s.charAt(i);
                  if (((c - 'A') | ('Z' - c)) >= 0) {
                      if (allLower) {
                          ca = s.toCharArray();
                          allLower = false;
                      }
                      ca[i] += '\u0020';
                  }
              }
              return allLower ? s : new String(ca);
          }


      RFE 7:
      charsets() iterator must not canonicalize each charset name, so canonicalizing can be moved from lookup() to charsetForName().

      See my correction here:
      https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=552&view=markup


      RFE 8:
      classMap could be saved. Instead pre-load the cache initially with the class names. This would save 1 HashMap lookup to get the class name of a charset to be instantiated:
          In constructor do ...
              cache = classNames; // type: Map<String,Object>

      RFE 9:
      Pre-load the VM's startup charset. This would save the check for each lookup:
              private final static Charset STARTUP_CHARSET = new UTF_8();
              cache.put(toLower(STARTUP_CHARSET.name()), STARTUP_CHARSET);

      All this would result in very simple lookup:

          private synchronized Charset lookup(String lowCanonical) {
              // Check cache first
              Object o = cache.get(lowCanonical); // TODO: maybe also cache by arbitrary alias
              if (o instanceof String) {
                  // Instantiate new charset
                  Charset cs = newCharset((String)o);
                  // Cache it
                  if (cs != null)
                      cache.put(lowCanonical, cs);
                  return cs;
              }
              return (Charset)o;
          }
          Charset newCharset(String className) {
              try {
                  Class c = Class.forName(packagePrefix+"."+className,
                                          true, getClass().getClassLoader());
                  return (Charset)c.newInstance();
              }
              catch (ClassNotFoundException x) {}
              catch (IllegalAccessException x) {}
              catch (InstantiationException x) {}
              return null;
          }

      See my correction here:
      https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=555&view=markup


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      For complete source code see:
      https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=551&view=markup
      ---------- END SOURCE ----------
      Comment copied from bugzilla report http://bugs.openjdk.java.net/show_bug.cgi?id=100092#c0
      Description From UlfZibis 2009-07-19 12:12:28 PDT

      Created an attachment (id=114) [details]
      webrev including patch

      Maybe following would be suitable for the mercurial changeset summary:

      6790402 Speed-up FastCharsetProvider
      Summary: Charset for startup should be UTF-8;
               Invoke canonicalize() only once after invoking charsetForName();
               Method lookup() should be synchronized;
               Enhance lookup by initializing cache with a NULL Charset object;
               Class name lookup can be done after startup charset check,
                   if is done by canonical name comparison;
               Method toUpper() would be potentially faster than toLower();
               charsets() iterator must not canonicalize each charset name;
               classMap could be saved;
               Pre-load the VM's startup charset;
               Moving file standard-charsets to more appropriate location;
               Prepare AbstractCharsetProvider, ExtendedCharsets for better
                   performance and flexibility;
      Contributed-by: Ulf.Zibis at CoSoCo.de

            sherman Xueming Shen
            ndcosta Nelson Dcosta (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Imported:
              Indexed: