JDK-6717772

(cs) Charset.forName should cache more charsets

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P5
    • None
    • 6
    • core-libs

      A DESCRIPTION OF THE REQUEST :
      The implementation of Charset.forName() (as of JDK 1.6.0) caches the two most recently used charsets in memory, under the following assumption (comment from the source code):

              // We expect most programs to use one Charset repeatedly.
              // We convey a hint to this effect to the VM by putting the
              // level 1 cache miss code in a separate method.

        Programs that do not fit this assumption miss the cache frequently, and each miss triggers a full lookup of the requested charset.

      This becomes problematic when the program uses non-standard charsets through the CharsetProvider framework. lookupViaProviders() enumerates ClassLoader resources every time it tries to look up a charset, which results in establishing URLConnections and reading JarFiles. Our servlet application uses non-standard charsets through the CharsetProvider framework and may use any number of them, depending on incoming requests. It ends up spending 25% of its cycles in the Charset.forName() method.

      JUSTIFICATION :
      The current Level1/Level2 caching behavior causes a significant performance hit for applications that do not conform to the assumptions about typical charset usage.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Providing a larger cache of Charsets, or the flexibility to configure the size of the cache (e.g., through a system property), would reduce the number of disk-bound lookups.
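
      As an illustration of the request above, here is a minimal sketch of a size-configurable LRU cache in front of Charset.forName(). It is not the JDK's implementation. The class name SizedCharsetCache, the property name example.charset.cache.size, and the default of 16 entries are all assumptions made up for this example.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: a bounded LRU cache in front of Charset.forName(). */
public final class SizedCharsetCache {
    // Assumed property name for illustration only; not a real JDK setting.
    static final String SIZE_PROPERTY = "example.charset.cache.size";

    private final Map<String, Charset> lru;

    /** Reads the capacity from the (hypothetical) system property, default 16. */
    public SizedCharsetCache() {
        this(Integer.getInteger(SIZE_PROPERTY, 16));
    }

    public SizedCharsetCache(final int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the bound is exceeded.
        this.lru = new LinkedHashMap<String, Charset>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Charset> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached Charset, falling back to Charset.forName() on a miss. */
    public synchronized Charset forName(String name) {
        // Charset names are case-insensitive, so normalize the cache key.
        String key = name.toLowerCase(Locale.ROOT);
        Charset cs = lru.get(key);
        if (cs == null) {
            cs = Charset.forName(name); // may throw UnsupportedCharsetException
            lru.put(key, cs);
        }
        return cs;
    }

    public synchronized int size() {
        return lru.size();
    }
}
```

      With this sketch, starting the VM with -Dexample.charset.cache.size=64 would keep up to 64 charsets cached. Failed lookups are not cached here.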

      CUSTOMER SUBMITTED WORKAROUND :
        To work around this in our application, we wrap calls to Charset.forName() with a caching layer that manually caches the charsets that have been looked up. Other implementation details:
      - we store SoftReferences to the Charsets to allow the memory to be reclaimed if necessary.
      - we also store an LRU miss cache of unsupported charsets, since those also result in an expensive ClassLoader resource enumeration.
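
      The workaround described above could look roughly like the following sketch. This is not the submitter's actual code. The class name CachingCharsets, the bound of 256 miss entries, and the choice of ConcurrentHashMap and LinkedHashMap are assumptions for illustration.

```java
import java.lang.ref.SoftReference;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical caching wrapper around Charset.forName(). */
public final class CachingCharsets {
    private static final int MAX_MISSES = 256; // assumed bound for the miss cache

    // Hit cache: SoftReferences let the GC reclaim Charsets under memory pressure.
    private static final Map<String, SoftReference<Charset>> HITS =
            new ConcurrentHashMap<>();

    // Miss cache: access-ordered LinkedHashMap that evicts the least recently
    // used name once it holds more than MAX_MISSES entries.
    private static final Map<String, Boolean> MISSES = Collections.synchronizedMap(
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_MISSES;
                }
            });

    private CachingCharsets() {}

    public static Charset forName(String name) {
        // Charset names are case-insensitive, so normalize the cache key.
        String key = name.toLowerCase(Locale.ROOT);

        SoftReference<Charset> ref = HITS.get(key);
        Charset cs = (ref == null) ? null : ref.get();
        if (cs != null) {
            return cs;
        }
        // get() (rather than containsKey()) refreshes the entry's LRU position.
        if (MISSES.get(key) != null) {
            throw new UnsupportedCharsetException(name);
        }
        try {
            cs = Charset.forName(name);
        } catch (UnsupportedCharsetException e) {
            MISSES.put(key, Boolean.TRUE); // remember the expensive failure
            throw e;
        }
        HITS.put(key, new SoftReference<>(cs));
        return cs;
    }
}
```

      One trade-off in this sketch: a charset whose provider is installed after a failed lookup stays "unsupported" until its entry is evicted from the miss cache.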

            sherman Xueming Shen
            ndcosta Nelson Dcosta (Inactive)