-
Enhancement
-
Resolution: Unresolved
-
P5
-
None
-
6
-
x86
-
linux
A DESCRIPTION OF THE REQUEST :
The implementation of Charset.forName() (as of JDK 1.6.0) caches the two most recently used charsets in memory, under the following assumption (comment from the source code):
// We expect most programs to use one Charset repeatedly.
// We convey a hint to this effect to the VM by putting the
// level 1 cache miss code in a separate method.
Programs that do not fit this assumption miss the cache frequently and must perform a full lookup for the requested charset.
This becomes problematic when the program uses non-standard charsets through the CharsetProvider framework. lookupViaProviders() enumerates ClassLoader resources every time it tries to look up a charset, which results in establishing URLConnections and reading JarFiles. Our particular servlet application uses non-standard charsets through the CharsetProvider framework and may use any number of them, depending on incoming requests. It ends up spending 25% of its cycles in the Charset.forName() method.
JUSTIFICATION :
The current Level1/Level2 caching behavior imposes a significant performance penalty on applications that do not conform to the assumptions about typical charset usage.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Providing a larger cache of Charsets, or the flexibility to configure the size of the cache (e.g. through a System property), would reduce the number of disk-bound lookups.
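To illustrate the kind of configurable cache being requested, here is a minimal sketch of a bounded LRU map whose capacity is read from a System property. The class name BoundedLruCache and the property name "example.charset.cache.size" are invented for this example; the JDK defines neither, and this is not a proposal for the actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the JDK.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    // Invented property name, used here purely as an example.
    static final String SIZE_PROPERTY = "example.charset.cache.size";

    private final int capacity;

    BoundedLruCache(int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        super(16, 0.75f, true);
        this.capacity = Math.max(1, capacity);
    }

    // Reads the capacity from the System property, falling back to defaultSize.
    static <K, V> BoundedLruCache<K, V> fromSystemProperty(int defaultSize) {
        return new BoundedLruCache<K, V>(Integer.getInteger(SIZE_PROPERTY, defaultSize));
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the capacity is exceeded.
        return size() > capacity;
    }
}
```

The map is not thread-safe on its own; a real cache would need synchronization (for example Collections.synchronizedMap) because even get() reorders entries in access-order mode.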
CUSTOMER SUBMITTED WORKAROUND :
To work around this in our application, we wrap calls to Charset.forName() in a caching layer that stores the charsets that have already been looked up. Other implementation details:
- we store SoftReferences to the Charsets to allow the memory to be reclaimed if necessary.
- we also keep an LRU miss cache of unsupported charsets, since those lookups also result in an expensive ClassLoader resource enumeration.
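A workaround along these lines might look roughly like the sketch below. It is not our production code: the class name CachingCharsetLookup, the MAX_MISSES bound of 256, and the choice to key both caches on the lower-cased name are assumptions made for the example.

```java
import java.lang.ref.SoftReference;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper around Charset.forName(); not part of the JDK.
final class CachingCharsetLookup {
    // Assumed bound for the miss cache; tune for the application.
    private static final int MAX_MISSES = 256;

    // Hit cache: SoftReferences let the GC reclaim charsets under memory pressure.
    private static final Map<String, SoftReference<Charset>> HITS =
            new ConcurrentHashMap<String, SoftReference<Charset>>();

    // Miss cache: an access-ordered LinkedHashMap used as a bounded LRU set.
    private static final Map<String, Boolean> MISSES = Collections.synchronizedMap(
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_MISSES;
                }
            });

    private CachingCharsetLookup() {}

    static Charset forName(String name) {
        if (name == null) {
            throw new IllegalArgumentException("Null charset name");
        }
        // Charset names are case-insensitive, so normalize the cache key.
        String key = name.toLowerCase(Locale.ROOT);

        SoftReference<Charset> ref = HITS.get(key);
        Charset cached = (ref != null) ? ref.get() : null;
        if (cached != null) {
            return cached;
        }
        // get() rather than containsKey() so the entry counts as recently used.
        if (MISSES.get(key) != null) {
            throw new UnsupportedCharsetException(name);
        }
        try {
            Charset cs = Charset.forName(name); // the expensive path
            HITS.put(key, new SoftReference<Charset>(cs));
            return cs;
        } catch (UnsupportedCharsetException e) {
            MISSES.put(key, Boolean.TRUE);
            throw e;
        }
    }
}
```

Callers would replace Charset.forName(name) with CachingCharsetLookup.forName(name). One trade-off of the miss cache is that a charset which becomes available later (for example through a newly deployed provider) stays reported as unsupported until its entry is evicted.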