-
Bug
-
Resolution: Unresolved
-
P4
-
None
-
6u10
-
Cause Known
-
x86
-
windows_xp
FULL PRODUCT VERSION :
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows XP SR-2
A DESCRIPTION OF THE PROBLEM :
This is a MIXED list of BUGs and RFEs regarding performance of sun.nio.cs.FastCharsetProvider !
Bug 1:
There is a shortcut to enhance VM startup in line 94:
if (cln.equals("US_ASCII")) {
cs = new US_ASCII();
...
As it seems that current charset for startup is UTF-8, the shortcut should be updated to UTF-8.
Also check this in sun.io.Converters, where ISO-8859-1 is yet defined for defaultEncoding.
Bug 2:
Method canonicalize() is invoked 2 times in sequence after invoking charsetForName() (lines 119 -> 82).
This unnecessarily consumes performance.
Bug 3:
Method lookup() should be synchronized to insure lock also for invokation from charsets() iteration; synchronization could be saved for method charsetForName().
RFE 4:
After unsuccessful lookup in cache, there is a 2nd lookup in classMap to check support for given charset name:
// Check cache first
Charset cs = cache.get(csn);
if (cs != null)
return cs;
// Do we even support this charset?
String cln = classMap.get(csn);
if (cln == null)
return null;
This could be enhanced by initializing the cache with a NULL Charset object. So there is only need for 1 single map lookup:
Charset cs = cache.get(csn);
if (cs == null) // Don't we even support this charset?
return null;
if (cs != NULL)
return cs;
so ...
RFE 5:
Class name lookup can be done after startup charset check, if latter is done by canonical name comparison:
if (csn.equals("us-ascii"))
RFE 6:
In method toLower() s shouldn't be looped twice after 'toLower' becomes false, and copying of lower chars could be saved:
(copying String's internal char[] shouldn't be slower than new char[], as it also needs to fill initial content with 0's)
private static String toLower(final String s) {
boolean allLower = true;
char[] ca = null;
for (int i=0; i<s.length(); i++) {
int c = s.charAt(i);
if (((c - 'A') | ('Z' - c)) >= 0) {
if (allLower) {
ca = s.toCharArray();
allLower = false;
}
ca[i] += '\u0020';
}
}
return allLower ? s : new String(ca);
}
RFE 7:
charsets() iterator must not canonicalize each charset name, so canonicalizing can be moved from lookup() to charsetForName().
See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=552&view=markup
RFE 8:
classMap could be saved. Instead pre-load the cache initially with the class names. This would save 1 HashMap lookup to get the class name of a charset to be instantiated:
In constructor do ...
cache = classNames; // type: Map<String,Object>
RFE 9:
Pre-load the VM's startup charset. This would save the check for each lookup:
private final static Charset STARTUP_CHARSET = new UTF_8();
cache.put(toLower(STARTUP_CHARSET.name()), STARTUP_CHARSET);
All this would result in very simple lookup:
private synchronized Charset lookup(String lowCanonical) {
// Check cache first
Object o = cache.get(lowCanonical); // TODO: maybe also cache by arbitrary alias
if (o instanceof String) {
// Instantiate new charset
Charset cs = newCharset((String)o);
// Cache it
if (cs != null)
cache.put(lowCanonical, cs);
return cs;
}
return (Charset)o;
}
Charset newCharset(String className) {
try {
Class c = Class.forName(packagePrefix+"."+className,
true, getClass().getClassLoader());
return (Charset)c.newInstance();
}
catch (ClassNotFoundException x) {}
catch (IllegalAccessException x) {}
catch (InstantiationException x) {}
return null;
}
See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=555&view=markup
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
For complete source code see:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=551&view=markup
---------- END SOURCE ----------
Comment copied from bugzilla report http://bugs.openjdk.java.net/show_bug.cgi?id=100092#c0
Description From UlfZibis 2009-07-19 12:12:28 PDT
Created an attachment (id=114) [details]
webrev including patch
Maybe following would be suitable for the mercurial changeset summary:
6790402 Speed-up FastCharsetProvider
Summary: Charset for startup should be UTF-8;
Invoke canonicalize() only once after invoking charsetForName();
Method lookup() should be synchronized;
Enhance lookup by initializing cache with a NULL Charset object;
Class name lookup can be done after startup charset check,
if is done by canonical name comparison;
Method toUpper() would be potentially faster than toLower();
charsets() iterator must not canonicalize each charset name;
classMap could be saved;
Pre-load the VM's startup charset;
Moving file standard-charsets to more appropriate location;
Prepare AbstractCharsetProvider, ExtendedCharsets for better
performance and flexibility;
Contributed-by: Ulf.Zibis at CoSoCo.de
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows XP SR-2
A DESCRIPTION OF THE PROBLEM :
This is a MIXED list of BUGs and RFEs regarding performance of sun.nio.cs.FastCharsetProvider !
Bug 1:
There is a shortcut to enhance VM startup in line 94:
if (cln.equals("US_ASCII")) {
cs = new US_ASCII();
...
As it seems that current charset for startup is UTF-8, the shortcut should be updated to UTF-8.
Also check this in sun.io.Converters, where ISO-8859-1 is yet defined for defaultEncoding.
Bug 2:
Method canonicalize() is invoked 2 times in sequence after invoking charsetForName() (lines 119 -> 82).
This unnecessarily consumes performance.
Bug 3:
Method lookup() should be synchronized to insure lock also for invokation from charsets() iteration; synchronization could be saved for method charsetForName().
RFE 4:
After unsuccessful lookup in cache, there is a 2nd lookup in classMap to check support for given charset name:
// Check cache first
Charset cs = cache.get(csn);
if (cs != null)
return cs;
// Do we even support this charset?
String cln = classMap.get(csn);
if (cln == null)
return null;
This could be enhanced by initializing the cache with a NULL Charset object. So there is only need for 1 single map lookup:
Charset cs = cache.get(csn);
if (cs == null) // Don't we even support this charset?
return null;
if (cs != NULL)
return cs;
so ...
RFE 5:
Class name lookup can be done after startup charset check, if latter is done by canonical name comparison:
if (csn.equals("us-ascii"))
RFE 6:
In method toLower() s shouldn't be looped twice after 'toLower' becomes false, and copying of lower chars could be saved:
(copying String's internal char[] shouldn't be slower than new char[], as it also needs to fill initial content with 0's)
private static String toLower(final String s) {
boolean allLower = true;
char[] ca = null;
for (int i=0; i<s.length(); i++) {
int c = s.charAt(i);
if (((c - 'A') | ('Z' - c)) >= 0) {
if (allLower) {
ca = s.toCharArray();
allLower = false;
}
ca[i] += '\u0020';
}
}
return allLower ? s : new String(ca);
}
RFE 7:
charsets() iterator must not canonicalize each charset name, so canonicalizing can be moved from lookup() to charsetForName().
See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=552&view=markup
RFE 8:
classMap could be saved. Instead pre-load the cache initially with the class names. This would save 1 HashMap lookup to get the class name of a charset to be instantiated:
In constructor do ...
cache = classNames; // type: Map<String,Object>
RFE 9:
Pre-load the VM's startup charset. This would save the check for each lookup:
private final static Charset STARTUP_CHARSET = new UTF_8();
cache.put(toLower(STARTUP_CHARSET.name()), STARTUP_CHARSET);
All this would result in very simple lookup:
private synchronized Charset lookup(String lowCanonical) {
// Check cache first
Object o = cache.get(lowCanonical); // TODO: maybe also cache by arbitrary alias
if (o instanceof String) {
// Instantiate new charset
Charset cs = newCharset((String)o);
// Cache it
if (cs != null)
cache.put(lowCanonical, cs);
return cs;
}
return (Charset)o;
}
Charset newCharset(String className) {
try {
Class c = Class.forName(packagePrefix+"."+className,
true, getClass().getClassLoader());
return (Charset)c.newInstance();
}
catch (ClassNotFoundException x) {}
catch (IllegalAccessException x) {}
catch (InstantiationException x) {}
return null;
}
See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=555&view=markup
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
For complete source code see:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=551&view=markup
---------- END SOURCE ----------
Comment copied from bugzilla report http://bugs.openjdk.java.net/show_bug.cgi?id=100092#c0
Description From UlfZibis 2009-07-19 12:12:28 PDT
Created an attachment (id=114) [details]
webrev including patch
Maybe following would be suitable for the mercurial changeset summary:
6790402 Speed-up FastCharsetProvider
Summary: Charset for startup should be UTF-8;
Invoke canonicalize() only once after invoking charsetForName();
Method lookup() should be synchronized;
Enhance lookup by initializing cache with a NULL Charset object;
Class name lookup can be done after startup charset check,
if is done by canonical name comparison;
Method toUpper() would be potentially faster than toLower();
charsets() iterator must not canonicalize each charset name;
classMap could be saved;
Pre-load the VM's startup charset;
Moving file standard-charsets to more appropriate location;
Prepare AbstractCharsetProvider, ExtendedCharsets for better
performance and flexibility;
Contributed-by: Ulf.Zibis at CoSoCo.de