A DESCRIPTION OF THE REQUEST :
String#getBytes(..) and new String(bytes..) internally use slow Charset-X-coders that are newly instantiated on each call.
Additionally:
At first glance, a user might think that String#getBytes(byte[] buf, Charset cs) is faster than String#getBytes(byte[] buf, String csn), on the assumption that a Charset is created internally from csn on every call.
As this is only true for the first call, there should be a *note* in the JavaDoc comparing the cost of these methods. Don't forget the JavaDoc of the (byte[] ...) constructors too.
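To make the comparison concrete, here is a minimal sketch of the two call styles, using the existing public overloads String#getBytes(String) and String#getBytes(Charset) and the matching String constructor. The class and method names (GetBytesOverloads, byName, byCharset, decode) are hypothetical and exist only for illustration. The sketch shows that both styles produce the same bytes; it makes no claim about their relative speed.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK.
final class GetBytesOverloads {

    // Encode via the charset *name*; the name has to be resolved to a Charset.
    static byte[] byName(String s, String csn) {
        try {
            return s.getBytes(csn);
        } catch (UnsupportedEncodingException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new IllegalArgumentException("unknown charset: " + csn, e);
        }
    }

    // Encode via an already resolved Charset object.
    static byte[] byCharset(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Decode with the (byte[], Charset) constructor mentioned in the request.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }
}
```

A caller would typically resolve the Charset once (for example with Charset.forName("ISO-8859-1")) and reuse it with byCharset and decode.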
JUSTIFICATION :
Assuming that ASCII and ISO-8859-1 account for a high percentage of the calls to these methods, especially in CORBA applications, we should have a fast shortcut in class String.
See also:
http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636319
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636323
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A fast path for ASCII and ISO-8859-1 in methods and constructors like:
String#getBytes(..) and new String(bytes..)
Alternatives:
String#getASCIIBytes(..)
String#getISO8859_1Bytes(..)
ACTUAL -
byte[] getBytes(Charset charset)
internally instantiates a CharsetEncoder, which is much slower, especially on short strings.
---------- BEGIN SOURCE ----------
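A simple example:
public class String {
    ...
    // mask is 0x7F for ASCII or 0xFF for ISO-8859-1; it is an int because
    // a byte mask of 0xFF would be sign-extended and accept every char.
    int getBytes(byte[] buf, int mask) {
        int j = 0;
        for (int i = 0; i < values.length; i++, j++) {
            if ((values[i] | mask) == mask) {
                // char fits into the target charset: copy it directly
                buf[j] = (byte) values[i];
                continue;
            }
            // a surrogate pair is replaced by a single byte
            if (Character.isHighSurrogate(values[i]) && i + 1 < values.length
                    && Character.isLowSurrogate(values[i + 1]))
                i++;
            buf[j] = '?'; // or default replacement
        }
        return j;
    }
    ...
}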
---------- END SOURCE ----------
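Since the example above is a fragment of class String, the same idea can be tried out as a standalone helper. The following is a minimal sketch, assuming the mask values 0x7F (ASCII) and 0xFF (ISO-8859-1); the class name Latin1FastPath, the constants, and the encode method are hypothetical and are not the API proposed in this request.

```java
import java.util.Arrays;

// Hypothetical standalone variant of the fast path; not part of the JDK.
final class Latin1FastPath {
    static final int ASCII_MASK = 0x7F;   // assumed mask for US-ASCII
    static final int LATIN1_MASK = 0xFF;  // assumed mask for ISO-8859-1

    // Encodes s without a CharsetEncoder; unmappable chars become '?'.
    static byte[] encode(String s, int mask) {
        int len = s.length();
        byte[] buf = new byte[len];       // at most one byte per char
        int j = 0;
        for (int i = 0; i < len; i++, j++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {     // all bits of c fit under the mask
                buf[j] = (byte) c;
                continue;
            }
            // Consume the low half of a surrogate pair so that the pair
            // yields a single replacement byte.
            if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            buf[j] = (byte) '?';
        }
        return Arrays.copyOf(buf, j);     // trim if surrogate pairs were merged
    }
}
```

For example, Latin1FastPath.encode("abc", Latin1FastPath.ASCII_MASK) yields the bytes 97, 98, 99, while a character outside the mask yields the byte for '?'.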
- relates to JDK-8054307 JEP 254: Compact Strings (Closed)