JDK-6826329

(str) Fastpath for new String(bytes..) and String#getBytes(..) for ASCII + ISO-8859-1

    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • None
    • 7
    • core-libs
    • x86
    • windows_xp

      A DESCRIPTION OF THE REQUEST :
      String#getBytes(..) and new String(bytes..) internally use slow Charset encoders/decoders that are newly instantiated on every call.

      Additionally:
      At first glance, a user might think that String#getBytes(byte[] buf, Charset cs) is faster than String#getBytes(byte[] buf, String csn), on the assumption that the latter creates the Charset internally from csn.
      As this is only true for the first call, there should be a *note* in the JavaDoc comparing the cost of those methods. The JavaDoc of the (byte[] ...) constructors needs the same note.
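For reference, the two call styles being compared look roughly like this with the public API. This is only a sketch: the class name `CharsetOverloads` and its helper methods are hypothetical, and the example makes no claim about which style is faster.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

class CharsetOverloads {
    // Encodes by charset name; the charset is looked up from the name inside the call.
    static byte[] encodeByName(String s, String csn) {
        try {
            return s.getBytes(csn);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset names surface as a checked exception in this overload.
            throw new IllegalArgumentException(e);
        }
    }

    // Encodes with a Charset object that the caller has already resolved.
    static byte[] encodeByCharset(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String s = "caf\u00e9";
        byte[] byName = encodeByName(s, "ISO-8859-1");
        byte[] byCharset = encodeByCharset(s, latin1);
        System.out.println(Arrays.equals(byName, byCharset));
        // The decoding constructors come in the same two flavours.
        System.out.println(new String(byCharset, latin1).equals(s));
    }
}
```

Both styles produce the same bytes; the difference the report is concerned with is only the per-call cost.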


      JUSTIFICATION :
      Assuming that ASCII and ISO-8859-1 account for a high percentage of the usage of those methods, especially in CORBA applications, class String should have a fast shortcut for them.

        See also:
      http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636319
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636323



      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Fastpath for ASCII + ISO-8859-1 for methods and constructors like:
      String#getBytes(..) and new String(bytes..)
      Alternatives:
      String#getASCIIBytes(..)
      String#getISO8859_1Bytes(..)

      ACTUAL -
      byte[] getBytes(Charset charset)
      internally instantiates a CharsetEncoder, which is much slower, especially on short strings.
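To make the cost concrete, encoding through a newly created CharsetEncoder can be written with the public java.nio.charset API as below. This is an illustration of the general encoder path, not the actual code inside java.lang.String; the class name `EncoderPath` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

class EncoderPath {
    // Encodes s through a freshly created CharsetEncoder, replacing malformed
    // or unmappable input with the encoder's replacement bytes.
    static byte[] encode(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked for simplicity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String s = "abc\u20ac"; // the euro sign has no ISO-8859-1 mapping
        System.out.println(Arrays.equals(encode(s, latin1), s.getBytes(latin1)));
    }
}
```

Every call allocates an encoder plus a CharBuffer and a ByteBuffer, which is the overhead the report argues is disproportionate for short strings.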


      ---------- BEGIN SOURCE ----------
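      One simple example:

      public class String {
          ...
          // Writes one byte per character into buf and returns the byte count.
          // mask is 0x7F for ASCII or 0xFF for ISO-8859-1; it is an int here,
          // because a byte mask of 0xFF would be sign-extended to -1.
          int getBytes(byte[] buf, int mask) {
              int j = 0;
              for (int i = 0; i < values.length; i++, j++) {
                  if ((values[i] | mask) == mask) {
                      buf[j] = (byte) values[i];
                      continue;
                  }
                  // A surrogate pair is a single unmappable character: skip its low half.
                  if (Character.isHighSurrogate(values[i]) && i + 1 < values.length
                          && Character.isLowSurrogate(values[i + 1]))
                      i++;
                  buf[j] = '?'; // or default replacement
              }
              return j;
          }
          ...
      }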

      ---------- END SOURCE ----------
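The masking idea from the source above can also be tried outside java.lang.String. The following is a minimal, self-contained sketch under stated assumptions: the class `SingleByteFastPath`, its constants and its `encode` method are hypothetical names, it reads characters through `charAt` instead of the internal char array, and it always uses `'?'` as the replacement byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SingleByteFastPath {
    static final int ASCII_MASK = 0x7F;   // chars 0x00..0x7F map to themselves
    static final int LATIN1_MASK = 0xFF;  // chars 0x00..0xFF map to themselves

    // Encodes s with one byte per character; anything outside the mask
    // (including a whole surrogate pair) becomes a single '?'.
    static byte[] encode(String s, int mask) {
        int len = s.length();
        byte[] buf = new byte[len];
        int j = 0;
        for (int i = 0; i < len; i++, j++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {
                buf[j] = (byte) c;
                continue;
            }
            // Consume the low half of a valid surrogate pair together with the high half.
            if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            buf[j] = '?';
        }
        // Surrogate pairs shrink the output, so trim the buffer if needed.
        return j == len ? buf : Arrays.copyOf(buf, j);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b\u00e9";
        System.out.println(Arrays.equals(
                encode(s, ASCII_MASK), s.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(Arrays.equals(
                encode(s, LATIN1_MASK), s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The comparison against `getBytes(Charset)` in `main` is there as a sanity check that the shortcut yields the same bytes as the regular encoder path for these two charsets.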

            sherman Xueming Shen
            ryeung Roger Yeung (Inactive)