JDK / JDK-6682541

Optimize Character en/decoding


    • Type: Enhancement
    • Resolution: Duplicate
    • Priority: P5
    • None
    • 7
    • core-libs
    • x86
    • linux

      A DESCRIPTION OF THE REQUEST :
      Decoding short byte sequences to characters, and encoding characters back to bytes, is very costly. The overhead of a simple String.getBytes(cs) or new String(byte[], Charset) call is large relative to the actual work done. There are several reasons:

      1) Encoding and decoding are stateful.
      2) Encoding and decoding make no distinction between a very simple Charset like ISO-8859-1 and a more complex one like UTF-8.
      3) Because of 1), considerable effort is needed to guarantee thread safety.

      We should optimize the simple but heavily used case of single-byte encodings, such as the ISO-8859 family or US-ASCII, which map exactly one byte to one char.
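      To make the one-byte-to-one-char point concrete, here is a minimal sketch (not part of the original request) comparing a direct ISO-8859-1 decode loop with the general, stateful CharsetDecoder path. For ISO-8859-1 every byte value 0x00..0xFF is the Unicode code point with the same value, so decoding reduces to zero-extending each byte. The class and method names are illustrative only; StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class Latin1DirectDecode {

    // Direct mapping: in ISO-8859-1 each byte is the code point with the
    // same value, so decoding is just a zero-extension of every byte.
    static String decodeDirect(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    // General path: a fresh, stateful CharsetDecoder plus buffer wrapping,
    // i.e. the machinery used no matter how simple the charset is.
    static String decodeViaDecoder(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
        CharBuffer out = decoder.decode(ByteBuffer.wrap(bytes));
        return out.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] input = {0x48, 0x69, (byte) 0xE9, (byte) 0xFF};
        // Both paths must produce the same string for ISO-8859-1.
        System.out.println(decodeDirect(input).equals(decodeViaDecoder(input))); // true
    }
}
```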

      JUSTIFICATION :
      Converting Strings to bytes and vice versa is very common, so a lot of code would benefit from an improvement.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Support optimized en/decoding for single-byte Charsets by introducing a dedicated Charset subclass. I propose something like this:

      package java.nio.charset;

      /**
      * A Charset that maps exactly one byte to exactly one char.
      */
      public abstract class SingleByteCharset extends Charset {
          protected SingleByteCharset(String canonicalName, String[] aliases) {
              super(canonicalName, aliases);
          }

          /**
           * Decodes a byte in this charset into a Unicode character.
           *
           * <p> This method always replaces malformed-input and unmappable-character
           * bytes with this charset's default replacement character. In order
           * to detect such bytes, use the {@link
           * CharsetDecoder#decode(java.nio.ByteBuffer)} method directly. </p>
           *
           * @param b The byte to be decoded
           *
           * @return The decoded character
           */
          public abstract char decode(byte b);

          /**
           * Encodes a Unicode character into a byte in this charset.
           *
           * <p> This method always replaces malformed-input and unmappable
           * characters with the replacement byte {@code '?'}. In order to
           * detect such characters, use the {@link
           * CharsetEncoder#encode(java.nio.CharBuffer)} method directly. </p>
           *
           * @param c The char to be encoded
           *
           * @return The byte encoded character
           */
          public byte encode(char c) {
              return encode(c, (byte) '?');
          }

          /**
           * Encodes a Unicode character into a byte in this charset.
           *
           * <p> This method always replaces malformed-input and unmappable
           * characters with the given replacement byte. In order to
           * detect such characters, use the {@link
           * CharsetEncoder#encode(java.nio.CharBuffer)} method directly. </p>
           *
           * @param c The char to be encoded
           * @param replacement The replacement byte to use if c is unmappable
           *
           * @return The byte encoded character
           */
          public abstract byte encode(char c, byte replacement);
      }
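      One way a concrete single-byte charset might back these methods is with precomputed lookup tables. The sketch below is a hypothetical helper, SingleByteTable, that mirrors decode(byte), encode(char) and encode(char, byte) from the proposal; it is not a Charset subclass (code outside java.nio.charset cannot add SingleByteCharset itself). It assumes the JDK decoder reports unmappable bytes as U+FFFD and uses maxBytesPerChar() == 1 as a heuristic for "single-byte".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical table-driven helper mirroring the proposed SingleByteCharset
 * methods decode(byte), encode(char) and encode(char, byte).
 */
public final class SingleByteTable {
    private final char[] decodeTable = new char[256];
    // Indexed by char value; holds the byte value 0..255, or -1 if unmappable.
    private final int[] encodeTable = new int[Character.MAX_VALUE + 1];

    public SingleByteTable(Charset cs) {
        // Heuristic: a single-byte charset never needs more than one byte per char.
        if (cs.newEncoder().maxBytesPerChar() != 1.0f) {
            throw new IllegalArgumentException(cs.name() + " is not a single-byte charset");
        }
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // Decode all 256 byte values once with the JDK's own decoder;
        // bytes the charset cannot map come back as U+FFFD.
        String decoded = new String(all, cs);
        if (decoded.length() != 256) {
            throw new IllegalArgumentException(cs.name() + " does not map one byte to one char");
        }
        Arrays.fill(encodeTable, -1);
        for (int i = 0; i < 256; i++) {
            char c = decoded.charAt(i);
            decodeTable[i] = c;
            // Skip unmappable bytes; keep the first byte if two bytes share a char.
            if (c != '\uFFFD' && encodeTable[c] == -1) {
                encodeTable[c] = i;
            }
        }
    }

    public char decode(byte b) {
        return decodeTable[b & 0xFF];
    }

    public byte encode(char c) {
        return encode(c, (byte) '?');
    }

    public byte encode(char c, byte replacement) {
        int v = encodeTable[c];
        return v < 0 ? replacement : (byte) v;
    }

    public static void main(String[] args) {
        SingleByteTable latin1 = new SingleByteTable(StandardCharsets.ISO_8859_1);
        System.out.println((int) latin1.decode((byte) 0xE9)); // 233
        System.out.println(latin1.encode('\u20ac'));           // 63, i.e. '?'
    }
}
```

      The 64K-entry int table trades memory (about 256 KB per charset) for a single array lookup per char; a production implementation would likely use a more compact encode structure.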

      All single-byte Charset classes should extend SingleByteCharset instead of Charset and
      implement the new methods.
      StringCoding should be changed to use the new methods if possible (if cs instanceof SingleByteCharset).
      Other uses of CharsetDecoder and CharsetEncoder should be reviewed.
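      The instanceof dispatch suggested for StringCoding could look roughly like the following sketch. Because SingleByteCharset does not exist, it assumes a hypothetical interface SingleByteCodec carrying the per-byte methods and a hypothetical FastLatin1 charset (a user-defined Charset subclass that delegates its decoder and encoder to the built-in ISO-8859-1). The getBytes/newString helpers are illustrative, not the real StringCoding code.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class FastPathCoding {

    // Hypothetical stand-in for the per-byte methods of the proposed SingleByteCharset.
    interface SingleByteCodec {
        char decode(byte b);
        byte encode(char c, byte replacement);
    }

    // Hypothetical ISO-8859-1 charset that also implements the per-byte methods.
    // Decoder and encoder creation is delegated to the JDK's built-in ISO-8859-1.
    static final class FastLatin1 extends Charset implements SingleByteCodec {
        FastLatin1() {
            super("X-Fast-Latin1", new String[0]);
        }

        @Override
        public boolean contains(Charset cs) {
            return cs == this || StandardCharsets.ISO_8859_1.contains(cs);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return StandardCharsets.ISO_8859_1.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return StandardCharsets.ISO_8859_1.newEncoder();
        }

        @Override
        public char decode(byte b) {
            return (char) (b & 0xFF);
        }

        @Override
        public byte encode(char c, byte replacement) {
            return c <= 0xFF ? (byte) c : replacement;
        }
    }

    // StringCoding-style dispatch: take the per-char loop when the charset supports it.
    // Unlike String.getBytes, a surrogate pair here becomes two '?' bytes, not one.
    static byte[] getBytes(String s, Charset cs) {
        if (cs instanceof SingleByteCodec) {
            SingleByteCodec codec = (SingleByteCodec) cs;
            byte[] out = new byte[s.length()];
            for (int i = 0; i < out.length; i++) {
                out[i] = codec.encode(s.charAt(i), (byte) '?');
            }
            return out;
        }
        return s.getBytes(cs); // general CharsetEncoder-based path
    }

    static String newString(byte[] bytes, Charset cs) {
        if (cs instanceof SingleByteCodec) {
            SingleByteCodec codec = (SingleByteCodec) cs;
            char[] chars = new char[bytes.length];
            for (int i = 0; i < chars.length; i++) {
                chars[i] = codec.decode(bytes[i]);
            }
            return new String(chars);
        }
        return new String(bytes, cs); // general CharsetDecoder-based path
    }

    public static void main(String[] args) {
        Charset fast = new FastLatin1();
        byte[] encoded = getBytes("caf\u00e9", fast);
        System.out.println(encoded.length);                    // 4
        System.out.println(newString(encoded, fast).length()); // 4
    }
}
```

      Charsets that do not implement the interface fall through to the existing String methods, so the fast path is purely additive.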
      ACTUAL -
      A simple test that encodes/decodes short Strings (4 to 10 characters) by calling String.getBytes(ISO_8859_1) and new String(byte[], ISO_8859_1) runs about 8 times faster after applying the proposed change.
      A patch for JDK 7 can be supplied on request.
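      A test of the kind described might be structured like the naive timing harness below. It is an illustrative sketch, not the reporter's test: the inputs, round counts and hand-written encodeLatin1 loop are assumptions, and a plain System.nanoTime loop is sensitive to JIT and warm-up effects, so a harness such as JMH gives more reliable numbers. Results on a current JDK may differ substantially from those reported here.

```java
import java.nio.charset.StandardCharsets;

public class ShortStringBench {

    // Accumulates result lengths so the JIT cannot eliminate the measured work.
    static long sink;

    // Hand-written ISO-8859-1 encoder: one char -> one byte, '?' if unmappable.
    static byte[] encodeLatin1(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            out[i] = c <= 0xFF ? (byte) c : (byte) '?';
        }
        return out;
    }

    static long timeJdk(String[] inputs, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                sink += s.getBytes(StandardCharsets.ISO_8859_1).length;
            }
        }
        return System.nanoTime() - start;
    }

    static long timeLoop(String[] inputs, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                sink += encodeLatin1(s).length;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Short inputs of 4 to 10 characters.
        String[] inputs = {"abcd", "hello", "charset", "0123456789"};
        for (int i = 0; i < 5; i++) { // crude warm-up so both paths get JIT-compiled
            timeJdk(inputs, 100_000);
            timeLoop(inputs, 100_000);
        }
        long jdkNanos = timeJdk(inputs, 1_000_000);
        long loopNanos = timeLoop(inputs, 1_000_000);
        System.out.printf("getBytes: %d ms, loop: %d ms (sink=%d)%n",
                jdkNanos / 1_000_000, loopNanos / 1_000_000, sink);
    }
}
```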

            Assignee: Xueming Shen
            Reporter: Nelson Dcosta (Inactive)