UTF-8 encoder returns a byte array with a null byte

    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P2
    • None
    • 6
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.6.0-beta2"
      Java(TM) SE Runtime Environment (build 1.6.0-beta2-b86)
      Java HotSpot(TM) Client VM (build 1.6.0-beta2-b86, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Windows XP Professional SP 2

      A DESCRIPTION OF THE PROBLEM :
      This bug is responsible for the following behavior:
      Some UTF-16 characters can't be put into a JDOM document after they have been encoded using the CharsetEncoder. The returned ByteBuffer contains a null byte at the end. This zero byte seems to be responsible for the error while building the DOM.

      There is also a difference between version 1.5.0_07 and version 1.6.0 (b86): the character that causes this behaviour is different in each.

      "\u0237" - version 1.5.0_07 OK, version 1.6.0 NOK
      "\u304E" - version 1.5.0_07 NOK, version 1.6.0 OK




      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the class CharsetEncoderTest twice, once with Java 1.5.0_07 and once with Java 1.6.0 b86...

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      CharsetEncoder should encode the two Unicode (UTF-16) characters into UTF-8 bytes, which could then be used as the text of an XML DOM entry.
      ACTUAL -
      The XML DOM should accept the encoded String generated from the ByteBuffer returned by the CharsetEncoder.

      Instead, the ByteBuffer contained an additional "empty" byte with the value 0.

      (This behavior occurs in both Java versions mentioned, but with different characters...)
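
The trailing zero is consistent with the documented behavior of ByteBuffer.array(): it returns the buffer's entire backing array, which may be longer than the encoded output (the bytes between the buffer's position and its limit). A minimal sketch for checking this follows. The class name BackingArrayCheck and the method sizes are hypothetical names chosen for illustration, and the backing-array length it reports depends on the JDK version in use.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class BackingArrayCheck {

    // Returns { number of encoded bytes, length of the backing array }.
    static int[] sizes(char c) {
        try {
            ByteBuffer bb = Charset.forName("UTF-8").newEncoder()
                    .encode(CharBuffer.wrap(new char[] { c }));
            // remaining() counts only the bytes the encoder wrote;
            // array() exposes the whole allocation, including unused slots.
            return new int[] { bb.remaining(), bb.array().length };
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] { '\u0237', '\u304E' }) {
            int[] s = sizes(c);
            System.out.println("U+" + Integer.toHexString(c)
                    + ": encoded bytes = " + s[0]
                    + ", backing array length = " + s[1]);
        }
    }
}
```

If the two numbers differ for a character, the extra array slots are unused capacity still holding their initial zero value, not encoder output.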

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      Exception in thread "main" org.jdom.IllegalDataException: The data "AA " is not legal for a JDOM attribute: 0x0 is not a legal XML character.
      at org.jdom.Attribute.setValue(Attribute.java:486)
      at org.jdom.Attribute.<init>(Attribute.java:229)
      at org.jdom.Attribute.<init>(Attribute.java:252)
      at org.jdom.Element.setAttribute(Element.java:1109)
      at test.CharsetEncoderTest.testEncodeSaveXML(CharsetEncoderTest.java:39)
      at test.CharsetEncoderTest.main(CharsetEncoderTest.java:20)


      !!! NOTE !!!: The space in the String "AA " was not a space in the original Error Message. It was an undisplayable Character.

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.UnsupportedEncodingException;
      import java.nio.ByteBuffer;
      import java.nio.CharBuffer;
      import java.nio.charset.CharacterCodingException;
      import java.nio.charset.Charset;
      import java.nio.charset.CharsetEncoder;

      import org.jdom.Document;
      import org.jdom.Element;

      public class CharsetEncoderTest {

          private static int encodee160 = 0x304E; // Works only with version 1.6.0
          private static int encodee150_07 = 0x237; // Works only with version 1.5.0_07
          private static String encoded;

          public static void main(String[] args) {
              testEncodeSaveXML(encodee150_07);
              testEncodeSaveXML(encodee160);
          }
          
          public static void testEncodeSaveXML(int character) {
              Charset set = Charset.forName("UTF-8");
              CharsetEncoder encoder = set.newEncoder();
              CharBuffer chb = CharBuffer.allocate(1);
              chb.put((char) character);
              chb.rewind();
              encoder.reset();
              try {
                  ByteBuffer bb;
                  bb = encoder.encode(chb);
                  byte[] ba = bb.array();
                  encoded = new String(ba, "ISO-8859-1");
                  Document doc = new Document();
                  Element e = new Element("XMLChar");
                  e.setAttribute("value", encoded);
                  doc.setRootElement(e);
              } catch (CharacterCodingException e) {
                  e.printStackTrace();
              } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Removing the last (wrong) character from the encoded String before processing if encoding resulted in a null byte...
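
As an alternative to trimming the String afterwards, the bytes can be copied out of the ByteBuffer using its remaining() count rather than taken from array(), so that unused capacity in the backing array is never included. This is a sketch under that assumption, not a statement of how the issue was evaluated; ExactBytes and encodeUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class ExactBytes {

    // Encodes one char as UTF-8 and returns exactly the bytes produced.
    static byte[] encodeUtf8(char c) {
        try {
            ByteBuffer bb = Charset.forName("UTF-8").newEncoder()
                    .encode(CharBuffer.wrap(new char[] { c }));
            // Size the result by remaining(), not by the backing array length.
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] { '\u0237', '\u304E' }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeUtf8(c)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println("U+" + Integer.toHexString(c) + " -> " + hex.toString().trim());
        }
    }
}
```

In the test case above, the result of such a helper would take the place of bb.array() before the String is constructed.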

            martin Martin Buchholz
            gmanwanisunw Girish Manwani (Inactive)