Bug
Resolution: Not an Issue
P2
None
6
x86
windows_xp
FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) SE Runtime Environment (build 1.6.0-beta2-b86)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b86, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows XP Professional SP 2
A DESCRIPTION OF THE PROBLEM :
This bug causes the following behavior:
Some UTF-16 characters cannot be stored in a JDOM document after they have been encoded with CharsetEncoder. The returned ByteBuffer contains a null byte at the end, and this zero byte appears to cause the error when the DOM is built.
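One way to check where the trailing 0x00 comes from is to compare the number of encoded bytes with the length of the buffer's backing array. This sketch assumes only the documented contract of `CharsetEncoder.encode(CharBuffer)`: the returned buffer's position is zero and its limit follows the last byte written. `array()`, which the test source in this report uses, exposes the whole backing array regardless of the limit, and `ByteBuffer.allocate` zero-fills that array. If the encoder's internal buffer grows past the encoded length, the spare slot would therefore show up as a 0x00. The class name `EncodedLengthCheck` and the helper `sizes` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodedLengthCheck {
    // Hypothetical helper: encodes one char with a fresh UTF-8 CharsetEncoder and
    // returns {bytes actually encoded, length of the buffer's backing array}.
    static int[] sizes(char c) {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        try {
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(new char[] { c }));
            // encode() returns a buffer whose encoded bytes lie between position
            // and limit; array() exposes the whole (possibly larger) backing array.
            return new int[] { bb.remaining(), bb.array().length };
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("cannot encode U+" + Integer.toHexString(c), e);
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] { '\u0237', '\u304E' }) {
            int[] s = sizes(c);
            System.out.printf("U+%04X: encoded bytes=%d, backing array length=%d%n",
                    (int) c, s[0], s[1]);
        }
    }
}
```

The backing-array length printed here depends on the JDK's internal buffer sizing, so it may differ between releases. The encoded byte count should not.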
The character that triggers this behavior also differs between version 1.5.0_07 and version 1.6.0 (b86):
"\u0237" - version 1.5.0_07 OK, version 1.6.0 fails
"\u304E" - version 1.5.0_07 fails, version 1.6.0 OK
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the class CharsetEncoderTest twice: once with Java 1.5.0_07 and once with Java 1.6.0 b86.
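The two characters listed in the description fall into different UTF-8 length classes. U+0237 lies in U+0080..U+07FF and encodes to 2 bytes (C8 B7). U+304E lies in U+0800..U+FFFF and encodes to 3 bytes (E3 81 8E). The minimal sketch below hand-encodes a single non-surrogate BMP char and cross-checks the result against the JDK. The names `Utf8Length` and `encodeBmp` are hypothetical. The encoder's internal output-buffer sizing determines whether a given length leaves a spare byte. That sizing is an implementation detail, and it is a plausible (unconfirmed) reason the two releases fail on different characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Length {
    // Hypothetical hand-rolled UTF-8 encoder for one BMP, non-surrogate char (sketch only).
    static byte[] encodeBmp(char c) {
        if (c < 0x80) {
            return new byte[] { (byte) c };                  // 0xxxxxxx
        } else if (c < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),                    // 110xxxxx
                (byte) (0x80 | (c & 0x3F)) };                // 10xxxxxx
        } else {
            if (Character.isSurrogate(c)) {
                throw new IllegalArgumentException("surrogates need a pair");
            }
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),                   // 1110xxxx
                (byte) (0x80 | ((c >> 6) & 0x3F)),           // 10xxxxxx
                (byte) (0x80 | (c & 0x3F)) };                // 10xxxxxx
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] { '\u0237', '\u304E' }) {
            byte[] mine = encodeBmp(c);
            byte[] jdk = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %d bytes, matches JDK: %b%n",
                    (int) c, mine.length, Arrays.equals(mine, jdk));
        }
    }
}
```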
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
CharsetEncoder should encode the two Unicode (UTF-16) characters into UTF-8, and the XML DOM should accept the String built from the ByteBuffer returned by the CharsetEncoder as the text of a DOM entry (here, an attribute value).
ACTUAL -
The ByteBuffer contained an additional "empty" byte with the value 0, which JDOM rejected.
(This occurs in both Java versions mentioned, but with different characters.)
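The test source in this report turns the UTF-8 bytes back into a String with `new String(ba, "ISO-8859-1")`. ISO-8859-1 maps each byte to one char, so the result is one Latin-1 char per encoded byte rather than the original character. U+0237 becomes U+00C8 U+00B7. The sketch below shows the difference. It assumes the goal is to recover the original text. Decoding with the charset that produced the bytes restores it. The class and method names are hypothetical. For the XML case, the original String could likely be passed to JDOM directly, leaving byte encoding to the serializer.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // What the test source builds: UTF-8 bytes reinterpreted one byte per char as ISO-8859-1.
    static String latin1View(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Decoding the same bytes with the charset that produced them restores the text.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0237";
        System.out.println(latin1View(original).length());           // prints 2 (U+00C8 U+00B7)
        System.out.println(utf8RoundTrip(original).equals(original)); // prints true
    }
}
```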
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" org.jdom.IllegalDataException: The data "AA " is not legal for a JDOM attribute: 0x0 is not a legal XML character.
	at org.jdom.Attribute.setValue(Attribute.java:486)
	at org.jdom.Attribute.<init>(Attribute.java:229)
	at org.jdom.Attribute.<init>(Attribute.java:252)
	at org.jdom.Element.setAttribute(Element.java:1109)
	at test.CharsetEncoderTest.testEncodeSaveXML(CharsetEncoderTest.java:39)
	at test.CharsetEncoderTest.main(CharsetEncoderTest.java:20)
NOTE: The space in the string "AA " was not a space in the original error message; it was an undisplayable character.
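The message "0x0 is not a legal XML character" follows from the XML 1.0 `Char` production (section 2.2). That production admits only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], so U+0000 can never appear in XML text or attribute values. The sketch below checks a String against that production. The input is an assumed example resembling what the test source would build for U+0237: two Latin-1 chars plus the stray zero. The names `XmlCharCheck`, `isXmlChar` and `firstIllegal` are hypothetical and are not JDOM's API.

```java
public class XmlCharCheck {
    // XML 1.0 "Char" production, checked per code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first illegal code point, or -1 if all are legal.
    static int firstIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed input: C8 B7 viewed as ISO-8859-1, followed by the trailing 0x00.
        String withNul = new String(new char[] { 0xC8, 0xB7, 0x00 });
        System.out.println(firstIllegal(withNul)); // prints 2
    }
}
```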
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import org.jdom.Document;
import org.jdom.Element;

public class CharsetEncoderTest {
    private static int encodee160 = 0x304E;    // Works only with version 1.6.0
    private static int encodee150_07 = 0x237;  // Works only with version 1.5.0_07
    private static String encoded;

    public static void main(String[] args) {
        testEncodeSaveXML(encodee150_07);
        testEncodeSaveXML(encodee160);
    }

    public static void testEncodeSaveXML(int character) {
        Charset set = Charset.forName("UTF-8");
        CharsetEncoder encoder = set.newEncoder();
        // Put the single character into a one-char buffer and rewind it for reading
        CharBuffer chb = CharBuffer.allocate(1);
        chb.put((char) character);
        chb.rewind();
        encoder.reset();
        try {
            ByteBuffer bb;
            bb = encoder.encode(chb);
            // Take the buffer's whole backing array and view each byte as an ISO-8859-1 char
            byte[] ba = bb.array();
            encoded = new String(ba, "ISO-8859-1");
            // Store the result as an attribute value in a new JDOM document
            Document doc = new Document();
            Element e = new Element("XMLChar");
            e.setAttribute("value", encoded);
            doc.setRootElement(e);
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Remove the last (spurious) character from the encoded String before further processing whenever the encoding produced a trailing null byte.
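An alternative to trimming the String afterwards is to copy only the bytes between the returned buffer's position and limit, which are exactly the bytes the encoder wrote. Trimming assumes there is at most one spurious byte, and that depends on the encoder's internal buffer growth. It would also drop a genuinely encoded U+0000, which standard UTF-8 encodes as 0x00. This minimal sketch is consistent with the "Not an Issue" resolution but is not taken from it. The names `EncodedBytes`, `toBytes` and `encodeUtf8` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

public class EncodedBytes {
    // Copies exactly the bytes between position and limit, instead of the whole
    // backing array returned by array().
    static byte[] toBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out); // duplicate() leaves bb's own position unchanged
        return out;
    }

    // Hypothetical helper: encodes a String to UTF-8 with a fresh CharsetEncoder.
    static byte[] encodeUtf8(String s) {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        try {
            return toBytes(encoder.encode(CharBuffer.wrap(s)));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("unencodable input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeUtf8("\u0237"))); // prints [-56, -73]
    }
}
```

Using `bb.remaining()` rather than `bb.array().length` makes the result independent of how large the encoder's internal buffer happened to be.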