FULL PRODUCT VERSION :
java version "1.5.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09)
Java HotSpot(TM) Client VM (build 1.5.0_02-b09, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows 2000 [Version 5.00.2195]
A DESCRIPTION OF THE PROBLEM :
Encoding a string to a byte array gives the following results. All chars are ASCII (in this case the letter 'a') and the encoding is UTF-8.
A string of 1-9 characters results in the same number of bytes.
A string of 10-19 chars results in a single additional null at the end of the byte array.
A string of 20-29 chars results in two nulls, and so on.
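The counts above (one extra byte per ten chars) are what you would get if the output buffer were sized as the char count times roughly 1.1 and truncated to an int. The sketch below only reproduces that arithmetic. The factor 1.1 and the names PaddingEstimate, assumedCapacity and assumedPadding are assumptions made for illustration and are not taken from this report. The value a given JDK actually uses can be read from CharsetEncoder.averageBytesPerChar().

```java
// Sketch: reproduces the reported padding counts with plain arithmetic.
class PaddingEstimate {

    // Assumption: about 1.1 output bytes per input char are reserved.
    static final float ASSUMED_AVG_BYTES_PER_CHAR = 1.1f;

    // Size of the output buffer under the assumption above.
    static int assumedCapacity(int charCount) {
        return (int) (charCount * ASSUMED_AVG_BYTES_PER_CHAR);
    }

    // Unused trailing bytes when every char encodes to one byte (ASCII).
    static int assumedPadding(int charCount) {
        return assumedCapacity(charCount) - charCount;
    }

    public static void main(String[] args) {
        int[] lengths = {9, 10, 11, 19, 20, 30};
        for (int len : lengths) {
            System.out.println("String length=" + len
                    + " assumed capacity=" + assumedCapacity(len)
                    + " assumed padding=" + assumedPadding(len));
        }
    }
}
```

Under this assumption the lengths 10, 11, 19 and 20 give capacities 11, 12, 20 and 22, which matches the sample output in this report.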
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the simple class below.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expected the # of chars and the # of bytes to be the same.
ACTUAL -
Here are some samples:
String length=10 bytes length=11
a a a a a a a a a a
97 97 97 97 97 97 97 97 97 97 0
String length=11 bytes length=12
a a a a a a a a a a a
97 97 97 97 97 97 97 97 97 97 97 0
String length=19 bytes length=20
a a a a a a a a a a a a a a a a a a a
97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 0
String length=20 bytes length=22
a a a a a a a a a a a a a a a a a a a a
97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 97 0 0
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.nio.*;
import java.nio.charset.*;

public class EncodingTest {

    // Builds a string made of len copies of charval.
    public static String createTestString(int len, char charval) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < len; i++) {
            sb.append(charval);
        }
        return sb.toString();
    }

    // Encodes a test string as UTF-8 and prints both lengths when they differ.
    public static void doEncodingTest(int len, char charval) {
        try {
            String val = createTestString(len, charval);
            Charset cs = Charset.forName("utf-8");
            CharsetEncoder cse = cs.newEncoder();
            char[] chars = val.toCharArray();
            CharBuffer cb = CharBuffer.wrap(chars);
            ByteBuffer bb = cse.encode(cb);
            cse.flush(bb);
            // array() returns the buffer's entire backing array.
            byte[] bytes = bb.array();
            if (val.length() != bytes.length) {
                System.out.println("String length=" + val.length() + " bytes length=" + bytes.length);
                for (int i = 0; i < val.length(); i++) {
                    System.out.print(" " + val.charAt(i));
                }
                System.out.println("");
                for (int i = 0; i < bytes.length; i++) {
                    System.out.print(" " + bytes[i]);
                }
                System.out.println("");
            }
        } catch (CharacterCodingException e) {
            System.out.println(e.toString());
        }
        System.out.println("");
    }

    // Runs the test for every string length from 1 to max.
    public static void doEncodingTests(int max, char charval) {
        for (int i = 1; i <= max; i++) {
            doEncodingTest(i, charval);
            System.out.println("");
        }
    }

    public static void main(String[] args) {
        doEncodingTests(30, 'a');
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I strip null chars off the end, which adds overhead. The above case is a gross
simplification meant to show the problem.
###@###.### 2005-03-23 22:52:14 GMT
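A possible alternative to stripping nulls, offered as a sketch and not as the resolution recorded for this report: ByteBuffer.array() returns the whole backing array, while the encoded data occupies only the region between the buffer's position and limit. Copying bb.remaining() bytes out of the buffer gives an array with no trailing padding. The names ExactEncode and encodeExact are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Sketch: copy only the valid region of the encoder's output buffer.
class ExactEncode {

    // Hypothetical helper: returns exactly the bytes the encoder produced.
    static byte[] encodeExact(String val) {
        try {
            CharsetEncoder cse = Charset.forName("utf-8").newEncoder();
            // encode(CharBuffer) returns a buffer whose position..limit
            // range holds the encoded bytes.
            ByteBuffer bb = cse.encode(CharBuffer.wrap(val));
            byte[] bytes = new byte[bb.remaining()];
            bb.get(bytes); // copies position..limit, ignoring spare capacity
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String val = "aaaaaaaaaaaaaaaaaaaa"; // 20 ASCII chars
        byte[] bytes = encodeExact(val);
        System.out.println("String length=" + val.length()
                + " bytes length=" + bytes.length);
    }
}
```

For ASCII-only input, String.getBytes("UTF-8") is a simpler route to the same bytes when a CharsetEncoder is not otherwise needed.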
duplicates:
JDK-4894463 (cs) CharsetDecoder.decode() returns an charArray with empty chars on the end [Closed]