- Bug
- Resolution: Duplicate
- P5
- None
- 1.1.4
- x86
- windows_95
Name: rm29839 Date: 11/25/97
I am trying to perform code conversion using
String.getBytes("Cp939"). In the Cp939 encoding,
a run of DBCS characters should be bracketed by SO (\x0e) and SI (\x0f).
However, if the string contains only a single DBCS character,
the trailing SI is NOT appended. That is the problem.
If the string contains MORE THAN two DBCS characters,
SI is inserted properly.
"Cp930" shows the same problem.
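The SO/SI bracketing rule described above can be checked mechanically. The following is only a minimal sketch, not part of the original report: the names ShiftCheck and hasBalancedShifts are hypothetical, and it assumes that the bytes 0x0e and 0x0f never occur as data bytes inside a DBCS run.

```java
// Sketch: verifies that every SO (0x0e) in an encoded byte array is closed by an SI (0x0f).
// ShiftCheck and hasBalancedShifts are hypothetical names used for illustration only.
class ShiftCheck {
    static final byte SO = 0x0E; // shift out: a DBCS run begins
    static final byte SI = 0x0F; // shift in: back to single-byte mode

    static boolean hasBalancedShifts(byte[] bytes) {
        boolean inDbcs = false;
        for (byte b : bytes) {
            if (b == SO) {
                if (inDbcs) return false; // SO while already in a DBCS run
                inDbcs = true;
            } else if (b == SI) {
                if (!inDbcs) return false; // SI without a preceding SO
                inDbcs = false;
            }
        }
        return !inDbcs; // a run left open at the end is the reported symptom
    }

    public static void main(String[] args) {
        byte[] reported = {0x0E, 0x44, (byte) 0x81};       // bytes from "Current behavior"
        byte[] expected = {0x0E, 0x44, (byte) 0x81, 0x0F}; // bytes from "Correct behavior"
        System.out.println(hasBalancedShifts(reported)); // prints false
        System.out.println(hasBalancedShifts(expected)); // prints true
    }
}
```

Such a check could be applied to the result of getBytes("Cp939") on a JDK where that converter is available.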
T.java *******************************************
import java.util.*;
import java.io.*;
import java.text.*;

public class T {
    public static void main(String args[]) {
        String a = new String("\u3042"); // 1 DBCS character
        // String a = new String("\u3042\u3042"); // 2 DBCS characters
        try {
            byte b[];
            b = a.getBytes("Cp939");
            System.out.println("The length is " + b.length);
            for (int i = 0; i < b.length; i++) {
                System.out.println("The value of b[" + i + "] is:" + Integer.toHexString((int) b[i]));
            }
        }
        catch (UnsupportedEncodingException e) {}
    }
}
*******************************************************
Current behavior:
The length is 3
The value of b[0] is:e
The value of b[1] is:44
The value of b[2] is:ffffff81
Correct behavior:
The length is 4
The value of b[0] is:e
The value of b[1] is:44
The value of b[2] is:ffffff81
The value of b[3] is:f
(Review ID: 20732)
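A note on reading the output above: the value ffffff81 is the byte 0x81, sign-extended because Integer.toHexString receives a negative int. A minimal sketch of an unsigned dump follows; the names HexDump and toHex are hypothetical and not part of the original test case.

```java
// Sketch: prints bytes as two-digit unsigned hex values.
// HexDump and toHex are hypothetical names used for illustration only.
class HexDump {
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            // Masking with 0xFF discards the sign extension of negative bytes.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] expected = {0x0E, 0x44, (byte) 0x81, 0x0F};
        System.out.println(Integer.toHexString((byte) 0x81)); // prints ffffff81
        System.out.println(toHex(expected));                  // prints 0e 44 81 0f
    }
}
```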
======================================================================
Comments
Name: rm29839 Date: 11/25/97
(company - IBM, JAPAN , email - ###@###.###)
======================================================================
###@###.### 2003-01-28
- Duplicates: JDK-4199599 Cp930 and Cp939 converters have problems with ITAIJI chars
- Resolved