JDK-8066930

java.lang.String(byte[],String) and java.lang.String.getBytes(String) asymmetry


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 7u72, 8u25
    • core-libs

      FULL PRODUCT VERSION :
      C:\Program Files\Java>java -version
      java version "1.7.0_72"
      Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 6.1.7601]

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Spring, Tiles, Mybatis

      A DESCRIPTION OF THE PROBLEM :
      I have found a small problem with strings. MyBatis returns an incorrectly decoded String from a windows-1251 encoded database (the bytes are windows-1251 but are marked as UTF-8). Other Java streaming techniques work fine and are able to recover the text, but the String class implementation has a very bad habit of breaking some non-English capital characters (Russian and Ukrainian, to be exact) while recovering text from bytes. (A year ago, while I was working for Samsung, I had this problem too, but then it was tab characters that were broken and I was not sure of the cause. Now I am sure.)
      Please look at the test case below.


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Convert a windows-1251 string to Unicode and back to windows-1251; please see the test case code below.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      console output:
      GetGroupsResultset [groupId=000, description=ОАО "БАНК КИПРА" ]
      GetGroupsResultset [groupId=000, description=ОАО "БАНК КИПРА" ]

      and the content of the file wtf.1.out must be:

      GetGroupsResultset [groupId=000, description=ОАО "БАНК КИПРА" ]
      GetGroupsResultset [groupId=000, description=ОАО "БАНК КИПРА" ]

      ACTUAL -
      console output:
      GetGroupsResultset [groupId=000, description=ОАО "БАНК К�?ПРА" ]
      GetGroupsResultset [groupId=000, description=пїЅпїЅпїЅ "пїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅ" ]

      but the content of the file wtf.1.out is only half correct:

      GetGroupsResultset [groupId=000, description=ОАО "БАНК КИПРА" ]
      GetGroupsResultset [groupId=000, description=пїЅпїЅпїЅ "пїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅ" ]

      As you can see, the second string (s in the test case) is completely broken, so the explicitly specified encodings seem to be ignored somewhere, and the character \u0418 is broken consistently.
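
The fully broken second string can be reproduced in isolation. The following is a minimal sketch, not part of the original report: the class and method names are hypothetical, and it only assumes the documented behavior that `new String(byte[], Charset)` replaces malformed input with U+FFFD. Since windows-1251 Cyrillic bytes are not valid UTF-8 sequences, the first decoding step already discards the original bytes, and no later conversion can bring them back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
public class LossyDecodeSketch {
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Decodes the windows-1251 bytes of the text as if they were UTF-8. */
    static String misdecode(String text) {
        return new String(text.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    /** Repeats the full conversion chain used in the test case. */
    static String roundTrip(String text) {
        return new String(misdecode(text).getBytes(StandardCharsets.UTF_8), CP1251);
    }

    public static void main(String[] args) {
        String original = "\u041e\u0410\u041e";
        // Print the code points left after the wrong decoding step.
        for (char c : misdecode(original).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        // Report whether the chain returned the original text.
        System.out.println(roundTrip(original).equals(original));
    }
}
```

Under these assumptions the asymmetry comes from the malformed input in the first step rather than from a mismatch between the constructor and `getBytes`.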



      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
          @Test
          public void testMyBatisGetGroups() throws Exception {

      // String myTestFileEncodingName = "UTF-8";
              String myTestFileEncodingName = "windows-1251";
      // String s0 = "GetGroupsResultset [groupId=000, description=ОАО \"БАНК КИПРА\" ]";
      // String s0 = new String( "GetGroupsResultset [groupId=000, description=ОАО \"БАНК КИПРА\" ]".getBytes(myTestFileEncodingName));
              String s0 = "GetGroupsResultset [groupId=000, description=\u041e\u0410\u041e \"\u0411\u0410\u041d\u041a \u041a\u0418\u041f\u0420\u0410\" ]";

              String s = (
                      new String(
                              new String(
                                      s0.getBytes("windows-1251")
                              ,"UTF-8")
                                      .getBytes("UTF-8")
                      ,"windows-1251")
              );
              System.out.println(new String(s0.getBytes("UTF-8")));
              System.out.println(new String(s.getBytes("UTF-8")));

              OutputStreamWriter writer1 = new OutputStreamWriter(
                      new FileOutputStream("D:/Projects/tamwatch/old/atmw/wtf.1.out")
                      , "cp1251");
              writer1.write("\r\n");//to be sure
              writer1.write(s0+"\r\n");
              writer1.write(s+"\r\n");

              writer1.flush();
              writer1.close();

          }

      ---------- END SOURCE ----------
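
The two `System.out.println` calls in the test case use `new String(byte[])` without a charset, which decodes with the platform default charset. The sketch below is a possible explanation for the broken \u0418, assuming the default charset on the reporter's machine was windows-1251 (the report does not state this). The UTF-8 encoding of \u0418 is the byte pair D0 98, and byte 0x98 has no character assigned in windows-1251, so it cannot survive decoding. Class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
public class DefaultCharsetSketch {
    // Assumption: stands in for the platform default charset of the reporter's machine.
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Decodes the UTF-8 bytes of the text as windows-1251. */
    static String decodeUtf8BytesAsCp1251(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1251);
    }

    /** Encodes the wrongly decoded text back to windows-1251 bytes. */
    static byte[] reencode(String text) {
        return decodeUtf8BytesAsCp1251(text).getBytes(CP1251);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String letter = "\u0418";
        System.out.println(hex(letter.getBytes(StandardCharsets.UTF_8)));
        System.out.println(hex(reencode(letter)));
    }
}
```

Comparing the two printed byte sequences shows which byte is lost; other Cyrillic capitals in the test string encode to UTF-8 bytes that windows-1251 does define, which would explain why only \u0418 is affected on the first console line.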

      CUSTOMER SUBMITTED WORKAROUND :
      Use the file system to save and load the data. With the file streaming API everything looks fine and is handy (except for a first character being dropped somewhere in a readln or println method). Alternatively, analyze what the code uses and avoid Strings, using byte arrays instead.
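
The byte-array idea in the workaround can be sketched as follows. This is an illustration under stated assumptions, not something taken from the report or from MyBatis: the raw bytes are assumed to be available, and the class and method names are hypothetical. If the bytes must travel inside a String, ISO-8859-1 maps every byte value to exactly one char, so the bytes can be recovered unchanged and decoded once with the charset they were actually written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
public class LosslessBytesSketch {
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Wraps raw bytes in a String without losing any byte value. */
    static String wrapBytes(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the raw bytes and decodes them with their real charset. */
    static String recover(String wrapped, Charset actual) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        String original = "\u041e\u0410\u041e \"\u0411\u0410\u041d\u041a \u041a\u0418\u041f\u0420\u0410\"";
        String wrapped = wrapBytes(original.getBytes(CP1251));
        // Report whether the text survived the trip through the wrapper String.
        System.out.println(recover(wrapped, CP1251).equals(original));
    }
}
```

The design choice here is to pick an intermediate charset that defines all 256 byte values, which UTF-8 does not.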

            sherman Xueming Shen
            webbuggrp Webbug Group