JDK-8290488

IBM864 character encoding implementation bug


      ADDITIONAL SYSTEM INFORMATION :
      Not limited to any platform; reproduces on all hardware and operating systems.

      A DESCRIPTION OF THE PROBLEM :
      1. The file where the bug is located
      class file: sun.nio.cs.ext.IBM864.class
      jar package file: charsets.jar

      2. Main symptom of the bug
      Test string: "<%adc"
      (1) Encoded with the UTF-8 character set, its hexadecimal byte sequence is:
      3c 25 61 64 63
      (2) Encoded with the IBM864 character set, its hexadecimal byte sequence is:
      3c 3f 61 64 63
      With IBM864, the second character '%' of the string is encoded as 3f, which is '?' in that character set, instead of the expected value 25, which is '%'. The character is therefore encoded incorrectly.
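The 3f byte suggests that the encoder treats '%' as unmappable and substitutes a replacement byte. A minimal sketch for locating such characters without relying on the substituted output is shown below. The class name `UnmappableFinder` and the helper `firstUnmappable` are hypothetical names introduced for illustration, and the helper checks one `char` at a time, so it assumes the input has no surrogate pairs.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class UnmappableFinder {

    // Returns the index of the first char the charset's encoder reports as
    // not encodable, or -1 if every char can be encoded.
    // Assumption: the string contains no surrogate pairs.
    static int firstUnmappable(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        for (int i = 0; i < s.length(); i++) {
            if (!encoder.canEncode(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // IBM864 is an extended charset and may be absent from some runtimes.
        if (!Charset.isSupported("IBM864")) {
            System.out.println("IBM864 is not available in this runtime");
            return;
        }
        int index = firstUnmappable("<%adc", Charset.forName("IBM864"));
        System.out.println("first unmappable index = " + index);
    }
}
```

On a runtime that shows the behavior described in this report, the index of '%' would be expected; on a runtime where '%' is mapped, the result should be -1.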

      3. Root cause of the bug
      Analysis shows that the problem with '%' originates in IBM864.java: the b2cTable and b2c fields of the IBM864 class map the byte 0x25 to '٪' (U+066A, ARABIC PERCENT SIGN) rather than to '%' (U+0025, PERCENT SIGN). As a result, '%' has no byte mapping when encoding and is replaced with '?' (3f), which is inconsistent with the specification and with expectations. For the IBM864 character set definition, see: https://www.compart.com/en/unicode/charsets/IBM864
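The mapping can be inspected from outside the class through the public `java.nio.charset` API. The sketch below decodes the single byte 0x25 and asks the encoder about both percent signs. The class name `Ibm864MappingProbe` and the helper `codePoint` are hypothetical names, and the printed values depend on the JDK version in use, so no output is claimed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class Ibm864MappingProbe {

    // Formats a char as "U+XXXX" so look-alike characters can be told apart.
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // IBM864 is an extended charset and may be absent from some runtimes.
        if (!Charset.isSupported("IBM864")) {
            System.out.println("IBM864 is not available in this runtime");
            return;
        }
        Charset ibm864 = Charset.forName("IBM864");

        // Byte-to-char direction: what does byte 0x25 decode to?
        String decoded = new String(new byte[] {0x25}, ibm864);
        System.out.println("0x25 decodes to " + codePoint(decoded.charAt(0)));

        // Char-to-byte direction: which percent signs can the encoder map?
        CharsetEncoder encoder = ibm864.newEncoder();
        System.out.println("canEncode(U+0025) = " + encoder.canEncode('%'));
        System.out.println("canEncode(U+066A) = " + encoder.canEncode('\u066A'));
    }
}
```

If 0x25 decodes to U+066A and `canEncode` is false for U+0025, that would be consistent with the root cause described above.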

      REGRESSION : Last worked in version 19

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the test program TestIBM864 given below; it reproduces the problem reliably.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      str = <%adc, encoding = UTF-8
      3c 25 61 64 63
      str = <%adc, encoding = IBM864
      3c 25 61 64 63
      ACTUAL -
      str = <%adc, encoding = UTF-8
      3c 25 61 64 63
      str = <%adc, encoding = IBM864
      3c 3f 61 64 63
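The comparison between expected and actual output can also be automated. The following is a sketch only: the class name `ExpectedBytesCheck` and the helper `toHex` are hypothetical, and the expected sequence is taken from the EXPECTED section of this report.

```java
import java.nio.charset.Charset;

final class ExpectedBytesCheck {

    // Renders bytes as two-digit lowercase hex values separated by single spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Expected byte sequence for "<%adc", as stated in this report.
        String expected = "3c 25 61 64 63";
        for (String name : new String[] {"UTF-8", "IBM864"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": charset not available, skipped");
                continue;
            }
            String actual = toHex("<%adc".getBytes(Charset.forName(name)));
            String verdict = actual.equals(expected) ? "matches expected" : "differs from expected";
            System.out.println(name + ": " + actual + " (" + verdict + ")");
        }
    }
}
```

Using `getBytes(Charset)` rather than `getBytes(String)` avoids the checked `UnsupportedEncodingException`, which keeps the check free of `throws` clauses.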

      ---------- BEGIN SOURCE ----------
      public class TestIBM864 {

          // Prints each byte of the array in hexadecimal, separated by spaces.
          public static void printBytesArray(byte[] bytesArr) {
              for (byte b : bytesArr) {
                  System.out.printf("%x ", b);
              }
              System.out.println();
          }

          // Encodes the test string with the named charset and prints the resulting bytes.
          public static void testEncode(String encoding) throws Exception {
              String str = "<%adc";
              System.out.printf("str = %s, encoding = %s \n", str, encoding);
              printBytesArray(str.getBytes(encoding));
          }

          public static void main(String[] args) throws Exception {
              testEncode("UTF-8");
              testEncode("IBM864");
          }
      }
      ---------- END SOURCE ----------

      FREQUENCY : always


            naoto Naoto Sato
            webbuggrp Webbug Group