JDK-8195686

ISO-8859-8-i charset cannot be decoded, should be mapped to ISO-8859-8

      FULL PRODUCT VERSION :
      java version "1.8.0_152"
      Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
      Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Not OS-specific, but reproducible on Microsoft Windows [Version 6.1.7601]

      A DESCRIPTION OF THE PROBLEM :
      The ISO-8859-8-i charset (Hebrew with logical ordering, see https://www.ietf.org/rfc/rfc1556.txt) should be mapped to ISO-8859-8. It currently cannot be decoded, even though it uses the same per-character encoding as ISO-8859-8. The only difference is the implied text direction: ISO-8859-8 implies visual order, whereas ISO-8859-8-i implies logical order, although in practice ISO-8859-8 content may be in logical order too (https://en.wikipedia.org/wiki/ISO/IEC_8859-8).

      As Java strings are stored in logical order, the resulting Java strings can be correctly displayed by modern applications, even when they contain right-to-left (RTL) pieces of text in logical order (https://docs.oracle.com/javase/tutorial/2d/text/textlayoutbidirectionaltext.html). Therefore the existing ISO-8859-8 mapping can be used.

      Notes: I opened an issue at JavaMail (https://github.com/javaee/javamail/issues/302) and I was urged to open a JDK issue for this as well.

      Referenced links:
      - https://www.ietf.org/rfc/rfc1555.txt (Hebrew Character Encoding for Internet Messages)
      - https://www.ietf.org/rfc/rfc1556.txt (Handling of Bi-directional Texts in MIME)
      - https://docs.oracle.com/javase/tutorial/2d/text/textlayoutbidirectionaltext.html
      - https://en.wikipedia.org/wiki/ISO/IEC_8859-8
      - https://github.com/javaee/javamail/issues/302

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      new String(someByteArray, "ISO-8859-8-i");

      See the source code for a simple test case.

      This bug initially occurred when trying to decode email headers using the JavaMail MimeUtility.decodeText() method, which dynamically converts contents based on the charset information specified in the email header.
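
Before decoding input whose charset name comes from an external source such as an email header, it can help to check whether the running JDK recognizes that name at all. The following is a minimal sketch using only java.nio.charset.Charset; the class name CharsetProbe and the method isRecognized are hypothetical names chosen for illustration, and the printed result depends on the JDK version in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetProbe {

    // Hypothetical helper: returns true if this JVM can resolve the given
    // charset name, and false if it is unknown or syntactically illegal.
    public static boolean isRecognized(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-8", "ISO-8859-8-i"}) {
            System.out.println(name + " supported: " + isRecognized(name));
        }
        // List the aliases this runtime already maps to ISO-8859-8, if any.
        if (isRecognized("ISO-8859-8")) {
            System.out.println("Aliases: " + Charset.forName("ISO-8859-8").aliases());
        }
    }
}
```

On a JDK affected by this issue, the second name would be expected to be reported as unsupported, which matches the UnsupportedEncodingException in the test case.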

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Abc משתמשים רשומים def
      OK 1
      OK 2
      ACTUAL -
      Abc משתמשים רשומים def
      OK 1
      Exception in thread "main" java.io.UnsupportedEncodingException: ISO-8859-8-i
      at java.lang.StringCoding.decode(StringCoding.java:190)
      at java.lang.String.<init>(String.java:416)
      at java.lang.String.<init>(String.java:481)

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      Exception in thread "main" java.io.UnsupportedEncodingException: ISO-8859-8-i
      at java.lang.StringCoding.decode(StringCoding.java:190)
      at java.lang.String.<init>(String.java:416)
      at java.lang.String.<init>(String.java:481)

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.UnsupportedEncodingException;
      public class test {
        public static void main(String[] args) throws UnsupportedEncodingException {
          String expected = new String(
              "\u0041" + "\u0062" + "\u0063" +
                  "\u0020" +
                  "\u05de" + "\u05e9" + "\u05ea" + "\u05de" + "\u05e9" + "\u05d9" + "\u05dd" +
                  "\u0020" +
                  "\u05e8" + "\u05e9" + "\u05d5" + "\u05de" + "\u05d9" + "\u05dd" +
                  "\u0020" +
                  "\u0064" + "\u0065" + "\u0066");

          // Note that the Hebrew words will be printed right-to-left in a modern terminal.
          System.out.println(expected);

          byte[] iso88598iBytesInLogicalOrder = new byte[]{
              (byte) 0x41, (byte) 0x62, (byte) 0x63,
              (byte) 0x20,
              (byte) 0xee, (byte) 0xf9, (byte) 0xfa, (byte) 0xee, (byte) 0xf9, (byte) 0xe9, (byte) 0xed,
              (byte) 0x20,
              (byte) 0xf8, (byte) 0xf9, (byte) 0xe5, (byte) 0xee, (byte) 0xe9, (byte) 0xed,
              (byte) 0x20,
              (byte) 0x64, (byte) 0x65, (byte) 0x66};

          String iso88598Decoded = new String(iso88598iBytesInLogicalOrder, "ISO-8859-8");
          if (iso88598Decoded.equals(expected)) {
            System.out.println("OK 1");
          }
          // Throws UnsupportedEncodingException on affected JDKs.
          String iso88598iDecoded = new String(iso88598iBytesInLogicalOrder, "ISO-8859-8-i");
          if (iso88598iDecoded.equals(expected)) {
            System.out.println("OK 2");
          }
        }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Use "ISO-8859-8" by replacing all instances of "ISO-8859-8-i" with "ISO-8859-8" in user input (in our case, email headers).
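
The workaround above could be sketched roughly as follows. This is an illustration under stated assumptions, not the JavaMail or JDK fix: the class HebrewCharsetWorkaround and its methods normalizeCharsetName and decode are hypothetical names, and the sketch assumes that the running JDK supports ISO-8859-8 itself (as the test case shows for 1.8.0_152).

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class HebrewCharsetWorkaround {

    // Hypothetical helper: rewrites "ISO-8859-8-i" (in any letter case) to
    // "ISO-8859-8" and leaves every other charset name unchanged.
    public static String normalizeCharsetName(String name) {
        String trimmed = name.trim();
        if (trimmed.toLowerCase(Locale.ROOT).equals("iso-8859-8-i")) {
            return "ISO-8859-8";
        }
        return trimmed;
    }

    // Decodes the bytes using the normalized charset name.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(normalizeCharsetName(charsetName)));
    }

    public static void main(String[] args) {
        // First three bytes of the first Hebrew word in the test case.
        byte[] bytes = {(byte) 0xee, (byte) 0xf9, (byte) 0xfa};
        String decoded = decode(bytes, "ISO-8859-8-i");
        System.out.println(decoded.equals("\u05de\u05e9\u05ea"));
    }
}
```

Normalizing the name in one place keeps the substitution out of the decoding call sites, so it can be removed once the JDK accepts the name directly.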

            sherman Xueming Shen
            webbuggrp Webbug Group