JDK-6582588

HTMLEditorKit improperly displays entities for supplementary characters


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P5
    • None
    • 6
    • client-libs

      FULL PRODUCT VERSION :
      java version "1.6.0_02"
      Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
      Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode, sharing)

      java version "1.7.0-ea"
      Java(TM) SE Runtime Environment (build 1.7.0-ea-b15)
      Java HotSpot(TM) Client VM (build 1.7.0-ea-b15, mixed mode, sharing)


      ADDITIONAL OS VERSION INFORMATION :
      Linux hostname 2.6.18 #1 Wed Sep 20 03:01:24 CDT 2006 i686 athlon-4 i386 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      JEditorPane does not properly display an HTML character entity representing a value greater than 65535. Instead, it truncates the value to 16 bits. Supplementary characters can only be displayed by placing a surrogate character pair in the HTML.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Create a JEditorPane and set its content to any HTML document containing a numeric character entity for a supplementary character, such as "&#66565;" (U+10405). A font that contains this character must be installed or placed in $JAVA_HOME/jre/lib/fonts/fallback; at this time, the only such font I know of is Code2001. If the font is not in the fallback directory, the HTML or the JEditorPane must explicitly specify the font family.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The JEditorPane should display the U+10405 DESERET CAPITAL LETTER LONG OO character, which looks like an oval with a vertical line through it.
      ACTUAL -
      The character entity is stripped to its lowest 16 bits, which causes U+0405 CYRILLIC CAPITAL LETTER DZE (which resembles an English 'S') to be displayed.
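
The arithmetic behind the observed result can be checked in isolation. The sketch below only reproduces the symptom described above (keeping the low 16 bits of the entity value). The class and method names are hypothetical, and the narrowing cast is an assumption about how the truncation might arise, not code taken from the HTML parser.

```java
final class EntityTruncationDemo {
    /**
     * Mimics the reported symptom: only the low 16 bits of the
     * entity value survive. The cast is an illustrative assumption,
     * not the parser's actual code.
     */
    static int truncateTo16Bits(int codePoint) {
        return (char) codePoint; // int -> char drops everything above bit 15
    }

    public static void main(String[] args) {
        int codePoint = 66565; // numeric value of the entity &#66565;
        System.out.printf("entity value:     U+%X%n", codePoint);
        System.out.printf("after truncation: U+%04X%n", truncateTo16Bits(codePoint));
    }
}
```

Since 66565 is 0x10405, dropping the bits above the low 16 leaves 0x0405, which is the Cyrillic letter reported under ACTUAL.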

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import java.awt.*;
      import javax.swing.*;

      public class SupplementaryTest
      {
          public static void main(String[] args)
          {
              EventQueue.invokeLater(new Runnable()
              {
                  public void run()
                  {
                      JEditorPane editorPane = new JEditorPane("text/html",
                          "<html><body><p style="
                          + "'font-family: Code2001;"
                          + " font-size: 24pt;"
                          + "'>"
                          + "&#66565;"
                          + "</body></html>");
                      editorPane.setEditable(false);
                      JFrame frame = new JFrame("Supp Test");
                      frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                      frame.getContentPane().add(new JScrollPane(editorPane));
                      frame.setSize(300, 300);
                      frame.setLocationByPlatform(true);
                      frame.setVisible(true);
                  }
              });
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      The workaround is to have HTML entities for the corresponding UTF-16 surrogate pair. In the case of U+10405, placing "&#55297;&#56325;" (that is, U+D801 U+DC05) in the HTML will produce the desired character.
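
The workaround above can be applied mechanically before the HTML is handed to the JEditorPane. The following is a minimal sketch, assuming the input uses decimal entities only (hexadecimal forms such as "&#x10405;" are not handled). The class and method names are hypothetical; the only library calls used are from java.util.regex and java.lang.Character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SurrogateEntityWorkaround {
    // Decimal numeric entities only; up to 7 digits always fits in an int.
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    /** Rewrites each supplementary-character entity as two surrogate entities. */
    static String expandSupplementaryEntities(String html) {
        Matcher m = DECIMAL_ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = m.group(); // default: leave the entity untouched
            if (Character.isSupplementaryCodePoint(codePoint)) {
                // toChars returns {high surrogate, low surrogate} for such values
                char[] pair = Character.toChars(codePoint);
                replacement = "&#" + (int) pair[0] + ";&#" + (int) pair[1] + ";";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandSupplementaryEntities("<p>&#66565;</p>"));
    }
}
```

For the test case in this report, the string returned by expandSupplementaryEntities would be passed to the JEditorPane constructor in place of the original HTML. Entities at or below 65535 are left as they are.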

            peterz Peter Zhelezniakov
            ndcosta Nelson Dcosta (Inactive)