JDK-6836089

Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair


    • b07
    • generic
    • generic
    • Verified

        The statement

           System.out.println("\ud840\udc00".codePointAt(0));

        prints

           131072, because \ud840 and \udc00 form a surrogate pair that encodes the code point U+20000.
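
        The arithmetic behind that value can be checked with a minimal sketch. The class name SurrogateCheck and the helper combine are hypothetical and not part of the report; Character.toCodePoint is the standard library call that does the same computation.

```java
public class SurrogateCheck {
    // Hypothetical helper: combines a high and a low surrogate by hand,
    // following the UTF-16 definition of a surrogate pair.
    public static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        String s = "\ud840\udc00";
        char high = s.charAt(0);
        char low = s.charAt(1);

        System.out.println(s.codePointAt(0));                   // 131072
        System.out.println(combine(high, low));                 // 131072
        System.out.println(Character.toCodePoint(high, low));   // 131072

        // Two UTF-16 code units, but only one code point.
        System.out.println(s.length());                         // 2
        System.out.println(s.codePointCount(0, s.length()));    // 1
    }
}
```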

        If one writes
         
           JTextPane htmlPane = new JTextPane();
           htmlPane.setEditorKit(new HTMLEditorKit());

           htmlPane.setText("<html><head></head><body>&#131072;</body></html>");

        the numeric character reference &#131072; is not decoded into a surrogate pair.

           System.out.println(htmlPane.getText());

        prints

        <html>
          <head>
            
          </head>
          <body>
            &#0;
          </body>
        </html>

        rather than

        <html>
          <head>
            
          </head>
          <body>
            &#55360;&#56320;
          </body>
        </html>


        or at least

        <html>
          <head>
            
          </head>
          <body>
            &#131072;
          </body>
        </html>
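
        The observed &#0; is consistent with the code point being narrowed to a single 16-bit char, although the report does not confirm that this is the cause. The following sketch contrasts that narrowing with a decoding based on Character.toChars, which yields the surrogate pair shown in the expected output. The class NumericReferenceSketch and its methods are hypothetical and are not the parser's actual code.

```java
public class NumericReferenceSketch {
    // Hypothetical helper: decodes the numeric value of a character reference
    // into UTF-16, producing a surrogate pair for code points above U+FFFF.
    public static String decode(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Hypothetical helper: writes each UTF-16 code unit of the code point
    // as its own decimal character reference.
    public static String toCodeUnitReferences(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append("&#").append((int) unit).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 131072; // U+20000, outside Plane 0

        // Narrowing to char keeps only the low 16 bits, which are all zero here.
        System.out.println((int) (char) codePoint);           // 0

        // Decoding through Character.toChars keeps the full code point.
        System.out.println(decode(codePoint).length());       // 2
        System.out.println(toCodeUnitReferences(codePoint));  // &#55360;&#56320;
    }
}
```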

              vkarnauk Vladislav Karnaukhov
              jloefflm Johann Löfflmann (Inactive)
