JDK-6836089

Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair


Details

    • b07
    • generic
    • generic
    • Verified

      Description

        The statement

           System.out.println("\ud840\udc00".codePointAt(0));

        prints 131072 (U+20000), because \ud840 and \udc00 are a high and a low surrogate that together encode that single code point.
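
        The relationship between the code point and the two surrogate characters can be checked by hand. The following is a minimal sketch of the standard UTF-16 arithmetic; the class name SurrogatePairDemo and the helper toSurrogates are hypothetical names used only for illustration, and in real code Character.toChars does the same job.

```java
// Illustrative sketch only: SurrogatePairDemo and toSurrogates are hypothetical names.
class SurrogatePairDemo {

    /** Splits a supplementary code point into its high and low surrogate. */
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(131072);

        // The two UTF-16 code units, in hexadecimal.
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));

        // Combining them again yields the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]));

        // The resulting string has two chars but only one code point.
        String s = new String(pair);
        System.out.println(s.length() + " chars, " + s.codePointCount(0, s.length()) + " code point");
    }
}
```

        The hand-written split agrees with Character.toChars(131072), which is the library call a decoder would normally use.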

        If one then writes
         
           JTextPane htmlPane = new JTextPane();
           htmlPane.setEditorKit(new HTMLEditorKit());

           htmlPane.setText("<html><head></head><body>&#131072;</body></html>");

        the numeric character reference &#131072; is not decoded into the corresponding surrogate pair.

           System.out.println(htmlPane.getText());

        prints

        <html>
          <head>
            
          </head>
          <body>
            &#0;
          </body>
        </html>

        rather than

        <html>
          <head>
            
          </head>
          <body>
            &#55360;&#56320;
          </body>
        </html>


        or at least

        <html>
          <head>
            
          </head>
          <body>
            &#131072;
          </body>
        </html>
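
        The &#0; in the actual output is what one would get if the parsed value were narrowed to a single 16-bit char, since 131072 is 0x20000 and its low 16 bits are all zero. Whether the Swing parser does exactly that is an assumption; this report only shows the symptom. The sketch below contrasts such a lossy conversion with a code-point-aware one and reproduces both of the expected outputs shown above. NumericReferenceDemo and all of its methods are hypothetical names and are not part of the Swing API.

```java
// Illustrative sketch only: not the Swing parser's code. All names are hypothetical.
class NumericReferenceDemo {

    /** Decodes the decimal digits of a reference such as "131072" (from "&#131072;"). */
    static String decode(String digits) {
        int codePoint = Integer.parseInt(digits);
        if (!Character.isValidCodePoint(codePoint)) {
            return "\uFFFD"; // replacement character for out-of-range values
        }
        // toChars returns one char for Plane 0 and a surrogate pair otherwise.
        return new String(Character.toChars(codePoint));
    }

    /** Assumed lossy variant: the narrowing cast keeps only the low 16 bits. */
    static String decodeTruncating(String digits) {
        return String.valueOf((char) Integer.parseInt(digits));
    }

    /** Writes one decimal reference per code point. */
    static String encodeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    /** Writes one decimal reference per UTF-16 char, so surrogates appear separately. */
    static String encodeChars(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append("&#").append((int) text.charAt(i)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("131072");
        System.out.println(encodeChars(decoded));                      // surrogates written separately
        System.out.println(encodeCodePoints(decoded));                 // single reference
        System.out.println(encodeChars(decodeTruncating("131072")));   // lossy result
    }
}
```

        With the code-point-aware decode, either way of writing the text back out round-trips the character; only the truncating variant loses it.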

              People

                vkarnauk Vladislav Karnaukhov
                jloefflm Johann Löfflmann (Inactive)
