JDK-8039928

Non-BMP characters in XML attribute values are duplicated across attributes


    • Type: Bug
    • Resolution: Incomplete
    • Priority: P3
    • None
    • 7u51
    • xml

      FULL PRODUCT VERSION :
      java version "1.7.0_21"
      OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-5)
      OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

      java version "1.8.0"
      Java(TM) SE Runtime Environment (build 1.8.0-b132)
      Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      Linux 3.10-3-amd64 #1 SMP Debian 3.10.11-1 (2013-09-10) x86_64 GNU/Linux


      A DESCRIPTION OF THE PROBLEM :
      When parsing attribute values in an XML 1.0 document, the parser does not clear an internal buffer used to handle non-BMP characters. As a result, every non-BMP character is replaced by a string containing all non-BMP characters (including itself) seen in the document since the most recent numeric character reference. The application receives corrupted input, and documents containing very large numbers of such characters within XML attributes can trigger an OutOfMemoryError.
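The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the parser's actual code: the class `StaleBufferSketch`, its `scratch` buffer, and the `clearBetweenUses` flag are all hypothetical names, and the reset on numeric character references is not modelled. It shows how a scratch buffer that is reused without being cleared makes each non-BMP character expand to everything collected so far.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical, not taken from the JDK sources.
class StaleBufferSketch {
    // Stand-in for the internal buffer that handles non-BMP characters.
    private final StringBuilder scratch = new StringBuilder();
    private final boolean clearBetweenUses;

    StaleBufferSketch(boolean clearBetweenUses) {
        this.clearBetweenUses = clearBetweenUses;
    }

    // Returns the attribute value as this toy scanner would report it.
    String scanAttributeValue(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                if (clearBetweenUses) {
                    scratch.setLength(0);   // the step assumed to be missing
                }
                scratch.appendCodePoint(cp);
                out.append(scratch);        // stale content leaks in when not cleared
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Scans several attribute values in order with one shared scanner.
    static List<String> scanAll(boolean clearBetweenUses, String... values) {
        StaleBufferSketch scanner = new StaleBufferSketch(clearBetweenUses);
        List<String> result = new ArrayList<String>();
        for (String value : values) {
            result.add(scanner.scanAttributeValue(value));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = new String[4];
        for (int k = 0; k < values.length; k++) {
            values[k] = new String(Character.toChars(0x1F507 + k)); // U+1F507..U+1F50A
        }
        System.out.println("not cleared: " + scanAll(false, values));
        System.out.println("cleared:     " + scanAll(true, values));
    }
}
```

With the buffer left uncleared, the second value comes back as the first two characters, the third as the first three, and so on, which is the same accumulation pattern as in the actual results reported below.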

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the attached program on an XML file containing multiple non-BMP characters within attribute values and examine its output, e.g.:
        java Nonbmpattr test.xml

      An XML file which demonstrates the problem follows. This is the file used for the expected and actual results below.

      <?xml version='1.0' encoding='UTF-8'?>
      <root>
        <node attr="🔇" />
        <node attr="🔈" />
        <node attr="🔉" />
        <node attr="🔊" />
      </root>
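Copying non-BMP characters through editors or terminals can silently mangle them, so it may be safer to generate the test file programmatically. The following is a minimal sketch that writes the document above as UTF-8; the class name `WriteTestXml` and the default output name `test.xml` are assumptions chosen to match the command shown earlier.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes the sample document with U+1F507..U+1F50A as attribute values.
class WriteTestXml {
    static String buildDocument() {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='UTF-8'?>\n<root>\n");
        for (int cp = 0x1F507; cp <= 0x1F50A; cp++) {
            // appendCodePoint emits the surrogate pair for a non-BMP code point
            sb.append("  <node attr=\"").appendCodePoint(cp).append("\" />\n");
        }
        sb.append("</root>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "test.xml");
        Files.write(out, buildDocument().getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out);
    }
}
```

Running `java WriteTestXml` and then `java Nonbmpattr test.xml` should exercise the same input as the hand-written file.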


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The output of the program should be

      Element root:
      Element node:
              attr : 🔇
      Element node:
              attr : 🔈
      Element node:
              attr : 🔉
      Element node:
              attr : 🔊

      ACTUAL -
      The output of the program is:

      Element root:
      Element node:
              attr : 🔇
      Element node:
              attr : 🔇🔈
      Element node:
              attr : 🔇🔈🔉
      Element node:
              attr : 🔇🔈🔉🔊


      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.Attributes;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      // Prints each element name followed by its attribute names and values.
      public class Nonbmpattr extends DefaultHandler {
        public static void main(String[] argv) throws Exception {
          if (argv.length != 1) {
            System.err.println("Usage: java Nonbmpattr <input>");
            System.exit(64);
          }

          // Parse the input named on the command line with the default SAX parser.
          SAXParserFactory.newInstance().newSAXParser().parse(
            argv[0], new Nonbmpattr());
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs)
        throws SAXException {
          System.out.println("Element " + qName + ":");
          // getQName is used because the default factory is not namespace-aware,
          // in which case getLocalName may return an empty string.
          for (int i = 0; i < attrs.getLength(); ++i) {
            System.out.println("\t" + attrs.getQName(i) + "\t: " +
                               attrs.getValue(i));
          }
        }
      }

      ---------- END SOURCE ----------
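For automated checking, the reproducer above can be turned into a self-checking variant that parses an in-memory document and compares each attribute value with what was written. This is a sketch under assumptions: the class `NonBmpAttrCheck` and the method `parseAttrValues` are hypothetical names, and it has not been confirmed that the defect also reproduces when the parser reads from a character stream rather than a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical self-checking variant of the reproducer.
class NonBmpAttrCheck {
    // Collects every attribute value in document order.
    static List<String> parseAttrValues(String xml) throws Exception {
        final List<String> values = new ArrayList<String>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    for (int i = 0; i < attrs.getLength(); ++i) {
                        values.add(attrs.getValue(i));
                    }
                }
            });
        return values;
    }

    public static void main(String[] args) throws Exception {
        List<String> expected = new ArrayList<String>();
        StringBuilder xml = new StringBuilder("<root>");
        for (int cp = 0x1F507; cp <= 0x1F50A; cp++) {
            String value = new String(Character.toChars(cp));
            expected.add(value);
            xml.append("<node attr=\"").append(value).append("\" />");
        }
        xml.append("</root>");

        List<String> actual = parseAttrValues(xml.toString());
        if (expected.equals(actual)) {
            System.out.println("OK: attribute values match");
        } else {
            System.out.println("MISMATCH: expected " + expected + " but got " + actual);
        }
    }
}
```

Which message is printed depends on the JDK build in use, so no particular output is claimed here.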

            Assignee: Unassigned
            Reporter: webbuggrp (Webbug Group)
            Votes: 0
            Watchers: 3
