JDK / JDK-8136602

Seemingly valid XML fails to get parsed with org.xml.sax.SAXParseException


      FULL PRODUCT VERSION :
      openjdk version "1.8.0_45-internal"
      OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
      OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      Linux rei2-wt 3.19.0-28-generic #30-Ubuntu SMP Mon Aug 31 15:52:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      Parsing a simple XML document fails. This program:

          public static void main(String[] args) throws Exception {
              DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("/home/vyzivus/Downloads/KD.xml");
          }

      will fail to parse the attached XML with the following error message:
      [Fatal Error] KD.xml:972:25: An invalid XML character (Unicode: 0xd840) was found in the comment.
      Exception in thread "main" org.xml.sax.SAXParseException; systemId: file:///home/vyzivus/Downloads/KD.xml; lineNumber: 972; columnNumber: 25; An invalid XML character (Unicode: 0xd840) was found in the comment.
      at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
      at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
      at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
      at com.company.Main.main(Main.java:8)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


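      When the comment line in question is removed, the parser parses the XML successfully. Apparently, the comment parser handles the Unicode character incorrectly and even reports the wrong code point: 0xd840 instead of U+2000B. Note that 0xd840 is the high surrogate of the UTF-16 surrogate pair (0xd840 0xdc0b) that encodes U+2000B, so the parser appears to be validating a single UTF-16 code unit rather than the full code point.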
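The relationship between the reported value and the actual character can be checked directly. The following is a minimal sketch, not part of the original report; the class name `SurrogateCheck` and the helper `utf16Units` are hypothetical names introduced for illustration. It only uses `Character.toChars` to show how U+2000B is represented in UTF-16.

```java
public class SurrogateCheck {

    // Hypothetical helper: returns the UTF-16 code units of a code point as hex strings.
    static String utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char c : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2000B lies outside the Basic Multilingual Plane, so it needs two code units.
        System.out.println(utf16Units(0x2000B)); // prints "0xd840 0xdc0b"
    }
}
```

The first code unit matches the value in the error message, which is consistent with the parser rejecting the high surrogate on its own.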

      You can download the XML in question here: http://www.baka.sk/KD.xml


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Download the KD.xml file from http://www.baka.sk/KD.xml
      2. Parse the attached XML: DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("KD.xml");
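If the download is unavailable, a similar input can be built in memory. This is only a sketch under assumptions: the class `CommentSupplementaryChar`, its methods, and the tiny document are hypothetical stand-ins for KD.xml, and because the original file fails at line 972 of a much larger document, this small input may or may not trigger the same exception on an affected build.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CommentSupplementaryChar {

    // Builds a small document with U+2000B inside a comment (stand-in for KD.xml).
    static String buildXml() {
        String supplementary = new String(Character.toChars(0x2000B));
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<root><!-- " + supplementary + " --><a/></root>";
    }

    // Parses the given XML text from UTF-8 bytes with the default JAXP parser.
    static Document parse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // On an affected build this call may throw SAXParseException.
        Document doc = parse(buildXml());
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```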

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The parse succeeds and throws no exception
      ACTUAL -
      An exception is thrown: [Fatal Error] KD.xml:972:25: An invalid XML character (Unicode: 0xd840) was found in the comment.
      Exception in thread "main" org.xml.sax.SAXParseException; systemId: file:///home/vyzivus/Downloads/KD.xml; lineNumber: 972; columnNumber: 25; An invalid XML character (Unicode: 0xd840) was found in the comment.
      at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
      at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
      at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
      at com.company.Main.main(Main.java:8)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      package com.company;

      import javax.xml.parsers.DocumentBuilderFactory;

      public class Main {

          public static void main(String[] args) throws Exception {
              DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("/home/vyzivus/Downloads/KD.xml");
          }
      }

      ---------- END SOURCE ----------

        1. SimpleXMLParser.java (0.3 kB, Pallavi Sonal)

            joehw Joe Wang
            webbuggrp Webbug Group