JDK-8062362: Surrogate characters are not correctly handled in XMLScanner class

    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 7u55, 8u25, 8u40
    • Component/s: xml

      FULL PRODUCT VERSION :
      java version "1.7.0_55"
      Java(TM) SE Runtime Environment (build 1.7.0_55-b14)
      Java HotSpot(TM) Client VM (build 24.55-b03, mixed mode, sharing)

      A DESCRIPTION OF THE PROBLEM :
      Attribute values with surrogate characters are being corrupted.
      Given the input file "db.xml" below, reading it and writing it back to "db_out.xml" using the Xerces API produces an output file in which the first surrogate character appears twice.

      db.xml content:
      <?xml version="1.0" encoding="utf-8" standalone="yes"?>
      <JDF>
      <Command AcknowledgeURL="𨦈𨦇" />
      </JDF>

      db_out.xml content:
      <?xml version="1.0" encoding="utf-8" standalone="yes"?>
      <JDF>
      <Command AcknowledgeURL="&#166280;&#166280;&#166279;"/>
      </JDF>



      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1. Create a file "db.xml" and fill it with the XML code given in the "A DESCRIPTION OF THE PROBLEM" section
      2. Run the piece of code from "Source code for an executable test case:" section
      3. Verify the output file "db_out.xml" and observe that the first surrogate character appears twice.
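
      Pasting supplementary characters into an editor can silently damage them, so step 1 may be easier to do in code. The following is a minimal sketch, not part of the original report: the class name CreateDbXml and the helper buildXml() are hypothetical, and the two code points (166280 and 166279) are taken from the character references shown in the expected output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not from the report): writes the db.xml input file.
public class CreateDbXml {

    // Builds the document text; the attribute value holds exactly two
    // supplementary characters, each stored as a surrogate pair in the String.
    static String buildXml() {
        String value = new StringBuilder()
                .appendCodePoint(166280)  // U+28988
                .appendCodePoint(166279)  // U+28987
                .toString();
        return "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>\n"
                + "<JDF>\n"
                + "<Command AcknowledgeURL=\"" + value + "\" />\n"
                + "</JDF>\n";
    }

    public static void main(String[] args) throws IOException {
        // Encode as UTF-8 so the file matches its XML declaration.
        Files.write(Paths.get("db.xml"), buildXml().getBytes(StandardCharsets.UTF_8));
    }
}
```

      Running the class once creates "db.xml" in the working directory, after which steps 2 and 3 apply unchanged.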

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      <?xml version="1.0" encoding="utf-8" standalone="yes"?>
      <JDF>
      <Command AcknowledgeURL="&#166280;&#166279;"/>
      </JDF>
      ACTUAL -
      <?xml version="1.0" encoding="utf-8" standalone="yes"?>
      <JDF>
      <Command AcknowledgeURL="&#166280;&#166280;&#166279;"/>
      </JDF>
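
      To tell whether the extra character comes from the parser rather than from the serializer, the attribute value can be inspected directly after parsing. This is a sketch under assumptions: the class and method names are hypothetical, the document is parsed from an in-memory string instead of a file, and it has not been confirmed that this variant triggers the problem on the affected builds. A correct parser should report two code points.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical check (not from the report): lists the code points the
// parser delivers for the AcknowledgeURL attribute.
public class AttributeCodePoints {

    // Same attribute content as db.xml, built from the two code points.
    static String sampleXml() {
        String value = new StringBuilder()
                .appendCodePoint(166280)
                .appendCodePoint(166279)
                .toString();
        return "<JDF><Command AcknowledgeURL=\"" + value + "\" /></JDF>";
    }

    static int[] parsedCodePoints(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element command = (Element) doc.getElementsByTagName("Command").item(0);
        String value = command.getAttribute("AcknowledgeURL");

        // Walk the string by code point, not by char, so each surrogate
        // pair is counted once.
        int[] codePoints = new int[value.codePointCount(0, value.length())];
        int index = 0;
        for (int i = 0; i < value.length(); i += Character.charCount(value.codePointAt(i))) {
            codePoints[index++] = value.codePointAt(i);
        }
        return codePoints;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(parsedCodePoints(sampleXml())));
    }
}
```

      If the printed array has three entries, the duplication has already happened by the time the DOM is built, which would point at the scanner and not at the Transformer.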

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
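      import java.io.BufferedWriter;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.io.IOException;
      import java.io.OutputStreamWriter;
      import java.io.StringWriter;
      import java.nio.charset.StandardCharsets;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.parsers.ParserConfigurationException;
      import javax.xml.transform.OutputKeys;
      import javax.xml.transform.Transformer;
      import javax.xml.transform.TransformerException;
      import javax.xml.transform.TransformerFactory;
      import javax.xml.transform.dom.DOMSource;
      import javax.xml.transform.stream.StreamResult;
      import org.w3c.dom.Document;
      import org.xml.sax.InputSource;
      import org.xml.sax.SAXException;

      // Class name is arbitrary; the report supplied only the method body.
      public class SurrogateRoundTrip {
          public static void main(String[] args) {
              // try-with-resources closes the input stream even if parsing fails.
              try (FileInputStream fis = new FileInputStream("db.xml")) {
                  // Parse db.xml into a DOM tree.
                  InputSource in = new InputSource(fis);
                  DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                  Document domDocument = builder.parse(in);

                  // Serialize the DOM tree into an in-memory string.
                  StringWriter stringOut = new StringWriter();
                  try {
                      TransformerFactory transfac = TransformerFactory.newInstance();
                      Transformer trans = transfac.newTransformer();
                      trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                      trans.setOutputProperty(OutputKeys.STANDALONE, "yes");
                      trans.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
                      trans.setOutputProperty(OutputKeys.INDENT, "yes");

                      StreamResult result = new StreamResult(stringOut);
                      DOMSource source = new DOMSource(domDocument);
                      trans.transform(source, result);
                  } catch (TransformerException e) {
                      e.printStackTrace();
                  }

                  // Write the serialized document to db_out.xml as UTF-8.
                  String str = stringOut.toString();
                  try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                          new FileOutputStream("db_out.xml"), StandardCharsets.UTF_8))) {
                      writer.write(str, 0, str.length());
                  }
              } catch (ParserConfigurationException | SAXException | IOException e) {
                  e.printStackTrace();
              }
          }
      }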
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I don't have a workaround, but from my investigation I think the problem might be in the method XMLScanner.scanAttributeValue(): inside the do-while loop, in the branch guarded by "else if (c != -1 && XMLChar.isHighSurrogate(c))", fStringBuffer3 is not cleared.
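
      The suspected failure mode, a scratch buffer that is reused without being cleared, can be illustrated with a toy model. This is not the Xerces source: the class StaleBufferDemo, the method copy() and the flag clearScratch are hypothetical and only show how a stale buffer would duplicate the first surrogate pair in the way the report describes.

```java
// Toy model (not Xerces code): copies a string, routing each surrogate
// pair through a reusable scratch buffer before appending it to the result.
public class StaleBufferDemo {

    static String copy(String input, boolean clearScratch) {
        StringBuilder result = new StringBuilder();
        StringBuilder scratch = new StringBuilder(); // reused for every pair
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean pair = Character.isHighSurrogate(c)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1));
            if (pair) {
                if (clearScratch) {
                    scratch.setLength(0); // the step suspected to be missing
                }
                scratch.append(c).append(input.charAt(i + 1));
                // Without the reset, scratch still holds the earlier pairs,
                // so they are appended to the result again.
                result.append(scratch);
                i += 2;
            } else {
                result.append(c);
                i++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String input = new StringBuilder()
                .appendCodePoint(166280)
                .appendCodePoint(166279)
                .toString();
        String broken = copy(input, false);
        String fixed = copy(input, true);
        System.out.println("without reset: " + broken.codePointCount(0, broken.length()) + " code points");
        System.out.println("with reset:    " + fixed.codePointCount(0, fixed.length()) + " code points");
    }
}
```

      In this model, skipping the reset turns the two-character input into 166280, 166280, 166279, the same sequence seen in "db_out.xml", while resetting the buffer before each pair returns the input unchanged. That is consistent with the hypothesis above but does not prove it.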

            Assignee: joehw Joe Wang
            Reporter: webbuggrp Webbug Group