JDK-8043732

javax.xml.parsers.SAXParser returns wrong buffer sometimes

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 8
    • Component/s: xml

    Description

      FULL PRODUCT VERSION :
      java version 1.8.0_05
      Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
      Java HotSpot(TM) Client VM (build 25.5-b02, mixed mode, sharing)


      ADDITIONAL OS VERSION INFORMATION :
      Windows 7 Professional 32 bit Version 6.1.7600

      A DESCRIPTION OF THE PROBLEM :
      When javax.xml.parsers.SAXParser parses a very large XML file, the handler's characters(char[] ch, int start, int length) callback is sometimes passed the wrong buffer contents.
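
      For reference, the following is a minimal sketch of how a handler normally consumes characters(): only the range ch[start] to ch[start + length - 1] is valid for a given call, and the parser may deliver the text of one element in several calls, so the handler appends each chunk to its own buffer. The class name CharactersContractSketch, the Collector helper and the sample input are illustrative assumptions, not part of the original report.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersContractSketch {

    /** Hypothetical helper: collects all character data and counts characters() calls. */
    static class Collector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int calls = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only ch[start .. start + length - 1] is valid, and only during this call,
            // so the chunk is copied into the handler's own buffer immediately.
            calls++;
            text.append(ch, start, length);
        }
    }

    /** Parses an XML string and returns the collector that received the events. */
    static Collector parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Collector collector = new Collector();
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return collector;
    }

    public static void main(String[] args) throws Exception {
        // The number of calls is parser-dependent; the accumulated text should not be.
        Collector c = parse("<subject>55416 &amp; 55276</subject>");
        System.out.println(c.calls + " call(s), text: " + c.text);
    }
}
```

      The handler in this report follows the same accumulate-and-append pattern, so a newline appearing inside a single-line subject would point at the data handed to characters() rather than at the handler logic.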




      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      To reproduce this issue, download the zip file from http://www.ac.auone-net.jp/~lovelyfl/BigXML.zip .
      The archive contains one XML file and the Java source that reports the error.
      The source code is exactly the same as the code reported below.

      Run the provided Java code as follows.

      java workbench.sax.SaxParserTest BigXML.xml

      You will get:
      >A subject of multiple lines follows:
      >
      >55416 55416 55416 55416 55416 55416 55416 55416 5541 55276 55276 55276 55276 55276 55276 55276
      >End of the subject

      This is an error, because every subject element in BigXML.xml is a single line.
      Uncommenting the following line in the source code prints every subject and makes it easier to see what is happening.

      //System.out.println(text);
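
      If the archive is not available, a synthetic input can be generated instead. The sketch below writes a file of single-line subject elements and counts how many of them come back from the parser containing a newline. The class name SyntheticBigXml, the items/item wrapper elements and the subject contents are assumptions, since the layout of the real BigXML.xml is not shown in this report, and it is not known whether such a file triggers the same defect.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SyntheticBigXml {

    /** Writes count single-line subject elements; all names except subject are hypothetical. */
    static void write(File out, int count) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out.toPath(), StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<items>\n");
            for (int i = 0; i < count; i++) {
                w.write("<item><subject>");
                for (int j = 0; j < 16; j++) {
                    if (j > 0) {
                        w.write(' ');
                    }
                    w.write(Integer.toString(i));
                }
                w.write("</subject></item>\n");
            }
            w.write("</items>\n");
        }
    }

    /** Returns the number of subject elements whose text contains a newline (0 is expected). */
    static int countMultiLineSubjects(File xml) throws Exception {
        final int[] bad = {0};
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder sb;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("subject")) {
                    sb = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (sb != null) {
                    sb.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("subject")) {
                    if (sb.indexOf("\n") >= 0) {
                        bad[0]++;
                    }
                    sb = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return bad[0];
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("BigXML", ".xml");
        f.deleteOnExit();
        // Default element count is an arbitrary choice; pass a larger number to grow the file.
        write(f, args.length == 1 ? Integer.parseInt(args[0]) : 100_000);
        System.out.println("multi-line subjects: " + countMultiLineSubjects(f));
    }
}
```

      A non-zero count on an affected runtime would show the same symptom as the original test; a zero count does not prove the absence of the bug, because the synthetic file may differ from BigXML.xml in ways that matter.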




      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      No message should be reported. The program prints nothing when everything is OK.

      ACTUAL -
      A subject of multiple lines follows:

      55416 55416 55416 55416 55416 55416 55416 55416 5541 55276 55276 55276 55276 55276 55276 55276
      End of the subject
      A subject of multiple lines follows:
      4 64584 64584 64584 6458
      64585 4
      End of the subject



      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      package workbench.sax;

      import java.io.File;

      import javax.xml.parsers.SAXParser;
      import javax.xml.parsers.SAXParserFactory;

      import org.xml.sax.Attributes;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      public class SaxParserTest {

          static class SaxParserHandler extends DefaultHandler {
              StringBuilder textBuilder = null;

              @Override
              public void startElement(String uri,
                                       String localName,
                                       String qName,
                                       Attributes attributes) throws SAXException {
                  if (qName.equals("subject")) {
                      textBuilder = new StringBuilder();
                  }
              }

              @Override
              public void endElement(String uri,
                                     String localName,
                                     String qName) throws SAXException {
                  if (qName.equals("subject")) {
                      String text = textBuilder.toString();
                      if (text.indexOf('\n') >= 0) {
                          // A multiple-line subject, which doesn't exist in BigXML.xml.
                          System.out.println("A subject of multiple lines follows:");
                          System.out.println(text);
                          System.out.println("End of the subject");
                      } else {
                          // This is normal, a single line subject
                          //System.out.println(text);
                      }
                  }
                  textBuilder = null;
              }

              @Override
              public void characters(char[] ch, int start, int length) throws SAXException {
                  // ch references wrong buffer sometimes, not often.
                  if (textBuilder != null) {
                      textBuilder.append(ch, start, length);
                  }
              }
          }

          public static void main(String[] args) {
              try {
                  if (args.length == 1) {
                      File xmlFile = new File(args[0]);
                      SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
                      saxParser.parse(xmlFile, new SaxParserHandler());
                  } else {
                      System.out.println("Usage: java SaxParserTest BigXML.xml");
                  }
              } catch (Throwable e) {
                  e.printStackTrace(System.err);
              }
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      There is no good workaround.
      In my case, I split the XML file with my own parser and fed the resulting small chunks to SAXParser,
      so the problem no longer affects me.
      I am reporting it for the benefit of the Java community.


            People

              aefimov Aleksej Efimov
              webbuggrp Webbug Group
              Votes: 0
              Watchers: 2
