JDK-4131655

java.io.InputStreamReader performance: Factor of five speed penalty


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • Fix Version/s: None
    • Affects Version/s: 1.1.5, 1.2.0
    • Component/s: core-libs
    • CPU: generic
    • OS: generic

      See the attachment for the code that generated these measurements; you may
      need to comment out the test for "UTF16Reader" to compile and run it. The
      test data is generated by the "gen.java" file attached to bugid 4131647,
      which turned up its own set of bugs ... Note that the custom readers took
      about half an hour to write and debug. Admittedly, encodings like UTF-8 will
      be slower to decode than UTF-16, but that does not justify a FACTOR OF FIVE
      (or more) difference in speed.

      ---------------------------
      From xxxx Sat Apr 18 15:56:36 1998
      To: xxxx
      Subject: Reader performance
      Cc: xxxx

      You asked for numbers when I raised performance problems in the
      Reader/Writer framework, so here are some ugly ones.

      Each of these (single) runs read 1M chars of XML data (basically, this
      was randomly generated UNICODE, with some XML framing) from files cached
      in memory. The "read" loop was "read a 1K block, then read 512 characters
      one at a time" until the end of the data was reached.

          InputStreamReader, "UnicodeLittle"   16.34 s  (JDK 1.1.5)
          InputStreamReader, "UnicodeLittle"   17.94 s  (JDK 1.2 beta4)

          Custom "UnicodeLittleReader"          3.86 s  (JDK 1.1.5)
          Custom "UnicodeLittleReader"          3.77 s  (JDK 1.2 beta4)

          InputStreamReader, "UTF8"            24.82 s  (JDK 1.1.5)
          InputStreamReader, "UTF8"            25.63 s  (JDK 1.2 beta4)
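A minimal Java sketch of the read loop described above, for anyone reproducing the measurements. It reads a 1K block, then reads 512 characters one at a time, and repeats until the Reader reports end of data. The class name MixedReadPattern, its constants, and the generated placeholder input are hypothetical. This is not the attached test program, and the timing it prints is illustrative only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical harness for the access pattern in the report: one 1K bulk
// read, then 512 single-char reads, repeated until end of data.
final class MixedReadPattern {
    static final int BLOCK_SIZE = 1024;  // chars requested per bulk read()
    static final int SINGLE_READS = 512; // chars read one at a time per round

    /** Consumes the whole Reader and returns the number of chars read. */
    static long consume(Reader in) throws IOException {
        char[] block = new char[BLOCK_SIZE];
        long total = 0;
        while (true) {
            int n = in.read(block, 0, BLOCK_SIZE); // bulk read, may return fewer
            if (n < 0) return total;
            total += n;
            for (int i = 0; i < SINGLE_READS; i++) {
                int c = in.read(); // one method call per character
                if (c < 0) return total;
                total++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder input: 1M chars, standing in for the generated XML data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1 << 20; i++) sb.append((char) ('a' + i % 26));
        long start = System.nanoTime();
        long count = consume(new StringReader(sb.toString()));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " chars in " + elapsedMs + " ms");
    }
}
```

To reproduce the comparison in the table, pass either an InputStreamReader or a custom reader over the same file to consume().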

      The custom reader does the obvious things. Notably, it does not allocate a
      garbage character array on each character-at-a-time read, and it adds no
      superfluous method-call overhead to block reads. These are things that the
      character-converter object framework seemingly precludes.
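A minimal sketch of what such a custom reader might look like. It assumes little-endian UTF-16 input with no byte-order mark: it neither detects nor strips one, which a complete "UnicodeLittle" decoder may need to handle. The reader keeps its own byte buffer, so the single-char read() allocates nothing, and the block read() decodes directly from that buffer with no converter object in between. The class name matches the one in the measurements, but the code is a hypothetical reconstruction, not the attached reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

// Hypothetical little-endian UTF-16 reader (assumes no byte-order mark).
final class UnicodeLittleReader extends Reader {
    private final InputStream in;
    private final byte[] buf = new byte[8192]; // raw bytes from the stream
    private int pos = 0;   // index of the next unread byte in buf
    private int limit = 0; // number of valid bytes in buf

    public UnicodeLittleReader(InputStream in) {
        this.in = in;
    }

    /** Ensures at least two bytes are buffered; returns false at end of stream. */
    private boolean fill() throws IOException {
        if (limit - pos >= 2) return true;
        // Keep a leftover odd byte at the front, then top up the buffer.
        int leftover = limit - pos;
        if (leftover == 1) buf[0] = buf[pos];
        pos = 0;
        limit = leftover;
        while (limit < 2) {
            int n = in.read(buf, limit, buf.length - limit);
            if (n < 0) {
                if (limit == 1) throw new IOException("odd number of bytes in UTF-16 input");
                return false;
            }
            limit += n;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        if (!fill()) return -1;
        // Low byte first, then high byte; no per-call allocation.
        int c = (buf[pos] & 0xFF) | ((buf[pos + 1] & 0xFF) << 8);
        pos += 2;
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, (limit - pos) / 2); // decode only what is buffered
        for (int i = 0; i < n; i++, pos += 2) {
            cbuf[off + i] = (char) ((buf[pos] & 0xFF) | ((buf[pos + 1] & 0xFF) << 8));
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A Java char is a UTF-16 code unit, so surrogate pairs pass through unchanged as two chars and need no extra handling here.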

      If the character-at-a-time reads were removed, the times were roughly five
      seconds to read the Unicode via InputStreamReader and eleven seconds for
      UTF-8, while the custom reader's times improved by about 10%. That is, even
      for bulk reads the custom reader is still on the order of 25% faster.

      For comparison, one XML parser, which doesn't use Readers because
      of their performance, read ** AND PARSED ** the two files in only
      two seconds more than the JDK's bulk-read cases took ...
       
      It's no wonder the people designing these APIs are steering away from
      the java.io.Reader classes. That is worrisome, since all XML data is
      UNICODE.
       
      - xxxx

      <UPDATE>
      <AUTHOR> david.brownell@Eng 1998-06-29 </AUTHOR>

      Software REWRITTEN to use only bulk reads can get acceptable
      performance even with this speed penalty. In fact, I've now
      rewritten my own code that way, and it outperforms the fastest
      of the third-party XML processing engines.

      However, for other applications I still think this is a
      pretty severe problem. Not everyone has complete control
      over all of their input data sources.

      </UPDATE>
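The update's advice can be applied in this form, sketched under assumptions. Code that still wants one character at a time can pull characters from its own array, refilled with bulk read(char[], int, int) calls, so the Reader's single-char read() is never called. BulkCharSource and countTagOpens are hypothetical names. java.io.BufferedReader applies the same buffering idea.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical char source: hands out one char at a time from a local array
// that is refilled only through bulk read(char[], int, int) calls.
final class BulkCharSource {
    private final Reader in;
    private final char[] buf;
    private int pos = 0;   // next char to hand out
    private int limit = 0; // number of valid chars in buf

    BulkCharSource(Reader in, int bufferSize) {
        this.in = in;
        this.buf = new char[bufferSize];
    }

    /** Returns the next char, or -1 at end of input. */
    int next() throws IOException {
        while (pos == limit) {
            int n = in.read(buf, 0, buf.length); // one bulk call per refill
            if (n < 0) return -1;
            pos = 0;
            limit = n;
        }
        return buf[pos++];
    }

    /** Example consumer: counts '<' characters, e.g. markup starts in XML text. */
    static int countTagOpens(Reader in) throws IOException {
        BulkCharSource src = new BulkCharSource(in, 4096);
        int count = 0;
        for (int c = src.next(); c >= 0; c = src.next()) {
            if (c == '<') count++;
        }
        return count;
    }
}
```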

            Assignee: Mark Reinhold
            Reporter: David Brownell (Inactive)