Bug
Resolution: Cannot Reproduce
P3
None
7
x86
linux
FULL PRODUCT VERSION :
$ java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b134)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b04, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
$ uname -a
Linux auto-centos5-64bit.funnelback.com 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
Converting a particular RTF file to text with Tika 0.9 fails on the Java 7 early access build (1.7.0-ea-b134) on 64-bit CentOS 5 (and, I believe, on 64-bit Windows 2008). The same conversion succeeds on 32-bit platforms and with Java 6.
The stack trace, which leads into javax.swing.text code, is:
$ java -jar tika-app-0.9.jar -t full.rtf
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@1fa78298
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: java.lang.NullPointerException
at javax.swing.text.GapContent.compare(Unknown Source)
at javax.swing.text.GapContent.findSortIndex(Unknown Source)
at javax.swing.text.GapContent.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument$LeafElement.<init>(Unknown Source)
at javax.swing.text.AbstractDocument.createLeafElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertUpdate(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insert(Unknown Source)
at javax.swing.text.DefaultStyledDocument.insertUpdate(Unknown Source)
at javax.swing.text.AbstractDocument.handleInsertString(Unknown Source)
at javax.swing.text.AbstractDocument.insertString(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser$CustomStyledDocument.insertString(RTFParser.java:376)
at javax.swing.text.rtf.RTFReader$DocumentDestination.deliverText(Unknown Source)
at javax.swing.text.rtf.RTFReader$TextHandlingDestination.handleText(Unknown Source)
at javax.swing.text.rtf.RTFReader.handleText(Unknown Source)
at javax.swing.text.rtf.RTFParser.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.readFromStream(Unknown Source)
at javax.swing.text.rtf.RTFEditorKit.read(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:112)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 5 more
The RTF document which triggers this problem is available from http://public.funnelback.com/full.rtf
Also reported to the Tika team - See https://issues.apache.org/jira/browse/TIKA-621
REGRESSION. Last worked in version 6u24
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
- Download http://public.funnelback.com/full.rtf
- Download and build version 0.9 of Tika from http://tika.apache.org/download.html
- Run java -jar tika-app-0.9.jar -t full.rtf (with the built tika-app-0.9.jar and the downloaded full.rtf in the current directory)
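Because the NullPointerException is raised inside javax.swing.text rather than in Tika's own code, it may help to exercise the same JDK classes without Tika. The following is a minimal sketch under that assumption, not a confirmed reproducer: the class name RtfRepro and the method extractText are hypothetical, and it uses a plain DefaultStyledDocument instead of Tika's CustomStyledDocument, so it may or may not trigger the same failure. It avoids Java 7 syntax so that it can be compiled and run on both Java 6 and the Java 7 early access build.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical stand-alone check: parses an RTF stream with the JDK's
// RTFEditorKit, the same entry point that appears in the stack trace.
final class RtfRepro {

    // Reads RTF from the stream into a styled document and returns its plain text.
    static String extractText(InputStream in) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the file name used in the steps above.
        String path = args.length > 0 ? args[0] : "full.rtf";
        InputStream in = new FileInputStream(path);
        try {
            String text = extractText(in);
            System.out.println("Extracted " + text.length() + " characters");
        } finally {
            in.close();
        }
    }
}
```

Running java RtfRepro full.rtf under both Java versions would show whether the exception depends on Tika's document subclass or occurs with the stock Swing classes alone.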
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected output beginning with...
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="512186"/>
<meta name="Content-Type" content="application/rtf"/>
<meta name="resourceName" content="full.rtf"/>
<title/>
</head>
<body>
<p>Reference Handbook
Table of Contents
The Tempest: Entire Play
The Tempest Shakespeare homepage 1 | The Tempest 1 | Entire play
ACT I
SCENE I. On a ship at sea: a tempestuous noise
of thunder and lightning heard.
Enter a Master and a Boatswain
Master
Boatswain!
Boatswain
Here, master: what cheer?
ACTUAL -
Actual result was the following stack trace.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@1fa78298
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: java.lang.NullPointerException
at javax.swing.text.GapContent.compare(Unknown Source)
at javax.swing.text.GapContent.findSortIndex(Unknown Source)
at javax.swing.text.GapContent.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument$LeafElement.<init>(Unknown Source)
at javax.swing.text.AbstractDocument.createLeafElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertUpdate(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insert(Unknown Source)
at javax.swing.text.DefaultStyledDocument.insertUpdate(Unknown Source)
at javax.swing.text.AbstractDocument.handleInsertString(Unknown Source)
at javax.swing.text.AbstractDocument.insertString(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser$CustomStyledDocument.insertString(RTFParser.java:376)
at javax.swing.text.rtf.RTFReader$DocumentDestination.deliverText(Unknown Source)
at javax.swing.text.rtf.RTFReader$TextHandlingDestination.handleText(Unknown Source)
at javax.swing.text.rtf.RTFReader.handleText(Unknown Source)
at javax.swing.text.rtf.RTFParser.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.readFromStream(Unknown Source)
at javax.swing.text.rtf.RTFEditorKit.read(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:112)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 5 more
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@1fa78298
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: java.lang.NullPointerException
at javax.swing.text.GapContent.compare(Unknown Source)
at javax.swing.text.GapContent.findSortIndex(Unknown Source)
at javax.swing.text.GapContent.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument.createPosition(Unknown Source)
at javax.swing.text.AbstractDocument$LeafElement.<init>(Unknown Source)
at javax.swing.text.AbstractDocument.createLeafElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertElement(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insertUpdate(Unknown Source)
at javax.swing.text.DefaultStyledDocument$ElementBuffer.insert(Unknown Source)
at javax.swing.text.DefaultStyledDocument.insertUpdate(Unknown Source)
at javax.swing.text.AbstractDocument.handleInsertString(Unknown Source)
at javax.swing.text.AbstractDocument.insertString(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser$CustomStyledDocument.insertString(RTFParser.java:376)
at javax.swing.text.rtf.RTFReader$DocumentDestination.deliverText(Unknown Source)
at javax.swing.text.rtf.RTFReader$TextHandlingDestination.handleText(Unknown Source)
at javax.swing.text.rtf.RTFReader.handleText(Unknown Source)
at javax.swing.text.rtf.RTFParser.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.write(Unknown Source)
at javax.swing.text.rtf.AbstractFilter.readFromStream(Unknown Source)
at javax.swing.text.rtf.RTFEditorKit.read(Unknown Source)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:112)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 5 more
REPRODUCIBILITY :
This bug can always be reproduced.