Type: Bug
Resolution: Fixed
Priority: P4
Affected Version: 8, 11, 17
Resolved In Build: b18
Verification: Not verified
ADDITIONAL SYSTEM INFORMATION :
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)
This has also been reproduced on newer JDK versions, e.g., 14.
A DESCRIPTION OF THE PROBLEM :
When a sentence contains text like "blah blah (i.e., blah blah), blah blah", the iterator returned by BreakIterator.getSentenceInstance() incorrectly reports a break after the "i.e" and before the "., blah blah)", even though this is not a sentence boundary.
FWIW, Stack Overflow discussion here: https://stackoverflow.com/q/66933006/263801
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the test case program below.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
bi.preceding(30) returned -1
first sentence: "Due to a problem (e.g., software bug), the server is down."
ACTUAL -
bi.preceding(30) returned 21
first sentence: "Due to a problem (e.g"
---------- BEGIN SOURCE ----------
import java.text.BreakIterator;
import java.util.Locale;

public class BreakIteratorTest {
    public static void main(String[] args) throws Exception {
        String text = "Due to a problem (e.g., software bug), the server is down.";
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);

        // Ask for the last sentence boundary before offset 30 (inside "software").
        int r = bi.preceding(30);
        System.out.println("bi.preceding(30) returned " + r);

        // Treat everything before the reported boundary as the first sentence.
        String sentence = r == BreakIterator.DONE ? text : text.substring(0, r);
        System.out.println("first sentence: \"" + sentence + "\"");
    }
}
---------- END SOURCE ----------
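The test case above probes a single offset. As a complementary diagnostic, the following sketch lists every boundary the sentence iterator reports for the same text and flags any boundary that sits directly in front of a '.' or ','. The class name SentenceBoundaryDump, the helper names, and the "suspicious" heuristic are assumptions made for illustration; they are not part of the original report or of the JDK.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper, not part of the original test case.
public class SentenceBoundaryDump {

    // Collects every boundary the sentence iterator reports, from first() until DONE.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            result.add(b);
        }
        return result;
    }

    // Heuristic (an assumption of this sketch, not JDK behavior): an interior
    // boundary that lands directly in front of '.' or ',' is flagged as suspicious.
    static boolean isSuspicious(String text, int boundary) {
        if (boundary <= 0 || boundary >= text.length()) {
            return false;
        }
        char next = text.charAt(boundary);
        return next == '.' || next == ',';
    }

    public static void main(String[] args) {
        String text = "Due to a problem (e.g., software bug), the server is down.";
        for (int b : boundaries(text, Locale.US)) {
            System.out.println(b + (isSuspicious(text, b) ? "  <- suspicious" : ""));
        }
    }
}
```

On an affected JDK, the boundary at offset 21 described under ACTUAL should be the one flagged. On a JDK that contains the fix, no interior boundary is expected for this text.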
CUSTOMER SUBMITTED WORKAROUND :
None known
FREQUENCY : always
Relates to: JDK-8232447 The javadoc parser ends the first sentence of a comment too soon (Open)