- Type: Bug
- Resolution: Duplicate
- Priority: P3
- None
- Affects Version/s: 1.4.2
- Component/s: core-libs
- x86
- windows_xp
Name: rmT116609 Date: 06/19/2003
FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)
FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]
EXTRA RELEVANT SYSTEM CONFIGURATION :
Regional Settings: Turkish
A DESCRIPTION OF THE PROBLEM :
It seems that j2sdk1.4.2b has a serious regex matching bug with strings that contain non-ASCII Unicode characters. In my case, the string contained some Turkish characters.
The regex is simple: <[^>]*>, which matches runs of text enclosed in <>
(e.g. <field>).
Although the match succeeds with j2sdk1.4.1_02, 1.4.2b fails to match when the enclosed text contains such characters.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the following code excerpt with JDK1.4.2b:
String text = "text with some <ascii> and non ascii<ðüþÝý> characters>";
Pattern pt = Pattern.compile("<([^>]*)>");
Matcher mc = pt.matcher(text);
while (mc.find()) {
    String s = mc.group();
    System.out.println("s = " + s);
}
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
s = <ascii>
s = <ðüþÝý>
ACTUAL -
s = <ascii>
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugTest {
    public static void main(String[] args) {
        String text = "text with some <ascii> and non ascii<ðüþÝý> characters>";
        // '<', then any run of characters other than '>', then '>'
        Pattern pt = Pattern.compile("<([^>]*)>");
        Matcher mc = pt.matcher(text);
        // Print every match; both bracketed runs are expected
        while (mc.find()) {
            String s = mc.group();
            System.out.println("s = " + s);
        }
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The only known workaround is to switch back to JDK1.4.1_02, where that is possible.
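Where downgrading is not an option, one possible alternative is to avoid the negated character class altogether and scan for the delimiters by hand. The following is only a sketch, assuming the defect is confined to java.util.regex and that String.indexOf is unaffected; this has not been verified against 1.4.2-beta. The class name TagScanner and the method findTags are hypothetical names introduced for illustration. Raw collection types are used on purpose so that the code stays compatible with 1.4-era compilers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the same runs that <[^>]*> would match,
// without using java.util.regex.
class TagScanner {

    // Returns every substring that starts at a '<' and ends at the next '>'.
    // Raw List (no generics) keeps this compilable on a 1.4 compiler.
    public static List findTags(String text) {
        List tags = new ArrayList();
        int pos = 0;
        while (true) {
            int start = text.indexOf('<', pos);
            if (start < 0) {
                break; // no further opening bracket
            }
            int end = text.indexOf('>', start + 1);
            if (end < 0) {
                break; // no closing bracket remains, so nothing else can match
            }
            tags.add(text.substring(start, end + 1));
            pos = end + 1; // continue after the match, as Matcher.find() would
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = "text with some <ascii> and non ascii<ðüþÝý> characters>";
        List tags = findTags(text);
        for (int i = 0; i < tags.size(); i++) {
            System.out.println("s = " + tags.get(i));
        }
    }
}
```

Like the original pattern, this treats a nested '<' as ordinary content, so for input such as "<a<b>" it returns the whole run "<a<b>".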
Release Regression From : 1.4.1_02
The above release value is the last known release in which this
functionality worked. There has been a regression since then.
(Review ID: 187695)
======================================================================
- duplicates
- JDK-4872664 REGRESSION: regex character class negation error
- Closed