REGRESSION: Regular expression matching bug with text with non-ascii characters


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P3
    • Affects Version/s: 1.4.2
    • Component/s: core-libs



      Name: rmT116609 Date: 06/19/2003


      FULL PRODUCT VERSION :
      java version "1.4.2-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
      Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

      FULL OS VERSION :
      Microsoft Windows XP [Version 5.1.2600]

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Regional Settings: Turkish

      A DESCRIPTION OF THE PROBLEM :
      J2SDK 1.4.2 beta appears to have a serious regex matching bug with strings that contain non-ASCII Unicode characters. In my case, the string contained some Turkish characters.
      The regex is simple: <[^>]*>, which matches runs of text enclosed in <>
      (e.g. <field>).
      Matching succeeds with J2SDK 1.4.1_02, but with 1.4.2 beta the run that contains non-ASCII characters is not matched.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the following code excerpt with JDK1.4.2b

      String text = "text with some <ascii> and non ascii<ðüþÝý> characters>";
      Pattern pt = Pattern.compile("<([^>]*)>");
      Matcher mc = pt.matcher(text);
      while (mc.find()) {
          String s = mc.group();
          System.out.println("s = " + s);
      }


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      s = <ascii>
      s = <ðüþÝý>
      ACTUAL -
      s = <ascii>

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

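      public class BugTest {
          public static void main(String[] args) {
              // One ASCII-only run and one run containing non-ASCII characters
              String text = "text with some <ascii> and non ascii<ðüþÝý> characters>";
              // '<', then any number of characters other than '>', then '>'
              Pattern pt = Pattern.compile("<([^>]*)>");
              Matcher mc = pt.matcher(text);
              // Print every match; two lines of output are expected
              while (mc.find()) {
                  String s = mc.group();
                  System.out.println("s = " + s);
              }
          }
      }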

      ---------- END SOURCE ----------
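
      A variant of the source above that may help rule out source-file
      encoding as a factor, since the report was filed from a machine with
      Turkish regional settings. This is only a sketch: the class name
      EscapedBugTest and the helper findTags are hypothetical, the code is
      written for a current JDK (it uses generics), and the five non-ASCII
      characters are assumed to be U+00F0, U+00FC, U+00FE, U+00DD and U+00FD,
      as they are displayed in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedBugTest {
    // Hypothetical helper: collects every match of <([^>]*)> in the text.
    static List<String> findTags(String text) {
        List<String> found = new ArrayList<>();
        Matcher mc = Pattern.compile("<([^>]*)>").matcher(text);
        while (mc.find()) {
            found.add(mc.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same text as the report, with the non-ASCII characters written
        // as Unicode escapes so the compiler's source encoding is irrelevant.
        // Assumption: the code points match the characters shown in the report.
        String text = "text with some <ascii> and non ascii"
                + "<\u00F0\u00FC\u00FE\u00DD\u00FD> characters>";
        List<String> found = findTags(text);
        // Printing only the count avoids depending on the console encoding.
        System.out.println("matches found = " + found.size()
                + " (the report expects 2)");
    }
}
```

      If the count differs between 1.4.1_02 and 1.4.2 beta with this variant,
      the difference would not be attributable to how the source file was
      decoded.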

      CUSTOMER SUBMITTED WORKAROUND :
      Switching back to JDK 1.4.1_02, where that is possible, seems to be the only workaround.
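
      For this particular pattern, a scan that avoids java.util.regex
      altogether might also serve as a stopgap. The following is a minimal
      sketch, not part of the original submission: the class name TagScan and
      the method extractTags are hypothetical, it is written for a current
      JDK, and it has not been tried on 1.4.2 beta. It is only meant to
      return the same runs that <[^>]*> would find.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScan {
    // Hypothetical helper: returns each run that starts at '<' and ends at
    // the next '>', scanning left to right without overlapping matches.
    static List<String> extractTags(String text) {
        List<String> tags = new ArrayList<>();
        int open = text.indexOf('<');
        while (open >= 0) {
            int close = text.indexOf('>', open + 1);
            if (close < 0) {
                break; // no closing '>' remains, so no further run exists
            }
            tags.add(text.substring(open, close + 1));
            // Resume after the run just found, as Matcher.find() would
            open = text.indexOf('<', close + 1);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Assumption: code points match the characters shown in the report.
        String text = "text with some <ascii> and non ascii"
                + "<\u00F0\u00FC\u00FE\u00DD\u00FD> characters>";
        for (String s : extractTags(text)) {
            System.out.println("s = " + s);
        }
    }
}
```

      This covers only the simple delimiter case in this report; patterns
      that rely on other regex features would still need the regex engine.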

      Release Regression From : 1.4.1_02
      The above release value is the last release in which this
      functionality was known to work. There has been a regression since then.

      (Review ID: 187695)
      ======================================================================

            Assignee:
            Michael Mccloskey (Inactive)
            Reporter:
            Ranjith Mandala (Inactive)
