JDK-8149447

regexp boundary matcher does not work when inner text contains dollar special char


      FULL PRODUCT VERSION :
      java version "1.8.0_65"
      Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
      Java HotSpot(TM) Client VM (build 25.65-b01, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Windows 10

      A DESCRIPTION OF THE PROBLEM :
      There is a bug in java.util.regex that affects the boundary pattern "\\b" + string + "\\b" when the inner string contains the special character "$".

      Example: given the search string "$Eclipse" and a text that contains it as a word, for example "available under the terms of the $Eclipse Public License", the pattern "\b\Q$Eclipse\E\b" does not match.
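      It may help to look at where "\b" actually succeeds. In java.util.regex, "\b" is a zero-width assertion that matches where a word character (a letter, digit or underscore) meets a non-word character or the start or end of the input. The space before "$Eclipse" and the "$" itself are both non-word characters, so the position just before "$" is not a word boundary. The sketch below lists every boundary position in a short sample string; the class name BoundaryPositions and the sample text are illustrative choices, not part of the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {
    // Returns every index at which the zero-width "\b" assertion succeeds.
    // Matcher.find() advances past empty matches, so the loop terminates.
    static List<Integer> boundaries(String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the $Eclipse Public";
        // Index 4 is "$", index 5 is "E".
        System.out.println(boundaries(text)); // prints [0, 3, 5, 12, 13, 19]
    }
}
```

      Index 4, where "$" starts, is absent from the list, while index 5, between "$" and "E", is present.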

      see also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=487392

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Compile the pattern "\b\Q$Eclipse\E\b".
      Execute find() on the text "available under the terms of the $Eclipse $Public License".
      Compare the result with find() using a pattern without boundaries, such as "\Q$Eclipse\E".


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      the pattern should find the word "$Eclipse" in the text

      m1.find():true
      m2.find():true
      ACTUAL -
      there is no match in the search text

      m1.find():false
      m2.find():true
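      To narrow down the actual result, one can vary only the assertion in front of "$". The sketch below runs three variants against the report's text; the class name BoundaryVariants is hypothetical, and the CASE_INSENSITIVE flag is left out because it does not affect these patterns. The expected results in the comments follow from "\b" requiring a word/non-word transition: the leading "\b" fails because the space and "$" are both non-word characters, while "\B" (not a word boundary) succeeds at that same position.

```java
import java.util.regex.Pattern;

public class BoundaryVariants {
    static final String TEXT = "available under the terms of the $Eclipse $Public License";

    // Returns true if the regex finds a match anywhere in TEXT.
    static boolean finds(String regex) {
        return Pattern.compile(regex).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        String[] regexes = {
            "\\b\\Q$Eclipse\\E\\b", // pattern from the report: \b before "$"
            "\\B\\Q$Eclipse\\E\\b", // \B: "not a word boundary" before "$"
            "\\Q$Eclipse\\E\\b"     // no assertion before "$"
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + finds(regex));
        }
        // Expected output:
        // \b\Q$Eclipse\E\b -> false
        // \B\Q$Eclipse\E\b -> true
        // \Q$Eclipse\E\b -> true
    }
}
```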

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class BoundaryDollarTest {
          public static void main(String[] args) {
              String search1 = "\\b\\Q$Eclipse\\E\\b";
              String search2 = "\\Q$Public\\E";
              String text = "available under the terms of the $Eclipse $Public License";

              int patternFlags = Pattern.CASE_INSENSITIVE;
              Pattern pattern1 = Pattern.compile(search1, patternFlags);
              Pattern pattern2 = Pattern.compile(search2, patternFlags);
              Matcher m1 = pattern1.matcher(text);
              Matcher m2 = pattern2.matcher(text);
              // the difference in output shows the bug
              System.out.println(String.format("m1.find():%s%nm2.find():%s", m1.find(), m2.find()));
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I have not spent any time on a workaround.
      My guess is that the dollar sign, and possibly other special characters, are not handled properly when enclosed between word boundary matchers (\b).
      A fix might involve reviewing the specific regex method responsible for this check.
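      One possible workaround, not taken from the report and offered only as a sketch: replace each "\b" with lookarounds that reject an adjacent word character, "(?<!\w)" before the term and "(?!\w)" after it. These check the neighboring characters instead of requiring a word/non-word transition, so they also accept a term that starts or ends with a non-word character such as "$". The helper name wholeWord is hypothetical; Pattern.quote is used instead of writing "\Q...\E" by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordSearch {
    // Hypothetical helper: matches the literal term only when it is not
    // immediately preceded or followed by a word character ([a-zA-Z_0-9]).
    static Pattern wholeWord(String term, int flags) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(term) + "(?!\\w)", flags);
    }

    public static void main(String[] args) {
        String text = "available under the terms of the $Eclipse $Public License";
        Matcher m = wholeWord("$Eclipse", Pattern.CASE_INSENSITIVE).matcher(text);
        System.out.println(m.find() ? "found at " + m.start() : "not found"); // prints "found at 33"

        // A following word character still rejects the match, as \b would.
        System.out.println(wholeWord("$Eclipse", 0).matcher("the $EclipseX license").find()); // prints false
    }
}
```

      Because the lookbehind only excludes word characters, "$Eclipse" inside "$$Eclipse" would still be found; "(?<!\S)" and "(?!\S)" are a stricter, whitespace-delimited alternative if that matters.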

            Assignee: Xueming Shen
            Reporter: Webbug Group