JDK-6520207

Dollar/UnixDollar bad behavior shouldn't match twice in "\n"


    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4
    • None
    • 7
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.7.0-ea"
      Java(TM) SE Runtime Environment (build 1.7.0-ea-b06)
      Java HotSpot(TM) Client VM (build 1.7.0-ea-b06, mixed mode, sharing)


      ADDITIONAL OS VERSION INFORMATION :
      Linux helium 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.

      http://elliotth.blogspot.com/2007/01/what-do-anchors-and-mean-in-regular.html

      The first match is at the final line terminator. The second match is at the end of input.

      In MULTILINE mode this is unfortunate (because it's not Perl-compatible and should be listed in the incompatibilities with Perl 5 in the documentation), but it's understandable because of the "or" in the definition of what MULTILINE causes $ to match.

      But in non-MULTILINE mode, this is incorrect (in that I don't see how it's specified by the documentation).
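
      To make the two cases above easier to compare, here is a minimal sketch that records where "$" matches in "a\nb\nc\n", first with no flags and then with Pattern.MULTILINE. The class name DollarPositions and the helper positions() are made up for illustration; only the standard java.util.regex API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarPositions {
    // Hypothetical helper: collect the start index of every match of regex in input.
    static List<Integer> positions(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\n"; // length 6, final '\n' at index 5
        System.out.println("default:   " + positions("$", 0, input));
        System.out.println("MULTILINE: " + positions("$", Pattern.MULTILINE, input));
    }
}
```

      On a build that shows the behavior described in this report, the default case should list two positions (index 5, just before the final line terminator, and index 6, the end of input), where one match was expected.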

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      run the supplied test case.


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
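      import java.util.regex.*;

      public class test {
       public static void main(String[] args) {
        // Non-MULTILINE "$": expected to match once for this input.
        Pattern p = Pattern.compile("$");
        Matcher m = p.matcher("a\nb\nc\nhello\nworld\n");
        int count = 0;
        // Count every successful find(), including empty matches.
        while (m.find()) {
         ++count;
        }
        // Prints 2 on the affected build; the report expects 1.
        System.err.println(count);
       }
      }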
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      I would have suggested using \Z, but that's broken too ;-)
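
      For anyone who only needs the very end of the input, a possible stopgap (an assumption on my part, not something proposed in the original report) is \z, which is documented as matching only at the end of input. The sketch below counts matches for "$", \Z and \z on the same string; the class name EndAnchors and the helper countMatches() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndAnchors {
    // Hypothetical helper: count how many times regex matches in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a\nb\nc\n";
        // "$" and "\Z" both allow a match before a final line terminator.
        System.out.println("$  : " + countMatches("$", input));
        System.out.println("\\Z : " + countMatches("\\Z", input));
        // "\z" matches only at the very end of the input.
        System.out.println("\\z : " + countMatches("\\z", input));
    }
}
```

      Note that \z is not a drop-in replacement: it never matches before a trailing line terminator, so it only helps when that part of the "$" semantics is not needed.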
      Copied from http://bugs.openjdk.java.net/show_bug.cgi?id=100084#c0
      Description From ###@###.### 2009-07-09 01:40:09 PDT

      Created an attachment (id=99)
      contains the exported diff and a jtreg testcase

      sunbug=6520207

      Pattern.compile("$").matcher("a\nb\nc\n") matches twice instead of once.
      --------------------------------------------

      Adding a simple check in the Pattern$Dollar class to avoid matching without any
      content.

            sherman Xueming Shen
            ndcosta Nelson Dcosta (Inactive)