JDK-6990617

Regular expression doesn't match if unicode character next to a digit.


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • Fix Version: 8
    • Affects Version: 6u21
    • Component: core-libs
    • Resolved In Build: b19
    • CPU: x86
    • OS: linux
    • Verification: Verified

      FULL PRODUCT VERSION :
      java version "1.6.0_20"
      Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
      Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      Linux jantunes-ubuntu 2.6.32-25-generic #44-Ubuntu SMP Fri Sep 17 20:26:08 UTC 2010 i686 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      A character can be written in a pattern as an octal escape: \\0 followed by its octal code. For instance, one could write:
                  Pattern p = Pattern.compile("\\011some text\\012");
                  Matcher m = p.matcher("\tsome text\n");
                  System.out.println(m.find()); // yields "true"

      However, if the string to match has a digit immediately after the escaped character, the pattern does not match, whether or not the following text is quoted with \\Q...\\E. Note the "1" next to the tab character (octal 011).
                  Pattern p = Pattern.compile("\\011\\Q1some text\\E\\012");
                  Matcher m = p.matcher("\t1some text\n");
                  System.out.println(m.find()); // yields "false"

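      This happens because Pattern accepts either \\0011 or \\011 for the same character, so a digit that directly follows \\011 can be read as a third octal digit: \\0111 is the character with octal value 111 (the letter "I"), not a tab followed by "1". From the javadoc: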
      \0nn The character with octal value 0nn (0 <= n <= 7)
      \0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
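
      The greedy reading of octal digits described above can be checked in isolation. The following is a minimal sketch, not part of the original report: the class name OctalEscapeDemo and the helper matchesWhole are illustrative assumptions. It deliberately avoids \\Q...\\E, so it shows only the documented octal-escape rules and does not depend on the JDK build in use.

```java
import java.util.regex.Pattern;

public class OctalEscapeDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // \011 followed by a non-digit: two octal digits are read, i.e. a tab.
        System.out.println(matchesWhole("\\011s", "\ts"));   // true

        // \0111: the trailing 1 is taken as a third octal digit (octal 111 = 'I').
        System.out.println(matchesWhole("\\0111", "I"));     // true
        System.out.println(matchesWhole("\\0111", "\t1"));   // false

        // \0011 followed by 1: three digits are already consumed,
        // so the final 1 is matched as a literal character.
        System.out.println(matchesWhole("\\00111", "\t1"));  // true
    }
}
```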





      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
                  Pattern p = Pattern.compile("\\011\\Q1some text\\E\\012");
                  Matcher m = p.matcher("\t1some text\n");
                  System.out.println(m.find()); // yields "false"



      REPRODUCIBILITY :
      This bug can be reproduced always.

      CUSTOMER SUBMITTED WORKAROUND :
      Always use the longest (three-digit) octal form, \\0mnn, for character codes in regular expressions, so that a following digit cannot be absorbed into the escape.
      Such as:
                  Pattern p = Pattern.compile("\\0011\\Q1some text\\E\\0012");
                  Matcher m = p.matcher("\t1some text\n");
                  System.out.println(m.find()); // now yields "true", as expected
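
      One way to apply this workaround consistently is to generate the escapes rather than type them by hand. The sketch below is only an illustration under stated assumptions: the class OctalEscapes and the method octalEscape are hypothetical names, not part of java.util.regex. The method always emits three octal digits after \\0, so a digit that follows the escape stays a literal.

```java
import java.util.regex.Pattern;

public class OctalEscapes {
    // Hypothetical helper: render a character as the full-width \0mnn escape.
    // Only codes up to octal 377 fit the \0mnn form (0 <= m <= 3).
    public static String octalEscape(char c) {
        if (c > 0377) {
            throw new IllegalArgumentException("code point too large for \\0mnn: " + (int) c);
        }
        // %03o pads the octal value to exactly three digits.
        return String.format("\\0%03o", (int) c);
    }

    public static void main(String[] args) {
        String regex = octalEscape('\t') + "1some text" + octalEscape('\n');
        System.out.println(regex); // \00111some text\0012
        System.out.println(Pattern.compile(regex).matcher("\t1some text\n").find()); // true
    }
}
```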

            Assignee: Stephen Flores (Inactive)
            Reporter: Webbug Group