JDK-4468249

java.util.regex.Pattern: broken error messages


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 1.4.0
    • 1.4.0
    • core-libs
    • beta2
    • generic
    • generic
    • Verified



      Name: bsC130419 Date: 06/11/2001


      java version "1.4.0-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
      Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)


      The included sample code attempts to compile some known-bad
      regular expressions in order to demonstrate some problems with
      the resulting error messages. The first three attempts involve
      the two-digit backreference construct, "\Rnn", and the fourth
      deals with the Unicode family construct.

      First, we try to use the "\Rnn" construct with only one digit, which
      is an error. As you can see from the output, the compiler does
      generate an error message, but it's the wrong message, and it
      indicates the wrong position. The problem is that the compiler
      converts the two characters following the "\R" to a number without
      first verifying that both characters are digits. Converting "1)"
      in this manner yields a value of 3, which is a reasonable number
      for a backreference. The only reason it generates an error at all
      is that it has consumed the close-parenthesis that should have
      terminated the enclosing group. Since there can be up to 99
      capturing groups, almost any ASCII character that follows the first
      digit will pass the reasonableness test, as the second attempt
      shows. That means that, in most cases, if you leave out a digit,
      your regex will silently and mysteriously fail to match. The third
      attempt shows what the error message is supposed to look like.
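
      The arithmetic described above can be reproduced with a small sketch.
      This is not the actual Pattern source; the method names are
      hypothetical, and the 1 to 99 range check is only an assumed form of
      the "reasonableness test". It shows why "1)" converts to 3, why "1z"
      slips through, and why "1$" is rejected.

```java
public class BackrefDigitMath {
    // Hypothetical stand-in for the conversion described in the report:
    // both characters are treated as decimal digits without any check.
    static int uncheckedTwoDigit(char first, char second) {
        return (first - '0') * 10 + (second - '0');
    }

    // Assumed form of the reasonableness test: a group number from 1 to 99.
    static boolean looksReasonable(int groupNumber) {
        return groupNumber >= 1 && groupNumber <= 99;
    }

    // Stricter variant: verify that both characters are ASCII digits first.
    static int checkedTwoDigit(char first, char second) {
        if (first < '0' || first > '9' || second < '0' || second > '9') {
            throw new IllegalArgumentException("illegal backreference syntax");
        }
        return (first - '0') * 10 + (second - '0');
    }

    public static void main(String[] args) {
        char[] followers = {')', 'z', '$'};
        for (char c : followers) {
            int n = uncheckedTwoDigit('1', c);
            System.out.println("\"1" + c + "\" -> " + n
                    + (looksReasonable(n) ? " (accepted)" : " (rejected)"));
        }
        // Prints:
        // "1)" -> 3 (accepted)
        // "1z" -> 84 (accepted)
        // "1$" -> -2 (rejected)
    }
}
```

      Under these assumptions, any second character with a code from 39
      (the apostrophe) upward yields a value between 1 and 99, which is
      consistent with '$' (code 36) being the one that triggers the
      expected error. Checking both characters before converting, as in
      checkedTwoDigit, would reject all three inputs.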

      In the fourth try block, we try to use a bogus family name and get,
      in addition to the usual error message and position indicator, a
      list of all the valid names. Unfortunately, there's a glitch in the
      code that assembles the list, causing all the names to run together
      (as if they weren't cryptic enough already!). This is simple enough
      to fix, but I'm wondering if the additional message content even
      belongs there. That information is already available through the
      docs, and it can come as a nasty surprise if you're not expecting it
      --the first time I encountered this message, it was in the form of
      a Swing message dialog that was about seven feet wide. This is the
      only place in the class where extra information is added to an error
      message.
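
      If the list of names is kept, each name needs its own delimiters.
      The report does not show the code that assembles the list, so the
      following is only a sketch of one way to do it; the class and method
      names are hypothetical, and the "{name}, {name}" layout is inferred
      from the tail of the output ("...upper}, {xdigit}").

```java
import java.util.Arrays;
import java.util.List;

public class FamilyListJoin {
    // Wrap every name in braces and put ", " between entries,
    // so that no two names run together.
    static String joinFamilies(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('{').append(names.get(i)).append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Basic Latin", "Greek", "xdigit");
        System.out.println(joinFamilies(names));
        // Prints: {Basic Latin}, {Greek}, {xdigit}
    }
}
```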


      --------------------- MessageTest.java -------------------------

      import java.util.regex.*;

      public class MessageTest
      {
        public static void main(String[] argv)
        {
          Pattern pattern;
          // Attempt 1: "\Rnn" with only one digit, followed by ')'.
          try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
          } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
          }
          // Attempt 2: one digit followed by a letter; compiles without error.
          try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1z)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
          } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
          }
          // Attempt 3: one digit followed by '$'; produces the expected error.
          try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1$)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
          } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
          }
          System.out.println();
          System.out.println();
          // Attempt 4: a bogus Unicode family name.
          try {
            pattern = Pattern.compile("{gronk}");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
          } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
          }
        }
      }

      ------------------------- output --------------------------------

      unclosed group around index 26
              (Twiddlede)dee & (\R1)dum
                                       ^

      '(Twiddlede)dee & (\R1z)dum' compiled OK


      illegal backreference syntax around index 22
              (Twiddlede)dee & (\R1$)dum
                                   ^


      unknown character family {gronk} around index 7
              {gronk}
                    ^
      Supported character families:
          {Basic LatinLatin-1 SupplementLatin Extended-ALatin Extended-B
      oundIPA ExtensionsSpacing Modifier LettersCombining Diacritical Ma
      rksGreekCyrillicArmenianHebrewArabicSyriacThaanaDevanagariBengaliG
      urmukhiGujaratiOriyaTamilTeluguKannadaMalayalamSinhalaThaiLaoTibet
      anMyanmarGeorgianHangul JamoEthiopicCherokeeUnified Canadian Abori
      ginal SyllabicsOghamRunicKhmerMongolianLatin Extended AdditionalGr
      eek ExtendedGeneral PunctuationSuperscripts and SubscriptsCurrency
       SymbolsCombining Marks for SymbolsLetterlike SymbolsNumber FormsA
      rrowsMathematical OperatorsMiscellaneous TechnicalControl Pictures
      Optical Character RecognitionEnclosed AlphanumericsBox DrawingBloc
      k ElementsGeometric ShapesMiscellaneous SymbolsDingbatsBraille Pat
      ternsCJK Radicals SupplementKangxi RadicalsIdeographic Description
       CharactersCJK Symbols and PunctuationHiraganaKatakanaBopomofoHang
      ul Compatibility JamoKanbunBopomofo ExtendedEnclosed CJK Letters a
      nd MonthsCJK CompatibilityCJK Unified Ideographs Extension ACJK Un
      ified IdeographsYi SyllablesYi RadicalsHangul SyllablesHigh Surrog
      atesHigh Private Use SurrogatesLow SurrogatesPrivate UseCJK Compat
      ibility IdeographsAlphabetic Presentation FormsArabic Presentation
       Forms-ACombining Half MarksCJK Compatibility FormsSmall Form Vari
      antsArabic Presentation Forms-BoundSpecialsHalfwidth and Fullwidth
       FormsCnLuLlLtLmLoMnMeMcNdNlNoZsZlZpCcCfCoCsPdPsPePcPoSmScSkSoLMNZ
      CPSLDL1allasciialnumalphablankcntrldigitgraphlowerprintpunctspaceu
      pper}, {xdigit}
      (Review ID: 126171)
      ======================================================================

            mmcclosksunw Michael Mccloskey (Inactive)
            bstrathesunw Bill Strathearn (Inactive)