- Bug
- Resolution: Fixed
- P4
- 1.4.0
- beta2
- generic
- generic
- Verified
Name: bsC130419 Date: 06/11/2001
java version "1.4.0-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta-b65)
Java HotSpot(TM) Client VM (build 1.4.0-beta-b65, mixed mode)
The included sample code attempts to compile four known-bad
regular expressions to demonstrate problems with the resulting
error messages. The first three attempts involve the two-digit
backreference construct, "\Rnn", and the fourth deals with the
Unicode family construct.

First, we try to use the "\Rnn" construct with only one digit, which
is an error. As the output shows, the compiler does generate an
error message, but it's the wrong message, and it indicates the
wrong position. The problem is that the compiler converts the two
characters following the "\R" to a number without first verifying
that both characters are digits. Converting "1)" in this manner
yields a value of 3 (presumably 1*10 + (')' - '0') = 10 - 7), which
is a plausible number for a backreference. The only reason it
generates an error at all is that it has consumed the
close-parenthesis that should have terminated the enclosing group.
Since there can be up to 99 capturing groups, almost any ASCII
character that follows the first digit will pass the reasonableness
test, as the second attempt shows. That means that, in most cases,
if you leave out a digit, your regex will silently and mysteriously
fail to match. The third attempt shows what the error message is
supposed to look like.
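
The arithmetic behind the first three attempts can be sketched as
follows. This is a guess at the conversion, not the actual Pattern
source: the class and method names are hypothetical, and the formula
(subtract '0' from each character, then combine) is assumed only
because it reproduces the value 3 quoted above. The checked variant
shows one possible way to reject non-digits before converting.

```java
final class BackrefDigits {
    // Assumed form of the unchecked conversion: both characters are
    // treated as decimal digits without verifying that they are.
    static int uncheckedValue(char tens, char ones) {
        return (tens - '0') * 10 + (ones - '0');
    }

    // Checked variant: returns -1 unless both characters are ASCII digits.
    static int checkedValue(char tens, char ones) {
        if (tens < '0' || tens > '9' || ones < '0' || ones > '9') {
            return -1;
        }
        return (tens - '0') * 10 + (ones - '0');
    }

    public static void main(String[] args) {
        System.out.println(uncheckedValue('1', ')')); // 3:  looks like a valid group number
        System.out.println(uncheckedValue('1', 'z')); // 84: also within 1..99
        System.out.println(uncheckedValue('1', '$')); // -2: negative, so it can be rejected
        System.out.println(checkedValue('1', ')'));   // -1: rejected up front
        System.out.println(checkedValue('0', '1'));   // 1:  a well-formed "\R01"
    }
}
```

Under this assumption, only characters below '0' minus 10 in the
second position (such as '$') produce a value that fails a range
check, which would match the behavior of the three attempts.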

In the fourth try block, we try to use a bogus family name and get,
in addition to the usual error message and position indicator, a
list of all the valid names. Unfortunately, there's a glitch in the
code that assembles the list, causing all the names to run together
(as if they weren't cryptic enough already!). This is simple enough
to fix, but I'm wondering whether the additional message content
even belongs there. That information is already available through
the docs, and it can come as a nasty surprise if you're not
expecting it. The first time I encountered this message, it was in
the form of a Swing message dialog that was about seven feet wide.
This is the only place in the class where extra information is
added to an error message.
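
As an illustration of how small the list-assembly fix could be, here
is a minimal sketch that puts each name in its own braces and
separates the entries with ", ", the format suggested by the one
correctly delimited entry at the end of the output ("upper}, {xdigit}").
The class name, method name, and sample names are hypothetical; the
real list is built inside Pattern and may be stored differently.

```java
import java.util.Arrays;
import java.util.List;

final class FamilyListFormat {
    // Wraps each name in braces and separates entries with ", ".
    static String joinNames(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append('{').append(name).append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample names taken from the run-together output in this report.
        List<String> sample = Arrays.asList("Basic Latin", "Latin-1 Supplement", "Greek");
        System.out.println(joinNames(sample));
        // prints: {Basic Latin}, {Latin-1 Supplement}, {Greek}
    }
}
```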
--------------------- MessageTest.java -------------------------
import java.util.regex.*;

public class MessageTest
{
    public static void main(String[] argv)
    {
        Pattern pattern;

        // Attempt 1: "\Rnn" with only one digit before the ')'.
        try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
        } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
        }

        // Attempt 2: one digit followed by a letter.
        try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1z)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
        } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
        }

        // Attempt 3: produces the message the first two should have produced.
        try {
            pattern = Pattern.compile("(Twiddlede)dee & (\\R1$)dum");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
        } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
        }

        System.out.println();
        System.out.println();

        // Attempt 4: bogus Unicode family name.
        try {
            pattern = Pattern.compile("{gronk}");
            System.out.println("'" + pattern.pattern() + "' compiled OK");
        } catch (PatternSyntaxException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
------------------------- output --------------------------------
unclosed group around index 26
(Twiddlede)dee & (\R1)dum
                          ^
'(Twiddlede)dee & (\R1z)dum' compiled OK
illegal backreference syntax around index 22
(Twiddlede)dee & (\R1$)dum
                      ^
unknown character family {gronk} around index 7
{gronk}
       ^
Supported character families:
{Basic LatinLatin-1 SupplementLatin Extended-ALatin Extended-B
oundIPA ExtensionsSpacing Modifier LettersCombining Diacritical Ma
rksGreekCyrillicArmenianHebrewArabicSyriacThaanaDevanagariBengaliG
urmukhiGujaratiOriyaTamilTeluguKannadaMalayalamSinhalaThaiLaoTibet
anMyanmarGeorgianHangul JamoEthiopicCherokeeUnified Canadian Abori
ginal SyllabicsOghamRunicKhmerMongolianLatin Extended AdditionalGr
eek ExtendedGeneral PunctuationSuperscripts and SubscriptsCurrency
SymbolsCombining Marks for SymbolsLetterlike SymbolsNumber FormsA
rrowsMathematical OperatorsMiscellaneous TechnicalControl Pictures
Optical Character RecognitionEnclosed AlphanumericsBox DrawingBloc
k ElementsGeometric ShapesMiscellaneous SymbolsDingbatsBraille Pat
ternsCJK Radicals SupplementKangxi RadicalsIdeographic Description
CharactersCJK Symbols and PunctuationHiraganaKatakanaBopomofoHang
ul Compatibility JamoKanbunBopomofo ExtendedEnclosed CJK Letters a
nd MonthsCJK CompatibilityCJK Unified Ideographs Extension ACJK Un
ified IdeographsYi SyllablesYi RadicalsHangul SyllablesHigh Surrog
atesHigh Private Use SurrogatesLow SurrogatesPrivate UseCJK Compat
ibility IdeographsAlphabetic Presentation FormsArabic Presentation
Forms-ACombining Half MarksCJK Compatibility FormsSmall Form Vari
antsArabic Presentation Forms-BoundSpecialsHalfwidth and Fullwidth
FormsCnLuLlLtLmLoMnMeMcNdNlNoZsZlZpCcCfCoCsPdPsPePcPoSmScSkSoLMNZ
CPSLDL1allasciialnumalphablankcntrldigitgraphlowerprintpunctspaceu
pper}, {xdigit}
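
For callers who display regex errors in a dialog, one possible way to
keep the text short is to build it from the individual accessors of
PatternSyntaxException (getDescription() and getIndex()) rather than
from getMessage(). The sketch below uses a hypothetical helper class;
whether getDescription() would have excluded the family list in this
beta build is not something this report establishes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class ShortRegexError {
    // Returns a one-line summary of the syntax error, or null if the
    // regex compiles. Uses only the description and the index, not the
    // multi-line text returned by getMessage().
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException ex) {
            return ex.getDescription() + " (index " + ex.getIndex() + ")";
        }
    }

    public static void main(String[] args) {
        // An unclosed group is a syntax error in any version of the class.
        System.out.println(describe("(unclosed"));
    }
}
```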
(Review ID: 126171)
======================================================================