Name: nt126004 Date: 08/29/2001
java version "1.4.0-beta2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta2-b77)
Java HotSpot(TM) Client VM (build 1.4.0-beta2-b77, mixed mode)
A double ampersand in a character class is now treated as an
operator, but a single ampersand should be taken literally.
Looking at the source code, I see that it's meant to work that
way, but there's a bug. In the first run of the test program
below, the regex "[&@]+" is tried. This should match the whole
target string, "@@@@&&&&", but it only matches "@@@@", as if
the '&' were being ignored. But the next run, using the regex
"[@&]+", shows what's really happening: the character following
the '&' is being processed in its place, so that the ']'
gets treated as a literal character, and the class never ends.
The third run shows that you can still include a literal '&' in
a character class by escaping it, but that shouldn't be
necessary.
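
For comparison, here is a minimal sketch of the behavior I would expect
once this is fixed. The class name AmpersandClassCheck and the helper
firstMatch are hypothetical and are not part of the test program in
this report; the last pattern is only there to show "&&" acting as the
intersection operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmpersandClassCheck {
    // Hypothetical helper: returns the first match of regex in input,
    // or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String target = "@@@@&&&&";
        // A single '&' should be a literal wherever it sits in the class.
        System.out.println(firstMatch("[&@]+", target));
        System.out.println(firstMatch("[@&]+", target));
        // Escaping the '&' should be allowed but not required.
        System.out.println(firstMatch("[@\\&]+", target));
        // "&&" is the intersection operator: lowercase letters that are not vowels.
        System.out.println(firstMatch("[a-z&&[^aeiou]]+", "aeixyz"));
    }
}
```

On a runtime without the bug, the first three calls should each return
the whole target string, "@@@@&&&&", and the last should return "xyz".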
//====================== sample code ===========================
import java.util.regex.*;
public class PatternTest
{
    public static void main(String[] argv)
    {
        Pattern p1 = Pattern.compile(argv[0]);
        Matcher m1 = p1.matcher("@@@@&&&&");
        System.out.println(m1.find() ? "found: " + m1.group()
                                     : "not found");
    }
}
//======================== output =============================
$ java PatternTest '[&@]+'
found: @@@@
$ java PatternTest '[@&]+'
Exception in thread "main" java.util.regex.PatternSyntaxException:
unclosed character class around index 5
[@&]+
^
        at java.util.regex.Pattern.error(Pattern.java:1455)
        at java.util.regex.Pattern.clazz(Pattern.java:1916)
        at java.util.regex.Pattern.sequence(Pattern.java:1511)
        at java.util.regex.Pattern.expr(Pattern.java:1471)
        at java.util.regex.Pattern.compile(Pattern.java:1260)
        at java.util.regex.Pattern.<init>(Pattern.java:977)
        at java.util.regex.Pattern.compile(Pattern.java:736)
        at PatternTest.main(PatternTest.java:7)
$ java PatternTest '[@\&]+'
found: @@@@&&&&
(Review ID: 130865)
======================================================================