-
Bug
-
Resolution: Fixed
-
P4
-
7
-
b55
-
x86
-
linux
-
Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2228230 | 7u171 | Ivan Gerasimov | P4 | Resolved | Fixed | b01 |
JDK-8178421 | 6u181 | Ivan Gerasimov | P4 | Resolved | Fixed | b01 |
FULL PRODUCT VERSION :
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux mazoli 2.6.25.20-0.7-default #1 SMP 2010-02-26 20:32:57 +0100 x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
problem independent of system, network or environment
A DESCRIPTION OF THE PROBLEM :
Pattern.java compiles e.g. the pattern "(a)?bc|d" wrong. See explanation and example code down.
The problem is independent from the operating system and java version (tested with jdk 1.5, 1.6 and 1.7). For example, i use code fragments and line numbers from Pattern.java in src.zip from jdk1.7.0_05 (64bit version)
The bug can be tracked down to the
private Node expr(Node end)
method from Pattern.java. I will explain the bug using the example pattern "(a)?bc|d" from the demo code: The (head-) node of the first sequence "(a)?bc" returned from the
Node node = sequence(end);
Node nodeTail = root; //double return
call (lines 1964-1965) is an instance of Branch: This is, because the
private Node group0()
method converts the leading "(a)?" from a Node of type Ques into a Branch containing "(a)" as first branch and null as second (optional branch) (lines 2871-2885)
if (node instanceof Ques) {
Ques ques = (Ques) node;
if (ques.type == POSSESSIVE) {
root = node;
return node;
}
tail.next = new BranchConn();
tail = tail.next;
if (ques.type == GREEDY) {
head = new Branch(head, null, tail);
} else { // Reluctant quantifier
head = new Branch(null, head, tail);
}
root = tail;
return head;
Back in the expr-method, prev is still null, so
if (prev == null) {
prev = node;
firstTail = nodeTail;
in lines 1966-1968 stores this branch in the prev variable.
Now, at second run through the loop, the sequence "d" will be returned as some kind of node. prev is not null now, so we reach
if (prev instanceof Branch) {
((Branch)prev).add(node);
in lines 1984-1985. At this point, we should create a new branch, because this is the second sequence found. But the instanceof-check assumes, that this is the third or later sequence in the branch and adds the "d" sequence to the wrong Branch ...
Possible solution: Don't use instanceof-check, but store branch information in some other local variable, e.g. see workaround below. (I commented the changes with // BEGIN CHANGE and // END CHANGE)
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the PatternTest.java from source code below on any jdk
ACTUAL -
not found
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
/*
* Created on 06.08.2012
*
* author rene.mazala
*/
package test.java.util.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternTest {
public static void main(String[] args) {
final Pattern pattern = Pattern.compile("(a)?bc|d");
final Matcher matcher = pattern.matcher("d");
if (matcher.find()) {
// expected: find the "d"
System.out.println(matcher.group());
} else {
// not expected: matcher can't find the "d", because pattern is buggy
System.out.println("not found");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
private Node expr(Node end) {
Node prev = null;
Node firstTail = null;
Node branchConn = null;
// BEGIN CHANGE
Branch branch = null;
// END CHANGE
for (;;) {
Node node = sequence(end);
Node nodeTail = root; //double return
if (prev == null) {
prev = node;
firstTail = nodeTail;
} else {
// Branch
if (branchConn == null) {
branchConn = new BranchConn();
branchConn.next = end;
}
if (node == end) {
// if the node returned from sequence() is "end"
// we have an empty expr, set a null atom into
// the branch to indicate to go "next" directly.
node = null;
} else {
// the "tail.next" of each atom goes to branchConn
nodeTail.next = branchConn;
}
// BEGIN CHANGE
if (branch != null) {
branch.add(node);
// END CHANGE
} else {
if (prev == end) {
prev = null;
} else {
// replace the "end" with "branchConn" at its tail.next
// when put the "prev" into the branch as the first atom.
firstTail.next = branchConn;
}
// BEGIN CHANGE
branch = new Branch(prev, node, branchConn);
prev = branch;
// END CHANGE
}
}
if (peek() != '|') {
// BEGIN CHANGE
if (branch != null) {
return branch;
}
// END CHANGE
return prev;
}
next();
}
}
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux mazoli 2.6.25.20-0.7-default #1 SMP 2010-02-26 20:32:57 +0100 x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
problem independent of system, network or environment
A DESCRIPTION OF THE PROBLEM :
Pattern.java compiles e.g. the pattern "(a)?bc|d" wrong. See explanation and example code down.
The problem is independent from the operating system and java version (tested with jdk 1.5, 1.6 and 1.7). For example, i use code fragments and line numbers from Pattern.java in src.zip from jdk1.7.0_05 (64bit version)
The bug can be tracked down to the
private Node expr(Node end)
method from Pattern.java. I will explain the bug using the example pattern "(a)?bc|d" from the demo code: The (head-) node of the first sequence "(a)?bc" returned from the
Node node = sequence(end);
Node nodeTail = root; //double return
call (lines 1964-1965) is an instance of Branch: This is, because the
private Node group0()
method converts the leading "(a)?" from a Node of type Ques into a Branch containing "(a)" as first branch and null as second (optional branch) (lines 2871-2885)
if (node instanceof Ques) {
Ques ques = (Ques) node;
if (ques.type == POSSESSIVE) {
root = node;
return node;
}
tail.next = new BranchConn();
tail = tail.next;
if (ques.type == GREEDY) {
head = new Branch(head, null, tail);
} else { // Reluctant quantifier
head = new Branch(null, head, tail);
}
root = tail;
return head;
Back in the expr-method, prev is still null, so
if (prev == null) {
prev = node;
firstTail = nodeTail;
in lines 1966-1968 stores this branch in the prev variable.
Now, at second run through the loop, the sequence "d" will be returned as some kind of node. prev is not null now, so we reach
if (prev instanceof Branch) {
((Branch)prev).add(node);
in lines 1984-1985. At this point, we should create a new branch, because this is the second sequence found. But the instanceof-check assumes, that this is the third or later sequence in the branch and adds the "d" sequence to the wrong Branch ...
Possible solution: Don't use instanceof-check, but store branch information in some other local variable, e.g. see workaround below. (I commented the changes with // BEGIN CHANGE and // END CHANGE)
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
run the PatternTest.java from source code below on any jdk
ACTUAL -
not found
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
/*
* Created on 06.08.2012
*
* author rene.mazala
*/
package test.java.util.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PatternTest {
public static void main(String[] args) {
final Pattern pattern = Pattern.compile("(a)?bc|d");
final Matcher matcher = pattern.matcher("d");
if (matcher.find()) {
// expected: find the "d"
System.out.println(matcher.group());
} else {
// not expected: matcher can't find the "d", because pattern is buggy
System.out.println("not found");
}
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
private Node expr(Node end) {
Node prev = null;
Node firstTail = null;
Node branchConn = null;
// BEGIN CHANGE
Branch branch = null;
// END CHANGE
for (;;) {
Node node = sequence(end);
Node nodeTail = root; //double return
if (prev == null) {
prev = node;
firstTail = nodeTail;
} else {
// Branch
if (branchConn == null) {
branchConn = new BranchConn();
branchConn.next = end;
}
if (node == end) {
// if the node returned from sequence() is "end"
// we have an empty expr, set a null atom into
// the branch to indicate to go "next" directly.
node = null;
} else {
// the "tail.next" of each atom goes to branchConn
nodeTail.next = branchConn;
}
// BEGIN CHANGE
if (branch != null) {
branch.add(node);
// END CHANGE
} else {
if (prev == end) {
prev = null;
} else {
// replace the "end" with "branchConn" at its tail.next
// when put the "prev" into the branch as the first atom.
firstTail.next = branchConn;
}
// BEGIN CHANGE
branch = new Branch(prev, node, branchConn);
prev = branch;
// END CHANGE
}
}
if (peek() != '|') {
// BEGIN CHANGE
if (branch != null) {
return branch;
}
// END CHANGE
return prev;
}
next();
}
}
- backported by
-
JDK-2228230 Regex Pattern compilation buggy for special sequences
-
- Resolved
-
-
JDK-8178421 Regex Pattern compilation buggy for special sequences
-
- Resolved
-
- duplicates
-
JDK-8169235 Java REGEX match error
-
- Closed
-