-
Bug
-
Resolution: Fixed
-
P4
-
7u51
-
b04
-
x86
-
windows_7
-
Verified
FULL PRODUCT VERSION :
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) Client VM (build 24.51-b03, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601] (Windows 7)
A DESCRIPTION OF THE PROBLEM :
Pattern$BnM.optimize will optimize a Pattern$Slice or Pattern$SliceS (match a fixed string) to use Boyer-Moore algorithm. However, due to a bug in the type check, only Pattern$Slice (BMP version) is optimized; Pattern$SliceS (supplementary version) is never optimized.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Write a pattern that matches a fixed string, which contains supplementary characters. There must be >= 4 code points.
2) Access the internal structure of the Pattern via reflection. Access the Pattern.root field.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The first Node (which is in Pattern.root) should be an instance of Pattern$BnMS class.
ACTUAL -
An instance of Pattern$StartS class is found instead. Which means the search is not optimized.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
/* Author: Hong Dai Thanh/nhahtdh */
import java.lang.reflect.Field;
import java.util.regex.Pattern;
class BugRegexBnMS {
public static void dissect(Pattern pattern) throws Exception {
Field rootField = Pattern.class.getDeclaredField("root");
rootField.setAccessible(true);
Object rootNode = rootField.get(pattern);
Object currNode = rootNode;
Class nodeClass = rootField.getType();
Field nextField = nodeClass.getDeclaredField("next");
nextField.setAccessible(true);
while (currNode != null) {
System.out.println(currNode.getClass().getName());
currNode = nextField.get(currNode);
}
System.out.println();
}
public static void main(String args[]) throws Exception {
// Match a fixed string with only BMP code points.
// The first node is an instance of Pattern$BnM as expected
dissect(Pattern.compile("\\QNikropoeka*Syn34@ma{re}lgin'Skper?\\E"));
// Match a fixed string with supplementary code points
// EXPECTED: The first node should be an instance of Pattern$BnMS class
// ACTUAL: The first node is an instance of Pattern$StartS class, and the next one is an instance of Pattern$SliceS class
dissect(Pattern.compile("abf\uD83C\uDC04\uD83C\uDC05\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04abfgh\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04"));
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The error comes from the check in Pattern$BnM class:
static Node optimize(Node node) {
if (!(node instanceof Slice)) {
return node;
}
Pattern$Slice and Pattern$SliceS are child class of Pattern$SliceNode. This check excludes Pattern$SliceS from being optimized.
The simplest fix is to check whether node is instanceof SliceS also.
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) Client VM (build 24.51-b03, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601] (Windows 7)
A DESCRIPTION OF THE PROBLEM :
Pattern$BnM.optimize will optimize a Pattern$Slice or Pattern$SliceS (match a fixed string) to use Boyer-Moore algorithm. However, due to a bug in the type check, only Pattern$Slice (BMP version) is optimized; Pattern$SliceS (supplementary version) is never optimized.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1) Write a pattern that matches a fixed string, which contains supplementary characters. There must be >= 4 code points.
2) Access the internal structure of the Pattern via reflection. Access the Pattern.root field.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The first Node (which is in Pattern.root) should be an instance of Pattern$BnMS class.
ACTUAL -
An instance of Pattern$StartS class is found instead. Which means the search is not optimized.
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
/* Author: Hong Dai Thanh/nhahtdh */
import java.lang.reflect.Field;
import java.util.regex.Pattern;
class BugRegexBnMS {
public static void dissect(Pattern pattern) throws Exception {
Field rootField = Pattern.class.getDeclaredField("root");
rootField.setAccessible(true);
Object rootNode = rootField.get(pattern);
Object currNode = rootNode;
Class nodeClass = rootField.getType();
Field nextField = nodeClass.getDeclaredField("next");
nextField.setAccessible(true);
while (currNode != null) {
System.out.println(currNode.getClass().getName());
currNode = nextField.get(currNode);
}
System.out.println();
}
public static void main(String args[]) throws Exception {
// Match a fixed string with only BMP code points.
// The first node is an instance of Pattern$BnM as expected
dissect(Pattern.compile("\\QNikropoeka*Syn34@ma{re}lgin'Skper?\\E"));
// Match a fixed string with supplementary code points
// EXPECTED: The first node should be an instance of Pattern$BnMS class
// ACTUAL: The first node is an instance of Pattern$StartS class, and the next one is an instance of Pattern$SliceS class
dissect(Pattern.compile("abf\uD83C\uDC04\uD83C\uDC05\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04abfgh\uD83C\uDC06\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04\uD83C\uDC04"));
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The error comes from the check in Pattern$BnM class:
static Node optimize(Node node) {
if (!(node instanceof Slice)) {
return node;
}
Pattern$Slice and Pattern$SliceS are child class of Pattern$SliceNode. This check excludes Pattern$SliceS from being optimized.
The simplest fix is to check whether node is instanceof SliceS also.