Type: Bug
Resolution: Fixed
Priority: P3
Affected Versions: 1.4.0, 6
Resolved In Build: b119
CPU: generic, x86
OS: generic, windows
Verification: Verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2200153 | 6-pool | Robert Mckenna | P3 | Closed | Won't Fix | |
(1) In CANON_EQ mode, a character-class pattern that contains only composite
characters throws an exception. The example below shows the problem:
import java.util.regex.*;

public class RegTest {
    public static void main(String[] args) {
        // Input contains the composite character U+1F82
        CharSequence inputStr = "ab\u1f82cd";
        // Character class made up only of the composite characters U+1F80 and U+1F82
        String patternStr = "[\u1f80\u1f82]";
        // Reported to throw an exception in CANON_EQ mode
        Pattern pattern = Pattern.compile(patternStr, Pattern.CANON_EQ);
        Matcher matcher = pattern.matcher(inputStr);
        if (matcher.find()) {
            System.out.println("<" + matcher.start() + "," + matcher.end() + ">");
        }
    }
}
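On builds affected by this bug, one possible workaround (a sketch, not the fix that was applied) is to normalize both the pattern and the input to NFC with java.text.Normalizer and then match without CANON_EQ. The class NfcWorkaround and its helper findNfc are hypothetical names. The approach assumes the literal characters in the pattern are already precomposed (as U+1F80 and U+1F82 are), and the returned indices refer to the normalized input, not the original one.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NfcWorkaround {
    // Hypothetical helper: normalizes pattern and input to NFC, then matches
    // WITHOUT CANON_EQ. Returns {start, end} in the normalized input, or null.
    public static int[] findNfc(String patternStr, String input) {
        String nfcPattern = Normalizer.normalize(patternStr, Normalizer.Form.NFC);
        String nfcInput = Normalizer.normalize(input, Normalizer.Form.NFC);
        Matcher m = Pattern.compile(nfcPattern).matcher(nfcInput);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        // U+1F82 spelled in fully decomposed form: alpha, psili, varia, ypogegrammeni
        String decomposed = "ab\u03b1\u0313\u0300\u0345cd";
        int[] r = findNfc("[\u1f80\u1f82]", decomposed);
        System.out.println(r == null ? "No Match" : "<" + r[0] + "," + r[1] + ">");
    }
}
```

NFC composes the decomposed sequence back into U+1F82, so the plain character class matches it at <2,3> in the normalized string.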
(2) Changing the pattern to
    String patternStr = "\u1f80\u1f82";
also throws an exception.
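To reproduce items (1) and (2) in a single run, and to check whether a given JDK build still throws, both patterns can be compiled with CANON_EQ inside a try/catch. CanonEqRepro and describe() are hypothetical names. The printed result depends on the JDK version being tested, so no expected output is given here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CanonEqRepro {
    // Compiles patternStr with CANON_EQ and searches input. Reports the match
    // span, "No Match", or the RuntimeException thrown by compile or find.
    public static String describe(String patternStr, String input) {
        try {
            Matcher m = Pattern.compile(patternStr, Pattern.CANON_EQ).matcher(input);
            return m.find() ? "<" + m.start() + "," + m.end() + ">" : "No Match";
        } catch (RuntimeException e) {
            return "exception: " + e;
        }
    }

    public static void main(String[] args) {
        String input = "ab\u1f82cd";
        System.out.println("(1) " + describe("[\u1f80\u1f82]", input));
        System.out.println("(2) " + describe("\u1f80\u1f82", input));
    }
}
```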
(3) In CANON_EQ mode, the pattern "[\u1f80-\u1f82]" finds no match in the input string
"ab\u1f81cd", although it does match the range endpoints \u1f80 and \u1f82. The
implementation needs to iterate over every character in the "Range" and list the
"EquivalentAlternation" of each one in CANON_EQ mode.
import java.util.regex.*;

public class RegTest {
    public static void main(String[] args) {
        // U+1F81 lies strictly inside the range U+1F80..U+1F82
        CharSequence inputStr = "ab\u1f81cd";
        String patternStr = "[\u1f80-\u1f82]";
        Pattern pattern = Pattern.compile(patternStr, Pattern.CANON_EQ);
        Matcher matcher = pattern.matcher(inputStr);
        if (matcher.find()) {
            System.out.println("<" + matcher.start() + "," + matcher.end() + ">");
        } else {
            // Result reported on affected builds
            System.out.println("No Match");
        }
    }
}
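The expansion suggested in item (3), iterating over each character in the range and emitting an alternation of its equivalent forms, can be sketched as follows. This is a simplified illustration with hypothetical names (RangeExpansionSketch, expandRange): it lists only each code point's NFC and NFD forms, obtained with java.text.Normalizer. Unlike produceEquivalentAlternation(), it does not generate partially composed or reordered spellings such as 0x1f00 0x345.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeExpansionSketch {
    // Hypothetical helper: turns the range lo..hi into "(?:a|b|...)" where the
    // alternatives are the NFC and NFD forms of every code point in the range.
    public static String expandRange(int lo, int hi) {
        Set<String> alternatives = new LinkedHashSet<>(); // drops duplicates, keeps order
        for (int cp = lo; cp <= hi; cp++) {
            String s = new String(Character.toChars(cp));
            alternatives.add(Normalizer.normalize(s, Normalizer.Form.NFC));
            alternatives.add(Normalizer.normalize(s, Normalizer.Form.NFD));
        }
        StringBuilder sb = new StringBuilder("(?:");
        boolean first = true;
        for (String alt : alternatives) {
            if (!first) {
                sb.append('|');
            }
            sb.append(Pattern.quote(alt)); // treat each form as a literal
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(expandRange(0x1f80, 0x1f82));
        for (String input : new String[] {"ab\u1f81cd", "ab\u03b1\u0314\u0345cd"}) {
            Matcher m = p.matcher(input);
            System.out.println(m.find() ? "<" + m.start() + "," + m.end() + ">" : "No Match");
        }
    }
}
```

With the range expanded this way, both the precomposed input "ab\u1f81cd" and its decomposed spelling "ab\u03b1\u0314\u0345cd" are found, at <2,3> and <2,5> respectively.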
(4) Though not critical, produceEquivalentAlternation() seems to generate redundant
alternatives when dealing with multiple combining characters in CANON_EQ mode.
For example, the pattern "\u1f80" produces
(?: 0x3b1 0x313 0x345 | 0x1f00 0x345 | 0x1f80 | 0x3b1 0x345 0x313 | 0x1fb3 0x313 | 0x1f80)
in which 0x1f80 appears twice, and the pattern "\u1f82" produces
(?: 0x3b1 0x313 0x300 0x345 | 0x1f00 0x300 0x345 | 0x1f02 0x345 | 0x1f82 | 0x1f00 0x345 0x300 | 0x1f80 0x300 | 0x1f82 | 0x3b1 0x313 0x345 0x300 | 0x1f00 0x345 0x300 | 0x1f80 0x300 | 0x1f82 | 0x1f00 0x300 0x345 | 0x1f02 0x345 | 0x1f82 | 0x3b1 0x345 0x313 0x300 | 0x1fb3 0x313 0x300 | 0x1f80 0x300 | 0x1f82)
in which only 9 of the 18 alternatives are distinct (0x1f82 alone appears five times).
(Spaces have been added between the hexadecimal code points for readability.)
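The redundancy described in item (4) can be checked mechanically, and removed, by keeping only the first occurrence of each alternative. The sketch below uses a hypothetical class, DedupeAlternatives, that parses the space-separated notation used in this report and deduplicates with a LinkedHashSet, which preserves the original order. It is a post-processing illustration only, not a change to produceEquivalentAlternation() itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupeAlternatives {
    // Parses "(?: a | b | ... )" as written in this report, where each
    // alternative is a space-separated list of hexadecimal code points.
    public static List<String> parse(String alternation) {
        String body = alternation.substring(alternation.indexOf(':') + 1,
                                            alternation.lastIndexOf(')'));
        List<String> result = new ArrayList<>();
        for (String alt : body.split("\\|")) {
            result.add(alt.trim());
        }
        return result;
    }

    // Keeps the first occurrence of each alternative, preserving order.
    public static List<String> dedupe(List<String> alternatives) {
        return new ArrayList<>(new LinkedHashSet<>(alternatives));
    }

    public static void main(String[] args) {
        String for1f80 = "(?: 0x3b1 0x313 0x345 | 0x1f00 0x345 | 0x1f80 | "
                + "0x3b1 0x345 0x313 | 0x1fb3 0x313 | 0x1f80)";
        List<String> alts = parse(for1f80);
        List<String> unique = dedupe(alts);
        System.out.println(alts.size() + " -> " + unique.size());
        System.out.println("(?: " + String.join(" | ", unique) + ")");
    }
}
```

For the "\u1f80" alternation this prints "6 -> 5"; applying the same two steps to the "\u1f82" alternation reduces its 18 alternatives to 9.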
Backported by: JDK-2200153 Pattern doesn't work with composite character in CANON_EQ mode (Closed)