Type: Bug
Resolution: Fixed
Priority: P4
Affects versions: 7, 8, 11, 14, 15
None
b16
The following code demonstrates the bug:
    import java.util.regex.*;

    public class T {
        public static void main(String[] args) throws Throwable {
            // U+10437 (DESERET SMALL LETTER YEE) encoded as a surrogate pair
            String input = "\ud801\udc37";
            Pattern p = Pattern.compile(".+");
            Matcher m = p.matcher(input);
            // Restrict matching to the first char, i.e. the high surrogate only
            m.region(0, 1);
            if (m.find()) {
                System.out.println(": " + m.group(0) + " : " + m.group(0).length());
            }
        }
    }
It is expected to print only the high surrogate (the first half of the pair) and its length, 1.
Instead, it prints

    : 𐐷 : 2

i.e. the match extends past the end of the region that was set.
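A minimal sketch, not part of the original report: the guarantee that region() is meant to provide can be checked directly by comparing the match bounds with regionStart() and regionEnd(). The helper findInRange is a hypothetical workaround, assuming the default opaque and anchoring bounds. It matches against input.subSequence(start, end), so the engine never sees characters past the region end, and then shifts the offsets back to positions in the original input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegionCheck {
    // The invariant region() promises: after a successful find(),
    // the match lies entirely within [regionStart(), regionEnd()).
    static boolean respectsRegion(Matcher m) {
        return m.start() >= m.regionStart() && m.end() <= m.regionEnd();
    }

    // Hypothetical workaround helper: finds the first match of p inside
    // [start, end) by matching a truncated copy of the input, so nothing
    // past 'end' is visible. Returns {start, end} in original-input offsets,
    // or null if there is no match. Behaves like the default region settings
    // (opaque, anchoring bounds); transparent bounds are not emulated.
    static int[] findInRange(Pattern p, CharSequence input, int start, int end) {
        Matcher m = p.matcher(input.subSequence(start, end));
        if (!m.find()) {
            return null;
        }
        return new int[] { start + m.start(), start + m.end() };
    }

    public static void main(String[] args) {
        String input = "\ud801\udc37";
        Pattern p = Pattern.compile(".+");

        // Same setup as the reproducer; per the report, affected JDKs print false here.
        Matcher m = p.matcher(input);
        m.region(0, 1);
        if (m.find()) {
            System.out.println("region respected: " + respectsRegion(m));
        }

        // The truncated-input workaround only ever sees the lone high surrogate.
        int[] span = findInRange(p, input, 0, 1);
        if (span != null) {
            System.out.println("workaround span: [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

Unlike region(), the workaround allocates a new String for the slice, and lookbehind or lookahead cannot see outside it. That is acceptable when the default opaque bounds are in use anyway.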