-
Bug
-
Resolution: Duplicate
-
P4
-
None
-
8, 11, 17, 18
-
generic
-
generic
ADDITIONAL SYSTEM INFORMATION :
MacOS Mojave 10.14.6
OpenJDK 1.8.0_192-b12
OpenJDK 9+181
OpenJDK 10.0.2+13
OpenJDK 11.0.2+9
OpenJDK 12.0.2+10
OpenJDK 13+33
OpenJDK 14+36-1461
OpenJDK 15.0.2+7-27
OpenJDK 16+36-2231
OpenJDK 17+35-2724
OpenJDK 18-ea+11-557
A DESCRIPTION OF THE PROBLEM :
A new-line character at the end of the input string causes incorrect regex capture group processing under certain scenarios.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the included source code
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The last capture group should always be empty.
The pattern's first capture group performs a lazy, unbounded match of any character, followed by an end-of-string anchor ($); it should always capture the entire string.
The pattern's second capture group performs a greedy, unbounded match of any character between two end-of-string anchors; it should always capture nothing.
ACTUAL -
When the last character is a new-line character, that character is captured in the second capture group, which should be impossible.
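The behavior described above can be narrowed down to the `$` anchor alone. The following is a minimal sketch, not part of the original report; the class name `DollarAnchorDemo` and the helper `positions` are hypothetical names chosen for illustration. It lists every offset at which an anchor matches, and compares `$` with `\z`, which the `java.util.regex.Pattern` documentation describes as matching only at the end of input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DollarAnchorDemo {
    // Returns the start offset of every match of the regex in the input.
    static List<Integer> positions(String regex, String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Default mode (no MULTILINE): '$' matches before the trailing
        // line terminator (offset 1) and at the end of input (offset 2).
        System.out.println(positions("$", "a\n"));   // [1, 2]
        // '\z' matches only at the very end of input.
        System.out.println(positions("\\z", "a\n")); // [2]
    }
}
```

Because `$` can match at offset 1 in `"a\n"`, the lazy first group in the reported pattern is able to stop before the trailing new-line, which leaves that character for the second group.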
---------- BEGIN SOURCE ----------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * RegEx Bug when the last character is a new-line
 */
public class RegexTest {
    public static void main(String... args) {
        System.out.println("Last capture group should always be empty...\n");
        testRun(Pattern.compile("^([\\s\\S]*?)$([\\s\\S]*)$"));
        testRun(Pattern.compile("^(.*?)$(.*)$", Pattern.DOTALL));
        System.out.println("Result: if the last character is a new-line character it is erroneously captured in the second group");
        System.out.println("Java version: " + System.getProperty("java.runtime.version"));
    }

    static void testRun(Pattern p) {
        System.out.println("RegEx Pattern = \"" + p + '"');
        test(p, "\n"); // fail when last character is newline
        test(p, "a");
        test(p, "aa");
        test(p, "\na");
        test(p, "\n\n"); // fail when last character is newline
        test(p, "\n\n\n"); // fail when last character is newline
        test(p, "\n\n\n ");
        System.out.println();
    }

    static void test(Pattern p, String input) {
        Matcher m = p.matcher(input);
        String replacement = m
            .replaceAll("[$1][$2]") // surround capture groups in brackets using regex substitution
            .replace('\n', '↵'); // replace newline with visible character for easier reading
        System.out.print(replacement);
        m.matches(); // match the whole input so that group(2) can be inspected
        System.out.print(0 < m.group(2).length() ? "\t◀ FAILED" : "");
        System.out.println();
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
None found
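One possible workaround, which is not part of the original report, is to replace `$` with `\z`. This is a sketch under the assumption that the reporter's intent is "end of input only"; the class name `EndOfInputAnchorSketch` and the method `secondGroup` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the reported pattern with '$' replaced by '\z'.
class EndOfInputAnchorSketch {
    // '\z' matches only at the end of input, never before a final line terminator.
    static final Pattern STRICT = Pattern.compile("^([\\s\\S]*?)\\z([\\s\\S]*)\\z");

    // Returns the text captured by the second group for a whole-input match.
    static String secondGroup(String input) {
        Matcher m = STRICT.matcher(input);
        if (!m.matches()) {
            throw new IllegalStateException("pattern did not match the input");
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        // Same inputs as the reporter's test case.
        String[] inputs = {"\n", "a", "aa", "\na", "\n\n", "\n\n\n", "\n\n\n "};
        for (String input : inputs) {
            String visible = input.replace('\n', '↵');
            System.out.println("[" + visible + "] second group length = "
                    + secondGroup(input).length());
        }
    }
}
```

With `\z`, the lazy first group has to consume the trailing new-line before the anchor can match, so the second group has nothing left to capture.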
FREQUENCY : always
- duplicates
-
JDK-8218146 $ matches before end of line, even without MULTILINE mode
-
- Closed
-