Bug
Resolution: Unresolved
P4
None
7u51
FULL PRODUCT VERSION :
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) Client VM (build 24.51-b03, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Windows 7 (6.1.7601)
A DESCRIPTION OF THE PROBLEM :
The problem concerns embedded comments starting with # when COMMENTS mode is enabled (either with the Pattern.COMMENTS flag or with the inline flag (?x)).
When an embedded comment is terminated by a Unicode line separator (\u0085, \u2028 or \u2029), the comment is correctly removed, but the line separator is incorrectly kept as part of the pattern.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Turn on COMMENTS mode with Pattern.COMMENTS or (?x).
2. Add a #-style comment and end it with a Unicode line separator (\u0085, \u2028 or \u2029).
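The source below only uses (?x) and \u0085. The following sketch reruns both of its test cases with the Pattern.COMMENTS flag from step 1 instead, once for each of the three Unicode line separators and, for comparison, once each for \n and \r. The class CommentsFlagRepro and its methods are hypothetical, not part of the report. No output is listed for the Unicode rows because it depends on whether the JDK in use has this bug; the \n and \r rows should report true for the sequence case and false for the class case.

```java
import java.util.regex.Pattern;

// Hypothetical harness, not part of the original report: it reruns both test
// cases with the Pattern.COMMENTS flag instead of (?x), once per separator.
class CommentsFlagRepro {

    // The three Unicode separators from the report, then the ASCII ones.
    static final String[] SEPARATORS = {"\u0085", "\u2028", "\u2029", "\n", "\r"};

    // Sequence case: the pattern should behave like /xy/.
    // Returns true if "xy" matches, i.e. the separator did not leak.
    static boolean sequenceMatchesXY(String sep) {
        return Pattern.compile("x # Just an x " + sep + " y", Pattern.COMMENTS)
                .matcher("xy")
                .matches();
    }

    // Character-class case: the pattern should behave like /[x]/.
    // Returns true if the separator itself is (wrongly) accepted by the class.
    static boolean classAcceptsSeparator(String sep) {
        return Pattern.compile("[#" + sep + "x]", Pattern.COMMENTS)
                .matcher(sep)
                .matches();
    }

    public static void main(String[] args) {
        for (String sep : SEPARATORS) {
            System.out.printf("U+%04X: sequence matches \"xy\" = %b, class accepts separator = %b%n",
                    (int) sep.charAt(0), sequenceMatchesXY(sep), classAcceptsSeparator(sep));
        }
    }
}
```

On an affected JDK, the report's own results suggest the \u0085 row shows false for the sequence case and true for the class case, mirroring the (?x) test cases.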
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The line separator should be removed along with the comment.
ACTUAL -
The comment is correctly removed, but the line separator remains in the pattern and is matched as a literal character.
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
/* Author: Hong Dai Thanh/nhahtdh */
class BugRegexEmbedComment {
    public static void main(String[] args) {
        // --- TEST CASES ---
        // EXPECTED: false is printed, the regex should be parsed as /[x]/
        // ACTUAL: true is printed, since \u0085 is included in the character class
        System.out.println("\u0085".matches("(?x)[#\u0085x]"));
        // Uncomment to show that \u0085 does act as a line separator ending the comment
        // System.out.println("x".matches("(?x)[#\u0085x]"));
        // EXPECTED: true is printed, the regex should be parsed as /xy/
        // ACTUAL: false is printed, the regex is actually parsed as /x\u0085y/
        System.out.println("xy".matches("(?x) x # Just an x \u0085 y # And then a y"));
        System.out.println();
        // --- COMPARISON CASES ---
        // The results are as expected when an ASCII line separator (\n or \r) is used
        System.out.println("\n".matches("(?x)[#\nx]"));
        // System.out.println("x".matches("(?x)[#\nx]"));
        System.out.println("xy".matches("(?x) x # Just an x \n y # And then a y"));
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Use \n or \r as the line separator that ends an embedded # comment.
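The workaround can also be applied mechanically: rewrite every \u0085, \u2028 and \u2029 in the pattern text to \n before compiling, so each comment ends on a separator that Pattern does skip. The sketch below assumes such a blanket substitution is acceptable for the patterns involved; the class CommentSeparatorWorkaround and its methods are hypothetical and not part of java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original report or of java.util.regex.
// It automates the submitted workaround by rewriting NEL (U+0085),
// LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) into '\n'
// before the pattern is compiled in COMMENTS mode.
class CommentSeparatorWorkaround {

    static String normalizeLineSeparators(String regex) {
        return regex.replace('\u0085', '\n')
                    .replace('\u2028', '\n')
                    .replace('\u2029', '\n');
    }

    // Compiles with COMMENTS enabled plus any caller-supplied flags.
    static Pattern compileComments(String regex, int extraFlags) {
        return Pattern.compile(normalizeLineSeparators(regex), Pattern.COMMENTS | extraFlags);
    }

    public static void main(String[] args) {
        // Same regex as the second test case of the report, without (?x).
        Pattern p = compileComments("x # Just an x \u0085 y # And then a y", 0);
        System.out.println(p.matcher("xy").matches());        // prints true
        System.out.println(p.matcher("x\u0085y").matches());  // prints false
    }
}
```

Because the substitution covers the whole pattern, a \u0085, \u2028 or \u2029 character written literally outside a comment also becomes whitespace that COMMENTS mode ignores. To match one of those characters, write it as a regex escape such as \x{85} ("\\x{85}" in Java source), which the helper leaves untouched.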