- Bug
- Resolution: Not an Issue
- P3
- None
- 11, 12
- x86_64
- linux_ubuntu
A DESCRIPTION OF THE PROBLEM :
The '^' operator (negation in a character class) does not seem to work.
The source code example below behaves completely differently in Java 8 and Java 11.
REGRESSION : Last worked in version 8u191
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Encoding : UTF-8
The output is ooerdqK$Fop22{78ae€€
ACTUAL -
Encoding : UTF-8
The output is ooerdqKFop22{78ae
---------- BEGIN SOURCE ----------
import java.text.Normalizer;
import java.util.regex.Pattern;

/**
 * @author Andres Bel Alonso
 */
public class BugExample {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // UTF-8 encoding is required; print it to confirm
        System.out.println("Encoding : " + System.getProperty("file.encoding"));
        // Goal: delete every character that is neither ASCII nor a combining diacritical mark,
        // but keep € and $
        String input = "ooÅer®â dqK$FÆo©pø2â2£{78aâï¬e€€";
        String str = Normalizer.normalize(input, Normalizer.Form.NFD);
        Pattern pattern = Pattern.compile("[^\\p{ASCII}&&[^\\p{InCombiningDiacriticalMarks}]&&[^€$]]");
        // Build the cleaned string
        String out = pattern.matcher(str).replaceAll("");
        // Java 8 output : ooerdqK$Fop22{78ae€€
        // Java 11 output : ooerdqKFop22{78ae
        // The Java 11 output is not what I want: it removes the characters I wanted to keep.
        // The Java 8 output is correct.
        System.out.println("The output is " + out);
        // Note: the regex [\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^€$]] works as intended in Java 11
    }
}
---------- END SOURCE ----------
FREQUENCY : always
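The workaround mentioned in the last comment of the reproducer can be sketched as a standalone program. This is a minimal illustration, not part of the original report: the class name NegatedClassSketch, the helper clean, and the short sample input are hypothetical. It assumes the difference comes from the change tracked in JDK-6609854, after which a leading '^' negates the whole bracket expression, intersections included, rather than only the first item.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names are illustrative, not from the report.
class NegatedClassSketch {

    // Pattern from the report: leading '^' followed by intersections.
    // Reported to keep $ and the euro sign on Java 8 and to drop them on Java 11.
    static final Pattern ORIGINAL = Pattern.compile(
            "[^\\p{ASCII}&&[^\\p{InCombiningDiacriticalMarks}]&&[^\u20AC$]]");

    // Workaround from the report: negate each property with \P instead of a
    // leading '^', so the result does not depend on how '^' binds.
    static final Pattern WORKAROUND = Pattern.compile(
            "[\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^\u20AC$]]");

    // Removes every character matched by the given pattern.
    static String clean(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input (assumption): a, $, ae ligature (U+00E6), b, euro sign (U+20AC)
        String input = "a$\u00E6b\u20AC";
        // The workaround removes only the ae ligature and keeps $ and the euro sign.
        System.out.println("workaround: " + clean(WORKAROUND, input));
        // The original pattern's result depends on the Java version in use.
        System.out.println("original:   " + clean(ORIGINAL, input));
    }
}
```

Running both patterns side by side on the same JDK makes the version dependence visible: the WORKAROUND line should be identical everywhere, while the ORIGINAL line changes between Java 8 and later releases.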
- relates to
- JDK-6609854 Regex does not match correctly for negative nested character classes (Resolved)