Bug
Resolution: Unresolved
P3
None
18, 24, 25
b13
generic
generic
ADDITIONAL SYSTEM INFORMATION :
> uname -a
Linux 0be9c4498283 6.12.5-linuxkit #1 SMP Tue Jan 21 10:23:32 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
> java -version
openjdk 24 2025-03-18
OpenJDK Runtime Environment (build 24+36-3646)
OpenJDK 64-Bit Server VM (build 24+36-3646, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
Enabling **canonical equivalence** (`Pattern.CANON_EQ`) causes a pattern to fail to match a string that contains a **variation selector**.
`Pattern.compile("^[^/]*\\.[^/]*$", Pattern.CANON_EQ)` does not match a string containing a variation selector (e.g., `U+FE0F`), while `Pattern.compile("^[^/]*\\.[^/]*$")` without the flag does.
A workaround is to remove variation selectors from the string before matching.
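A minimal sketch of that workaround follows. It assumes that stripping the two variation-selector blocks (U+FE00..U+FE0F and U+E0100..U+E01EF) is acceptable for the caller's purpose; `stripVariationSelectors` and `matchesAfterStripping` are hypothetical helper names, not JDK API. Only the copy used for matching is stripped, and the original string is left untouched.

```java
import java.util.regex.Pattern;

public class StripVariationSelectorsWorkaround {
    // Variation selectors: U+FE00..U+FE0F (VS1..VS16) and U+E0100..U+E01EF (VS17..VS256)
    private static final Pattern VARIATION_SELECTORS =
            Pattern.compile("[\\uFE00-\\uFE0F\\x{E0100}-\\x{E01EF}]");

    // The regular expression from this report, compiled with CANON_EQ
    private static final Pattern NAME_WITH_DOT =
            Pattern.compile("^[^/]*\\.[^/]*$", Pattern.CANON_EQ);

    // Hypothetical helper: returns a copy of s with every variation selector removed
    static String stripVariationSelectors(String s) {
        return VARIATION_SELECTORS.matcher(s).replaceAll("");
    }

    // Hypothetical helper: matches the stripped copy; callers keep the original string
    static boolean matchesAfterStripping(String s) {
        return NAME_WITH_DOT.matcher(stripVariationSelectors(s)).matches();
    }

    public static void main(String[] args) {
        var name = "\u2764\uFE0F file.txt";
        System.out.println(matchesAfterStripping(name));
    }
}
```

The trade-off is that two names differing only in a variation selector become indistinguishable to the matcher, which is usually acceptable for glob-style file filtering.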
This bug is the root cause of the following problem on macOS:
when a path matcher is created with the pattern `glob:*.*`,
the glob is converted to the regular expression `^[^/]*\.[^/]*$`
and compiled with the `Pattern.CANON_EQ` flag, so file names that contain a variation selector are not matched.
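Until the regex issue is fixed, the same workaround can be applied at the `PathMatcher` level. The sketch below wraps whatever matcher `FileSystem.getPathMatcher` returns and hands it a copy of the path with variation selectors removed. `tolerant` is a hypothetical helper; it assumes the path's string form can be re-parsed by the same file system, and creating the example `Path` requires a file-name encoding (`sun.jnu.encoding`) that can represent U+2764.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class VariationSelectorTolerantMatcher {
    // Variation selectors: U+FE00..U+FE0F (VS1..VS16) and U+E0100..U+E01EF (VS17..VS256)
    private static final String VARIATION_SELECTORS = "[\\uFE00-\\uFE0F\\x{E0100}-\\x{E01EF}]";

    // Hypothetical wrapper: the delegate only ever sees the path without variation selectors
    static PathMatcher tolerant(PathMatcher delegate) {
        return path -> {
            String stripped = path.toString().replaceAll(VARIATION_SELECTORS, "");
            return delegate.matches(path.getFileSystem().getPath(stripped));
        };
    }

    public static void main(String[] args) {
        PathMatcher glob = FileSystems.getDefault().getPathMatcher("glob:*.*");
        // On affected macOS JDKs the plain matcher is reported to return false here
        Path file = Path.of("\u2764\uFE0F file.txt");
        System.out.println("plain:    " + glob.matches(file));
        System.out.println("tolerant: " + tolerant(glob).matches(file));
    }
}
```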
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Compile a pattern from the expression `"^[^/]*\\.[^/]*$"` with the flag `Pattern.CANON_EQ`.
2. Match it against a string that contains a variation selector.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The pattern matches the string
ACTUAL -
The pattern does not match the string
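The expected behavior follows from the `CANON_EQ` contract: two characters match if, and only if, their full canonical decompositions match. A string that is already in both NFC and NFD has nothing to decompose or compose, so enabling the flag arguably should not change the result. The sketch below checks that property with `java.text.Normalizer`; `isCanonicallyStable` is a hypothetical helper name.

```java
import java.text.Normalizer;

public class NormalizationCheck {
    // True when s is left unchanged by both NFC and NFD normalization
    static boolean isCanonicallyStable(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC)
                && Normalizer.isNormalized(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // U+2764 and U+FE0F have no canonical decomposition and combining class 0
        System.out.println(isCanonicallyStable("\u2764\uFE0F file.txt")); // true
        // "e" + COMBINING ACUTE ACCENT composes to U+00E9 under NFC
        System.out.println(isCanonicallyStable("e\u0301"));               // false
    }
}
```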
---------- BEGIN SOURCE ----------
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // U+2764 (heavy black heart) followed by the variation selector U+FE0F;
        // the escapes keep the string independent of the source file encoding
        var strWithVariationSelector = "\u2764\uFE0F file.txt";
        var expr = "^[^/]*\\.[^/]*$";
        var pattern = Pattern.compile(expr);
        var patternWithCanonEq = Pattern.compile(expr, Pattern.CANON_EQ);
        var patternMatches = pattern.matcher(strWithVariationSelector).matches();
        var patternWithCanonEqMatches = patternWithCanonEq.matcher(strWithVariationSelector).matches();
        System.out.println(patternMatches);            // true
        System.out.println(patternWithCanonEqMatches); // false (as reported on JDK 24)
    }
}
---------- END SOURCE ----------
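When the reproducer is written with a literal emoji, whether the compiled string really contains U+FE0F depends on the source file being read as UTF-8 (the default since JDK 18 under JEP 400). A quick way to confirm the input is to print its code points; `codePoints` below is a hypothetical diagnostic helper.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Formats each code point of s as U+XXXX, separated by single spaces
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // U+2764 U+FE0F U+0020 U+0066 U+0069 U+006C U+0065 U+002E U+0074 U+0078 U+0074
        System.out.println(codePoints("\u2764\uFE0F file.txt"));
    }
}
```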
relates to
JDK-8354659 PathMatcher doesn't match path with specific unicode symbol (Open)
JDK-8187041 JEP 400: UTF-8 by Default (Closed)