CSR
Resolution: Unresolved
P4
None
behavioral
low
System or security property, Other
JDK
Summary
In a regular expression pattern string, a control character can be encoded as the sequence \cX, where X is a control character identifier. The list of permitted identifiers needs to be restricted so that a sequence \cX can only encode a control character, not an arbitrary character.
Problem
The current implementation of the method java.util.regex.Pattern.compile, when parsing a \cX sequence, accepts any character X as an identifier and inverts bit 6 of its code (XOR with 0x40). This allows odd expressions such as \c\t to compile, although they are most likely syntax errors.
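The effect of the unrestricted decoding can be illustrated with plain arithmetic. The following is a minimal sketch, not the JDK source: `legacyControlChar` is a hypothetical helper, and it assumes the inverted bit is the one with mask 0x40.

```java
// Sketch of the arithmetic only; legacyControlChar is a hypothetical helper,
// not a method of java.util.regex.Pattern.
class LegacyControlEscape {
    // Flips the bit with mask 0x40 in the identifier's code.
    static int legacyControlChar(char id) {
        return id ^ 0x40;
    }

    public static void main(String[] args) {
        // 'A' (0x41) maps to U+0001, a genuine control character.
        System.out.printf("\\cA  -> U+%04X%n", legacyControlChar('A'));
        // '\t' (0x09) maps to 0x49, the letter 'I', which is not a control character.
        int decoded = legacyControlChar('\t');
        System.out.printf("\\c\\t -> U+%04X ('%c')%n", decoded, (char) decoded);
    }
}
```

Under this reading, \c\t would match the letter I rather than any control character, which is the kind of surprise the CSR aims to rule out.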
Solution
The list of permitted control character identifiers will be limited to 'A' through 'Z', '?', '@', '[', '\', ']', '^', and '_', each of which produces a control character. A compatibility system property will be introduced that allows users to restore the previous, less restrictive behavior.
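A restriction of this kind could be checked as sketched below. This is an illustration under stated assumptions, not the proposed implementation: `isPermittedId` is a hypothetical helper, the identifier set follows the list in the Specification javadoc (which includes '@'), and decoding is assumed to be an XOR with 0x40. The sketch relies on the fact that these identifiers form one contiguous ASCII range, from '?' (0x3F) to '_' (0x5F).

```java
// Hypothetical validation sketch; names are illustrative and not taken from the JDK.
class ControlCharIds {
    // '?', '@', 'A'..'Z', '[', '\', ']', '^', '_' occupy the contiguous range 0x3F..0x5F.
    static boolean isPermittedId(char id) {
        return id >= '?' && id <= '_';
    }

    public static void main(String[] args) {
        for (char id = '?'; id <= '_'; id++) {
            int decoded = id ^ 0x40;
            // Each permitted identifier decodes to U+0000..U+001F or to U+007F (DEL).
            if (!Character.isISOControl(decoded)) {
                throw new AssertionError("not a control character: " + decoded);
            }
        }
        // A tab or a lowercase letter falls outside the permitted range.
        System.out.println(isPermittedId('\t'));
        System.out.println(isPermittedId('a'));
    }
}
```

The loop doubles as a sanity check that every identifier in the range yields a control character when its 0x40 bit is flipped.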
Specification
The javadoc of the class java.util.regex.Pattern will be updated to explicitly specify the list of permitted control character identifiers.
* <tr><th style="vertical-align:top; font-weight:normal" id="ctrl_x">{@code \c}<i>x</i></th>
- * <td headers="matches characters ctrl_x">The control character corresponding to <i>x</i></td></tr>
+ * <td headers="matches characters ctrl_x">The control character corresponding to <i>x</i>
+ * (<i>x</i> is either {@code A} through {@code Z} or one of
+ * {@code ?}, {@code @}, {@code [}, {@code \\}, {@code ]}, {@code ^}, {@code _})</td></tr>
*
The boolean system property
jdk.util.regex.restrictedControlCharIds
will be provided. By default, this property will be set to "true". If it is explicitly set to "false", the previous, less restrictive behavior will be restored.
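The default described above can be sketched as follows. The CSR does not say how or when the JDK reads the property, so the parsing rule here is an assumption, and `isRestricted` is a hypothetical helper: only an explicit "false" disables the restriction, and an unset property counts as "true".

```java
// Hypothetical sketch of interpreting the compatibility property; not JDK code.
class RestrictedIdsProperty {
    static final String PROP = "jdk.util.regex.restrictedControlCharIds";

    // Assumed rule: only an explicit "false" turns the restriction off;
    // an unset (null) or any other value keeps the restrictive default.
    static boolean isRestricted(String value) {
        return !"false".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        // Reads the property of the running JVM; null when it was not set.
        System.out.println(isRestricted(System.getProperty(PROP)));
    }
}
```

A user who wants the old behavior would presumably pass `-Djdk.util.regex.restrictedControlCharIds=false` when launching the JVM. Whether setting the property later at run time has any effect is not stated here.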
csr of JDK-8230365 Pattern for a control-char matches non-control characters (Open)