JDK-8230675

Pattern for a control-char matches non-control characters

Details

    • CSR
    • Resolution: Unresolved
    • P3
    • tbd
    • core-libs
    • None
    • behavioral
    • low
    • This feature of regular expressions is relatively rarely used. If an application does use a malformed \cX sequence, it most likely indicates a programming error. A compatibility system property will be provided.
    • System or security property, Other
    • JDK

    Description

      Summary

      In a regular expression pattern string, a control character can be encoded as the sequence \cX, where X is a control character identifier. The list of permitted identifiers should be restricted so that \cX can encode only a control character, and not an arbitrary character.

      Problem

      When parsing a \cX sequence, the current implementation of java.util.regex.Pattern.compile blindly accepts any character X as an identifier and inverts bit 6 of its code (XOR with 0x40). This allows awkward expressions (for example \c\t) to compile, even though they are most likely syntax errors.
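      The effect of the bit flip can be illustrated with a small standalone sketch. It assumes the flip amounts to XOR with 0x40 (the usual ASCII control-character mapping); the class and method names are hypothetical and are not taken from the JDK sources.

```java
// Illustration only: mirrors the bit flip described above, not the JDK source.
final class LegacyControlCharDemo {

    // Hypothetical helper: maps the identifier X of \cX by flipping bit 0x40,
    // without checking whether X is a sensible identifier.
    static int legacyControlChar(int x) {
        return x ^ 0x40;
    }

    public static void main(String[] args) {
        // 'A', 'Z' and '?' map to control characters; TAB and 'a' do not.
        int[] ids = {'A', 'Z', '?', '\t', 'a'};
        for (int id : ids) {
            int mapped = legacyControlChar(id);
            System.out.printf("U+%04X -> U+%04X (control: %b)%n",
                    id, mapped, Character.isISOControl(mapped));
        }
    }
}
```

      Under this mapping a tab character (U+0009) used as the identifier yields U+0049 ('I'), an ordinary letter rather than a control character.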

      Solution

      The list of permitted control character identifiers will be limited to '?', '@', 'A' through 'Z', '[', '\', ']', '^', and '_', so that \cX can produce only control characters. A compatibility system property will be introduced to let users restore the previous, less restrictive behavior.
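      A minimal sketch of such a check follows. It uses the identifier list from the javadoc text in the Specification section (which includes '@') and relies on those identifiers forming one contiguous ASCII range, 0x3F through 0x5F. The names are hypothetical, and IllegalArgumentException merely stands in for whatever error the real parser would report.

```java
// Illustration only: one possible shape of the restricted check.
final class ControlCharIdCheck {

    // '?', '@', 'A'..'Z', '[', '\', ']', '^', '_' are contiguous in ASCII.
    static boolean isPermittedControlCharId(int x) {
        return x >= '?' && x <= '_';
    }

    // Maps a permitted identifier to its control character, rejecting others.
    static int controlChar(int x) {
        if (!isPermittedControlCharId(x)) {
            throw new IllegalArgumentException(
                    String.format("Illegal control character identifier U+%04X", x));
        }
        return x ^ 0x40; // '?' -> U+007F, '@'..'_' -> U+0000..U+001F
    }

    public static void main(String[] args) {
        // Every permitted identifier must map to a control character.
        for (int x = '?'; x <= '_'; x++) {
            if (!Character.isISOControl(controlChar(x))) {
                throw new AssertionError("not a control character: " + x);
            }
        }
        System.out.println("all permitted identifiers map to control characters");
    }
}
```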

      Specification

      The javadoc of the class java.util.regex.Pattern will be updated to explicitly specify the list of permitted control character identifiers.

        * <tr><th style="vertical-align:top; font-weight:normal" id="ctrl_x">{@code \c}<i>x</i></th>
      - *     <td headers="matches characters ctrl_x">The control character corresponding to <i>x</i></td></tr>
      + *     <td headers="matches characters ctrl_x">The control character corresponding to <i>x</i>
      + *         (<i>x</i> is either {@code A} through {@code Z} or one of
      + *          {@code ?}, {@code @}, {@code [}, {@code \\}, {@code ]}, {@code ^}, {@code _})</td></tr>
        *

      The boolean system property

      jdk.util.regex.restrictedControlCharIds

      will be provided. By default, this property will be set to "true". If explicitly set to "false", it will restore the previous, less restrictive behavior.
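      As a rough sketch, a flag with these semantics (restricted unless explicitly set to "false") could be read as shown below. The property name is the one proposed in this CSR and may not exist in any JDK release. The class name, the method name, and the case-insensitive comparison are assumptions made for illustration.

```java
// Illustration only: reads the proposed flag with a default of "true".
final class RestrictedIdsProperty {

    static final String PROP = "jdk.util.regex.restrictedControlCharIds";

    // Restricted unless the property is explicitly set to "false".
    // Case-insensitive matching is an assumption, not part of the proposal.
    static boolean restricted() {
        return !"false".equalsIgnoreCase(System.getProperty(PROP));
    }

    public static void main(String[] args) {
        System.out.println(PROP + " restricted: " + restricted());
    }
}
```

      Under these assumptions, the previous behavior would be selected by starting the JVM with -Djdk.util.regex.restrictedControlCharIds=false.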

            People

              Unassigned Unassigned
              igerasim Ivan Gerasimov