JDK-8300209

Specify that usage of CANON_EQ in j.u.r.Pattern may lead to memory exhaustion

    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • Fix Version: 21
    • Component: core-libs
    • None
    • Compatibility Kind: behavioral
    • Compatibility Risk: minimal
    • Compatibility Risk Description: The risk of memory exhaustion has always been present, although highly unlikely.
    • Interface Kind: Java API
    • Scope: SE

      Summary

      When a java.util.regex.Pattern is compiled with CANON_EQ among the flags, there is a moderate risk of memory exhaustion.
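
      For context, here is a minimal sketch of what CANON_EQ does, using the example from the flag's own documentation: the expression "a\u030A" matches the string "\u00E5" only when the flag is set. The class and method names (CanonEqDemo, matches) are illustrative and not part of this CSR.

```java
import java.util.regex.Pattern;

final class CanonEqDemo {
    // 'a' followed by COMBINING RING ABOVE (U+030A): two code points
    static final String DECOMPOSED = "a\u030A";
    // LATIN SMALL LETTER A WITH RING ABOVE (U+00E5): one code point
    static final String PRECOMPOSED = "\u00E5";

    // Compiles the regex with the given flags and tests the whole input.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without the flag, the two forms are compared code point by code point.
        System.out.println(matches(DECOMPOSED, PRECOMPOSED, 0));
        // With the flag, canonically equivalent forms are treated as equal.
        System.out.println(matches(DECOMPOSED, PRECOMPOSED, Pattern.CANON_EQ));
    }
}
```

      Per the documented example, the first call is expected to return false and the second true.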

      Problem

      When a Pattern is compiled with CANON_EQ among the flags, there is a moderate risk of memory exhaustion, depending on the complexity of the pattern. The specification of this flag warns about a possible performance penalty, but it does not mention memory exhaustion, which may occur during compilation rather than during matching.
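
      One way to see how "complexity" can translate into memory is to count the possible orderings of the combining marks attached to a single character. The sketch below assumes, purely for illustration, that the cost of compilation grows with that count. This CSR says only that the number of combining marks matters, so the growth model and the names (CombiningMarkOrderings, orderings) are assumptions.

```java
final class CombiningMarkOrderings {

    // Number of distinct orderings of n different marks, i.e. n!.
    // Math.multiplyExact throws ArithmeticException if the result overflows a long.
    static long orderings(int marks) {
        if (marks < 0) {
            throw new IllegalArgumentException("marks must be non-negative");
        }
        long result = 1;
        for (int i = 2; i <= marks; i++) {
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the count for a few sizes to show how quickly it grows.
        for (int n : new int[] {1, 3, 5, 10, 13}) {
            System.out.println(n + " marks -> " + orderings(n) + " orderings");
        }
    }
}
```

      Under that assumption, 13 marks already give 13! = 6,227,020,800 orderings, which is more than Integer.MAX_VALUE, so even a single character with a dozen or so marks would be enough to cause trouble.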

      Solution

      Complete the specification by mentioning memory exhaustion as a low to moderate risk factor.

      In addition, the implementation checks in advance whether the pattern is too complex and, if so, throws OutOfMemoryError directly instead of attempting an allocation that is certain to fail.
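
      A caller that compiles patterns from untrusted input might guard the call as sketched below. The helper name (SafeCanonEqCompile.tryCompile) and the choice to return an empty Optional are assumptions made for illustration. Catching OutOfMemoryError is usually discouraged, and it is only defensible here on the assumption that the error comes from the up-front check described above rather than from a heap that is actually exhausted.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class SafeCanonEqCompile {

    // Hypothetical helper: compiles with CANON_EQ and reports failure
    // as an empty Optional instead of propagating the error.
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex, Pattern.CANON_EQ));
        } catch (PatternSyntaxException e) {
            // The expression's syntax is invalid.
            return Optional.empty();
        } catch (OutOfMemoryError e) {
            // The pattern was rejected as too complex, or memory ran out.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("a\u030A").isPresent());
        System.out.println(tryCompile("(").isPresent());
    }
}
```

      An application that cannot tolerate the error at all could instead reject input containing long runs of combining marks before calling compile.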

      Specification

      @@ -916,7 +916,8 @@ public final class Pattern
            * <p> There is no embedded flag character for enabling canonical
            * equivalence.
            *
      -     * <p> Specifying this flag may impose a performance penalty.  </p>
      +     * <p> Specifying this flag may impose a performance penalty
      +     * and a moderate risk of memory exhaustion.</p>
            */
           public static final int CANON_EQ = 0x80;
      
      @@ -1095,6 +1096,9 @@ public final class Pattern
            * Compiles the given regular expression into a pattern with the given
            * flags.
            *
      +     * <p>Setting {@link #CANON_EQ} among the flags may impose a moderate risk
      +     * of memory exhaustion.</p>
      +     *
            * @param  regex
            *         The expression to be compiled
            *
      
      @@ -1112,6 +1116,10 @@ public final class Pattern
            *
            * @throws  PatternSyntaxException
            *          If the expression's syntax is invalid
      +     *
      +     * @implNote If {@link #CANON_EQ} is specified and the number of combining
      +     * marks for any character is too large, an {@link java.lang.OutOfMemoryError}
      +     * is thrown.
            */
           public static Pattern compile(String regex, int flags) {
      
      @@ -1145,6 +1153,13 @@ public final class Pattern
            *         The character sequence to be matched
            *
            * @return  A new matcher for this pattern
      +     *
      +     * @implNote When a {@link Pattern} is deserialized, compilation is deferred
      +     * until a direct or indirect invocation of this method. Thus, if a
      +     * deserialized pattern has {@link #CANON_EQ} among its flags and the number
      +     * of combining marks for any character is too large, an
      +     * {@link java.lang.OutOfMemoryError} is thrown,
      +     * as in {@link #compile(String, int)}.
            */
           public Matcher matcher(CharSequence input) {
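
      The note on matcher(CharSequence) means that code handling deserialized patterns may see the error at the first match attempt, not at deserialization time. A minimal sketch of that round trip follows. The class and method names (DeserializedPatternDemo, roundTrip) are illustrative, and the in-memory byte stream stands in for whatever transport an application actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

final class DeserializedPatternDemo {

    // Serializes the pattern to a byte array and reads it back.
    static Pattern roundTrip(Pattern original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Pattern) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Pattern restored = roundTrip(Pattern.compile("a\u030A", Pattern.CANON_EQ));
        try {
            // Per the note above, this call is where a deferred compilation
            // failure would surface for a deserialized pattern.
            System.out.println(restored.matcher("\u00E5").matches());
        } catch (OutOfMemoryError e) {
            System.out.println("pattern rejected: " + e.getMessage());
        }
    }
}
```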

            rgiulietti Raffaello Giulietti
            rgiulietti Raffaello Giulietti
            Stuart Marks