JDK-8007395

StringIndexOutOfBoundsException in Matcher.find() when input String contains surrogate UTF-16 characters


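    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • Fix Version/s: 8
    • Affects Version/s: 5.0, 6u34, 7, 8
    • Component/s: core-libs
    • Introduced In Version: 5.0
    • Resolved In Build: b89
    • CPU: generic
    • OS: generic
    • Verification: Verified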

        SYNOPSIS
        --------
        StringIndexOutOfBoundsException in Matcher.find() when input String contains surrogate UTF-16 characters
               
        OPERATING SYSTEMS
        -----------------
        All
               
        FULL JDK VERSIONS
        -----------------
        All (Since JDK 1.5.0)

        PROBLEM DESCRIPTION
        -------------------
        Matcher.find() throws a StringIndexOutOfBoundsException for an input String that contains a surrogate pair when all of the following conditions hold:

        1. The regex pattern results in a call to the GroupCurly.match0() method
        2. The surrogate pair occurs at an index greater than 4 + the minimum input length expected by the pattern
        3. The pattern does not match the input string
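
For readers unfamiliar with the term, the sketch below illustrates what a surrogate pair looks like in a Java String: one supplementary code point occupies two char values. It is not part of the original report. The class name SurrogateDemo and the helper firstSurrogateIndex are hypothetical, and only standard java.lang methods are used.

```java
public class SurrogateDemo {
    // Returns the index of the first high surrogate in s, or -1 if there is none.
    static int firstSurrogateIndex(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isHighSurrogate(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same input string as the attached test case: 13 BMP chars plus one surrogate pair.
        String input = "test this as \ud83d\ude0d";
        System.out.println("length in UTF-16 units: " + input.length());
        System.out.println("length in code points:  " + input.codePointCount(0, input.length()));
        System.out.println("first surrogate index:  " + firstSurrogateIndex(input));
    }
}
```

For the test-case input the pair starts at index 13, so the String has 15 char values but only 14 code points. Code that steps through the input one char at a time can therefore land in the middle of a pair.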
               
        REPRODUCTION INSTRUCTIONS
        -------------------------
        Simply compile and run the attached test case.

        Observed behaviour (this specific trace is from 7u9):
        java.lang.StringIndexOutOfBoundsException: String index out of range: -1
                at java.lang.String.charAt(String.java:658)
                at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
                at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
                at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4360)
                at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4354)
                at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4304)
                at java.util.regex.Pattern$SliceI.match(Pattern.java:3895)
                at java.util.regex.Pattern$Start.match(Pattern.java:3408)
                at java.util.regex.Matcher.search(Matcher.java:1199)
                at java.util.regex.Matcher.find(Matcher.java:592)
                at RegexTestCase.main(RegexTestCase.java:11)
               
        Expected Behavior:
        No exception should be thrown. The pattern does not match, so Matcher.find() should return false.

        TEST CASE
        ---------
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class RegexTestCase {
            public static void main(String[] args) {
                String ptrnStr = "test(.)+(@[a-zA-Z.]+)";
                Pattern ptrn = Pattern.compile(ptrnStr, Pattern.CASE_INSENSITIVE);
                String inputStr = "test this as \ud83d\ude0d";
                Matcher matcher = ptrn.matcher(inputStr);
                try {
                    if (matcher.find()) {
                        System.out.println("Found String");
                    } else {
                        System.out.println("Not found");
                    }
                } catch (StringIndexOutOfBoundsException siob) {
                    System.out.println("Testcase Failed");
                    siob.printStackTrace();
                }
            }
        }

        WORK AROUND
        -----------
        Catch the exception and treat it as a "false" return value.
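
A minimal sketch of this workaround, assuming the caller is happy to treat the exception as "no match". The class SafeFind and the helper findOrFalse are hypothetical names introduced for illustration and are not part of the JDK or the attached fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafeFind {
    // Hypothetical helper: behaves like Matcher.find(), but reports "no match"
    // when the regex engine throws StringIndexOutOfBoundsException.
    static boolean findOrFalse(Matcher matcher) {
        try {
            return matcher.find();
        } catch (StringIndexOutOfBoundsException e) {
            // Per the report, the exception only occurs when the pattern does not match.
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern ptrn = Pattern.compile("test(.)+(@[a-zA-Z.]+)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = ptrn.matcher("test this as \ud83d\ude0d");
        System.out.println(findOrFalse(matcher) ? "Found String" : "Not found");
    }
}
```

The state of a Matcher after an exception is not documented, so it is probably safest to discard the Matcher rather than reuse it once the catch branch has been taken.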

        SUGGESTED FIX
        -------------
        See attachment.

              Assignee: sherman Xueming Shen
              Reporter: dkorbel David Korbel (Inactive)