JDK-6559590

Pattern.compile(".*").split("") returns incorrect result

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • Fix Version: 8
    • Affects Version: 6
    • Component: core-libs
    • Resolved In Build: b117
    • CPU: x86
    • OS: linux
    • Verification: Not verified

      Description

        FULL PRODUCT VERSION :
        java version "1.6.0"
        Java(TM) SE Runtime Environment (build 1.6.0-b105)
        Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)


        ADDITIONAL OS VERSION INFORMATION :
        Linux matthew-desktop 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux


        A DESCRIPTION OF THE PROBLEM :
        I believe that Pattern.split() and String.split() are implemented incorrectly for the case where the input is an empty string, and the pattern can match zero-length subsequences. For example: Pattern.compile(".*").split("") returns an array containing an empty string. The correct behaviour would be for it to return an empty array.

        Rationale: the API docs promise that "trailing empty strings will be discarded", always in the one-argument version of split(), and in the two-argument version when the limit is zero. This is not happening in the above case.
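The discard rule can be seen with the "boo:and:foo" example from the String.split Javadoc. A minimal sketch (the class name TrailingEmptyDemo is illustrative) comparing the one-argument form, which behaves like a limit of zero, with a negative limit, which keeps trailing empty strings:

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    public static void main(String[] args) {
        String s = "boo:and:foo";
        // One-argument split (same as limit 0): trailing empty strings are discarded.
        System.out.println(Arrays.toString(s.split("o")));     // [b, , :and:f]
        // Negative limit: the pattern is applied as many times as possible
        // and trailing empty strings are kept.
        System.out.println(Arrays.toString(s.split("o", -1))); // [b, , :and:f, , ]
    }
}
```

Only the trailing empties differ between the two calls; the empty string between the two 'o' characters survives in both.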

        The API docs also say, "If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form", but that clause does not apply here, because the pattern does match the input (the test case finds one match).
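To see why the no-match clause should not decide this case, compare a delimiter that never matches the empty input with ".*", which does. A minimal sketch (the class name NoMatchDemo is illustrative) using only Pattern.split, Matcher.find, start and end:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoMatchDemo {
    public static void main(String[] args) {
        // "," never matches the empty string, so the no-match clause applies:
        // the result has exactly one element, the input itself.
        String[] noMatch = Pattern.compile(",").split("");
        System.out.println(noMatch.length + " \"" + noMatch[0] + "\""); // 1 ""

        // ".*" does match the empty string: one zero-length match at index 0.
        Matcher m = Pattern.compile(".*").matcher("");
        System.out.println(m.find() + " " + m.start() + "-" + m.end()); // true 0-0
        // No further matches are found after the zero-length one.
        System.out.println(m.find());                                   // false
    }
}
```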

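        Looking at the source code for Pattern.split(), the "no match was found" test appears to be wrong for this particular case: it checks whether the index just past the last match is still 0, which is also true after a single zero-length match at the start of the input.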
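A simplified, hypothetical re-implementation (limit 0 only) shows how an "index is still 0" sentinel conflates "no match" with "one zero-length match at position 0". It also shows how tracking whether find() ever succeeded produces the empty array the reporter expects. SplitSketch and its useIndexSentinel flag are illustrative; this is not the JDK source, and it is not a claim about how the JDK was actually changed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical, simplified split with limit 0; NOT the JDK source.
    // useIndexSentinel = true : "no match" means the end of the last match is still 0.
    // useIndexSentinel = false: "no match" means find() never succeeded.
    static String[] split(Pattern pattern, CharSequence input, boolean useIndexSentinel) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int index = 0;           // end of the previous match
        boolean matched = false; // whether find() ever succeeded
        while (m.find()) {
            matched = true;
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }
        boolean noMatch = useIndexSentinel ? index == 0 : !matched;
        if (noMatch) {
            return new String[] { input.toString() };
        }
        // Remainder after the last match.
        parts.add(input.subSequence(index, input.length()).toString());
        // Limit 0: discard trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern dotStar = Pattern.compile(".*");
        // The zero-length match at 0 leaves index == 0, so the sentinel
        // reports "no match" and returns the input itself.
        System.out.println(split(dotStar, "", true).length);  // 1
        // With an explicit flag the match is seen; every segment is empty,
        // so trailing-empty removal leaves nothing.
        System.out.println(split(dotStar, "", false).length); // 0
    }
}
```

For ordinary inputs both variants agree; for example, splitting "boo:and:foo" on "o" gives the Javadoc result b, "", :and:f either way.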

        This is not the most earth-shatteringly critical bug, of course ;-)

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Run the test case.

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        Regex matches: 1
        Number of split() results: 0
        ACTUAL -
        Regex matches: 1
        Number of split() results: 1
        split() result 0: ""


        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class SplitTest {
            public static void main(String[] args) {
                int count = 0;
                Pattern pattern = Pattern.compile(".*");
                Matcher matcher = pattern.matcher("");
                while (matcher.find())
                    count++;
                System.out.println("Regex matches: " + count);
                String[] strings = pattern.split("");
                System.out.println("Number of split() results: " + strings.length);
                for (int i = 0; i < strings.length; i++)
                    System.out.println("split() result " + i + ": \"" + strings[i] + "\"");
            }
        }

        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Can't think of any, other than avoiding wacky things like splitting empty strings with delimiters that can match a zero-length subsequence.
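One possible caller-side workaround, sketched under the reporter's reading of the spec: for an empty input, return an empty array only when the pattern actually matches it, and otherwise defer to Pattern.split. EmptyAwareSplit.split is a hypothetical helper, not a JDK API:

```java
import java.util.regex.Pattern;

class EmptyAwareSplit {
    // Hypothetical helper, not part of java.util.regex.
    // Empty input that the pattern matches: under the reporter's reading the only
    // segment is a trailing empty string, which limit-0 splitting discards.
    // Everything else (non-empty input, or no match) goes to Pattern.split.
    static String[] split(Pattern pattern, String input) {
        if (input.isEmpty() && pattern.matcher(input).find()) {
            return new String[0];
        }
        return pattern.split(input);
    }

    public static void main(String[] args) {
        System.out.println(split(Pattern.compile(".*"), "").length);    // 0
        System.out.println(split(Pattern.compile(","), "").length);     // 1
        System.out.println(split(Pattern.compile(","), "a,b").length);  // 2
    }
}
```

Checking find() rather than testing only input.isEmpty() keeps the documented one-element result for delimiters that never match the empty string.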

              People

                Assignee: Xueming Shen (sherman)
                Reporter: Roger Yeung (ryeung) (Inactive)
                Votes: 0
                Watchers: 3
