JDK-8027645: Pattern.split() with positive lookahead

Details

    • b117
    • generic
    • Verified

      Description

        FULL PRODUCT VERSION :
        java version "1.7.0_11"
        Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
        Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

        A DESCRIPTION OF THE PROBLEM :
        For the sake of simplicity I will use String.split(regex) in my examples, even though the actual bug is in Pattern.split().

        When using the regular expression "(?=\\p{Lu})" to split a string starting with an uppercase letter, split() will split the string before it starts:
        "FooBar".split("(?=\\p{Lu})") will result in [,Foo,Bar].

        If however this match on character 0 is the only match, the empty string is not included in the array:
        "Foo".split("(?=\\p{Lu})") will result in [Foo].
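
The two cases above can be checked side by side with a small program. This is only a sketch: the class name SplitLookaheadDemo and the helper hasLeadingEmpty are made up for illustration, and the printed arrays depend on the JDK build it runs on. On the build reported here, the description implies it would print [, Foo, Bar], then [Foo], then "consistent: false".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLookaheadDemo {
    // Zero-width positive lookahead: matches the position before any uppercase letter.
    static final Pattern BEFORE_UPPERCASE = Pattern.compile("(?=\\p{Lu})");

    // Hypothetical helper: true if the split result starts with an empty string.
    static boolean hasLeadingEmpty(String input) {
        String[] parts = BEFORE_UPPERCASE.split(input);
        return parts.length > 0 && parts[0].isEmpty();
    }

    public static void main(String[] args) {
        // Two matches (before 'F' and before 'B').
        System.out.println(Arrays.toString(BEFORE_UPPERCASE.split("FooBar")));
        // A single match, at position 0.
        System.out.println(Arrays.toString(BEFORE_UPPERCASE.split("Foo")));
        // The report's complaint: both inputs should treat the leading match the same way.
        System.out.println("consistent: "
                + (hasLeadingEmpty("FooBar") == hasLeadingEmpty("Foo")));
    }
}
```

The last line prints whether the leading empty string is handled the same way for both inputs, which is the property the report asks for, regardless of which of the two behaviours is chosen.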

        Stepping through the code with these examples shows that the match is correctly detected, but later the code assumes there was no match (because the end-index of the match is 0), and thus returns an array containing the input string.
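
The failure mode described above can be illustrated with a simplified split loop. This is a sketch written for this report, not the JDK source: naiveSplit and LookaheadSplitSketch are hypothetical names, and the loop only assumes what the description states, namely that an end-index of 0 is taken to mean "no match".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSplitSketch {
    // Hypothetical helper, not the JDK source: splits input around matches of pattern.
    static String[] naiveSplit(Pattern pattern, String input) {
        int index = 0;                       // end of the previous match
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }
        // Flawed check: index == 0 is read as "no match was found", but a single
        // zero-width match at position 0 also leaves index at 0.
        if (index == 0) {
            return new String[] { input };
        }
        parts.add(input.substring(index));   // remainder after the last match
        // Drop trailing empty strings, as split() does with its default limit.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=\\p{Lu})");
        System.out.println(Arrays.toString(naiveSplit(p, "FooBar"))); // [, Foo, Bar]
        System.out.println(Arrays.toString(naiveSplit(p, "Foo")));    // [Foo]
    }
}
```

With "FooBar" the second match moves index to 3, so the leading empty string collected at position 0 is kept. With "Foo" the only match leaves index at 0, the early return fires, and the collected empty string is discarded.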

        This is clearly wrong. One could argue whether the empty string should be contained in the array or not, but it should be there either always or never.


        REPRODUCIBILITY :
        This bug can be reproduced always.

              People

                sherman Xueming Shen
                igerasim Ivan Gerasimov
                Votes: 1
                Watchers: 4
