JDK-8027645

Pattern.split() with positive lookahead


    • b117
    • generic
    • Verified

        FULL PRODUCT VERSION :
        java version "1.7.0_11"
        Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
        Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

        A DESCRIPTION OF THE PROBLEM :
        For the sake of simplicity I will use String.split(regex) in my examples, even though the actual bug is in Pattern.split().

        When using the regular expression "(?=\\p{Lu})" to split a string starting with an uppercase letter, split() will split the string before it starts:
        "FooBar".split("(?=\\p{Lu})") will result in ["", "Foo", "Bar"].

        If, however, this match on character 0 is the only match, the empty string is not included in the array:
        "Foo".split("(?=\\p{Lu})") will result in ["Foo"].

        Stepping through the code with these examples shows that the match is correctly detected, but later the code assumes there was no match (because the end index of the match is 0), and thus returns an array containing only the input string.

        This is clearly wrong. One could argue whether the empty string should be contained in the array or not, but it should either always be there, or never.
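
The control flow described above can be modelled with a short sketch. This is a simplified reconstruction for illustration only, not the JDK source: the class name `LegacySplitSketch` and the helper `legacySplit` are hypothetical, and only the no-limit case is covered. It treats "end index of the last match is still 0" as "no match was found", which is the assumption the report identifies as the cause of the inconsistency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LegacySplitSketch {

    // Hypothetical helper: models the splitting logic described in the report.
    static String[] legacySplit(Pattern pattern, CharSequence input) {
        int index = 0; // end index of the most recent match
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);

        while (m.find()) {
            // Text between the previous match and this one.
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }

        // Flawed check: a single zero-width match at position 0 also leaves
        // index at 0, so it is indistinguishable from "no match at all".
        if (index == 0) {
            return new String[] { input.toString() };
        }

        // Remainder after the last match.
        parts.add(input.subSequence(index, input.length()).toString());

        // Trailing empty strings are dropped; leading ones are kept.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=\\p{Lu})");
        System.out.println(Arrays.toString(legacySplit(p, "FooBar"))); // prints [, Foo, Bar]
        System.out.println(Arrays.toString(legacySplit(p, "Foo")));    // prints [Foo]
    }
}
```

In the first call the second match (before "B") moves `index` to 3, so the leading empty string collected for the match at 0 survives. In the second call the only match ends at 0, the early return fires, and the collected empty string is discarded.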


        REPRODUCIBILITY :
        This bug can be reproduced always.
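
A minimal reproduction sketch, using Pattern.split() directly rather than String.split(). The class name `SplitRepro` and the helper `splitBeforeUppercase` are made up for this example. The output depends on the runtime: on 1.7.0_11 the report observes a leading empty string for "FooBar" but not for "Foo", while JDK 8 and later document that a zero-width match at the beginning never produces an empty leading substring, so no expected output is hard-coded here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitRepro {

    // Pattern from the report: zero-width lookahead for an uppercase letter.
    static final Pattern UPPER = Pattern.compile("(?=\\p{Lu})");

    // Hypothetical helper: splits the input before every uppercase letter.
    static String[] splitBeforeUppercase(String input) {
        return UPPER.split(input);
    }

    public static void main(String[] args) {
        // Print the runtime version, since the result differs between releases.
        System.out.println("java.version = " + System.getProperty("java.version"));

        for (String s : new String[] { "FooBar", "Foo" }) {
            String[] parts = splitBeforeUppercase(s);
            System.out.println(s + " -> " + Arrays.toString(parts)
                    + " (length " + parts.length + ")");
        }
    }
}
```

Comparing the reported lengths for the two inputs shows whether the runtime treats the match at index 0 consistently.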

              sherman Xueming Shen
              igerasim Ivan Gerasimov
