Details
Type: Bug
Resolution: Fixed
Priority: P4
Affects Version/s: 7u11, 8
Resolved In Build: b117
CPU/OS: generic
Verification: Verified
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8028930 | port-stage-ppc-aix | Xueming Shen | P4 | Resolved | Fixed | master |
Description
FULL PRODUCT VERSION :
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
A DESCRIPTION OF THE PROBLEM :
For the sake of simplicity I will use String.split(regex) in my examples, even though the actual bug is in Pattern.split().
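The reporter can use `String.split` as a stand-in for `Pattern.split` because the `String.split` Javadoc documents `str.split(regex, n)` as yielding the same result as `Pattern.compile(regex).split(str, n)`, and the one-argument forms use a limit of zero. The small check below compares the two calls directly. The class name and the `sameResult` helper are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEquivalence {
    // Hypothetical helper: compares String.split with the Pattern.split
    // call it is documented to be equivalent to.
    static boolean sameResult(String input, String regex) {
        String[] viaString = input.split(regex);
        String[] viaPattern = Pattern.compile(regex).split(input);
        return Arrays.equals(viaString, viaPattern);
    }

    public static void main(String[] args) {
        System.out.println(sameResult("FooBar", "(?=\\p{Lu})")); // true
        System.out.println(sameResult("Foo", "(?=\\p{Lu})"));    // true
    }
}
```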
When the regular expression "(?=\\p{Lu})" is used to split a string that starts with an uppercase letter, split() splits the string before its first character, producing an empty leading string:
"FooBar".split("(?=\\p{Lu})") will result in [,Foo,Bar].
If, however, this match at index 0 is the only match, the empty string is not included in the array:
"Foo".split("(?=\\p{Lu})") will result in [Foo].
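A minimal reproducer for the two calls above. The class name `SplitLookaheadRepro` is a hypothetical choice. The comments record the results reported on JDK 7u11. On JDK 8 and later, the `String.split` Javadoc states that a zero-width match at the beginning never produces an empty leading substring, so on those versions both calls should print arrays without a leading empty string.

```java
import java.util.Arrays;

class SplitLookaheadRepro {
    // Zero-width positive lookahead: matches the empty position
    // before each uppercase letter.
    static final String REGEX = "(?=\\p{Lu})";

    public static void main(String[] args) {
        // Reported on JDK 7u11: a leading empty string, then Foo, Bar.
        // JDK 8 and later: [Foo, Bar]
        System.out.println(Arrays.toString("FooBar".split(REGEX)));
        // Reported on JDK 7u11, and also the JDK 8+ result: [Foo]
        System.out.println(Arrays.toString("Foo".split(REGEX)));
    }
}
```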
Stepping through the code with these examples shows that the match is correctly detected, but the code later assumes there was no match (because the end index of the match is 0) and therefore returns an array containing only the input string.
This is clearly wrong. One could argue whether the empty string should be included in the array, but it should either always be there or never be.
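To make the analysis above concrete, here is a simplified model of the split loop (limit 0 only). It is written for this illustration and is not the JDK source; the names `SplitModel`, `splitFlawed`, `splitConsistent` and `dropTrailingEmpty` are hypothetical. `splitFlawed` reproduces the described behavior by treating a final index of 0 as "no match". `splitConsistent` implements the "never" option, which matches the rule documented for `String.split` in JDK 8: a zero-width match at the beginning never produces an empty leading substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of split(input) with limit 0; NOT the actual JDK source.
class SplitModel {

    // Behavior described in the report: "no match" is inferred from the
    // final index being 0, which a zero-width match at index 0 also produces.
    static String[] splitFlawed(String input, Pattern pattern) {
        List<String> parts = new ArrayList<>();
        int index = 0;
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }
        if (index == 0) {
            // Reached both when nothing matched and when the only match was
            // empty at index 0; the leading "" collected above is discarded.
            return new String[] {input};
        }
        parts.add(input.substring(index));
        return dropTrailingEmpty(parts);
    }

    // Consistent variant: a zero-width match at index 0 is skipped, so it
    // never produces an empty leading substring.
    static String[] splitConsistent(String input, Pattern pattern) {
        List<String> parts = new ArrayList<>();
        int index = 0;
        boolean matched = false;
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            if (m.start() == 0 && m.end() == 0) {
                continue;
            }
            matched = true;
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }
        if (!matched) {
            return new String[] {input};
        }
        parts.add(input.substring(index));
        return dropTrailingEmpty(parts);
    }

    // Limit-0 semantics: trailing empty strings are removed.
    static String[] dropTrailingEmpty(List<String> parts) {
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=\\p{Lu})");
        for (String s : new String[] {"FooBar", "Foo"}) {
            System.out.println(s + " flawed: " + Arrays.toString(splitFlawed(s, p))
                    + ", consistent: " + Arrays.toString(splitConsistent(s, p)));
        }
    }
}
```

Tracking `matched` instead of testing `index == 0` is what separates "no match" from "only an empty match at index 0". In this model a positive-width match at index 0, such as splitting `",a"` on `","`, still yields a leading empty string.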
REPRODUCIBILITY :
This bug can always be reproduced.
Issue Links
- backported by: JDK-8028930 Pattern.split() with positive lookahead (Resolved)
- relates to: JDK-8043324 java.lang.String.split ignores the 1st null element in JDK 8 (Resolved)