Details
- Type: Bug
- Resolution: Fixed
- Priority: P4
- Affects Version: 6
- Resolved In Build: b117
- CPU: x86
- OS: linux
- Verification: Not verified
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8028931 | port-stage-ppc-aix | Xueming Shen | P4 | Resolved | Fixed | master |
Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Linux matthew-desktop 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
I believe that Pattern.split() and String.split() are implemented incorrectly for the case where the input is an empty string, and the pattern can match zero-length subsequences. For example: Pattern.compile(".*").split("") returns an array containing an empty string. The correct behaviour would be for it to return an empty array.
Rationale: the API docs promise that "trailing empty strings will be discarded" -- always in the one-argument version of split(), or when the limit is zero in the two-argument version. This is not happening in the above case.
While the API docs do also say that, "If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form", this is not the case here, because the pattern does match the input (as shown in test case).
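The two documented rules quoted above can be seen on ordinary inputs. The following is a minimal sketch; the class name SplitDocRules and the helper describe() are made up for illustration and are not part of the report. The output for the empty-input case is deliberately not annotated, since it is the behaviour under dispute.

```java
import java.util.Arrays;

class SplitDocRules {
    // Hypothetical helper: formats the result of String.split() for printing.
    static String describe(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        // Rule 1: trailing empty strings are discarded by one-argument split().
        System.out.println(describe("a,b,,", ","));   // [a, b]

        // A limit of zero behaves the same way as the one-argument version.
        System.out.println(Arrays.toString("a,b,,".split(",", 0)));   // [a, b]

        // Rule 2: if the pattern matches nothing, the input comes back as-is.
        System.out.println(describe("abc", ","));     // [abc]

        // The case from this report: empty input, pattern that matches it.
        System.out.println("".split(".*").length);
    }
}
```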
Looking at the source code for Pattern.split(), it would seem that the test for "no match was found" is incorrect for this particular case.
This is not the most earth-shatteringly critical bug, of course ;-)
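To make the suspected flaw concrete, here is a simplified sketch of a split routine that records "a match was found" in an explicit flag. This is an illustration of the reporter's reading of the docs only, not the JDK implementation and not the fix that was applied; the class name SplitSketch and the method splitTrackingMatches are hypothetical, and the limit argument is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical helper, not java.util.regex.Pattern.split().
    static String[] splitTrackingMatches(Pattern pattern, CharSequence input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        boolean matched = false;   // explicit "did anything match?" flag
        int index = 0;             // start of the next piece of input

        while (m.find()) {
            matched = true;
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }

        // Documented no-match rule: return the whole input as one element.
        if (!matched) {
            return new String[] { input.toString() };
        }

        // Add the remaining tail, then drop trailing empty strings.
        parts.add(input.subSequence(index, input.length()).toString());
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A zero-length match on empty input sets the flag, so the
        // trailing-empty-string rule applies and nothing is left.
        System.out.println(splitTrackingMatches(Pattern.compile(".*"), "").length); // 0
    }
}
```

With a flag, a zero-length match at position 0 is no longer confused with "no match at all", which is the distinction the report says is being lost.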
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run test case.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Regex matches: 1
Number of split() results: 0
ACTUAL -
Regex matches: 1
Number of split() results: 1
split() result 0: ""
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitTest {
    public static void main(String[] args) {
        // Count how many times the pattern matches the empty input.
        int count = 0;
        Pattern pattern = Pattern.compile(".*");
        Matcher matcher = pattern.matcher("");
        while (matcher.find())
            count++;
        System.out.println("Regex matches: " + count);

        // Split the same empty input and print every element returned.
        String[] strings = pattern.split("");
        System.out.println("Number of split() results: " + strings.length);
        for (int i = 0; i < strings.length; i++)
            System.out.println("split() result " + i + ": \"" + strings[i] + "\"");
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Can't think of any, barring avoiding doing wacky things like attempting to split empty strings with weird delimiters.
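For callers who do want an empty array for empty input, one possible guard is to test the input before splitting. This is a sketch only; the class name EmptyInputGuard and the method splitOrEmpty are hypothetical and are not something the reporter or the JDK provides.

```java
import java.util.regex.Pattern;

class EmptyInputGuard {
    // Hypothetical helper: short-circuits empty input, otherwise
    // delegates to the standard Pattern.split(CharSequence).
    static String[] splitOrEmpty(Pattern pattern, CharSequence input) {
        if (input.length() == 0) {
            return new String[0];
        }
        return pattern.split(input);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*");
        System.out.println(splitOrEmpty(pattern, "").length); // 0
    }
}
```

The guard changes the result for every pattern, including ones that do not match the empty string, so it only suits code that never needs the single-element "no match" result for empty input.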
Issue Links
- backported by: JDK-8028931 Pattern.compile(".*").split("") returns incorrect result (Resolved)