Type: Bug
Resolution: Unresolved
Priority: P4
Affected versions: 8, 9, 10
CPU: x86_64
OS: os_x
FULL PRODUCT VERSION :
ADDITIONAL OS VERSION INFORMATION :
macOS version 10.12.6
A DESCRIPTION OF THE PROBLEM :
This bug surfaces when an empty string is used as the delimiter to split a string containing characters that are encoded as 4 bytes in UTF-8.
For example, take the string: String str = "$¢€𐍈"
$ -> 00100100
¢ -> 11000010 10100010
€ -> 11100010 10000010 10101100
𐍈 -> 11110000 10010000 10001101 10001000
When the following is executed:
str.split("")
it should generate
[$, ¢, €, 𐍈]
but it generates the following array instead:
[$, ¢, €, ?, ?]
? -> 00111111
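The byte patterns listed above can be checked with a small sketch. It also prints the string's length in `char` units next to its code point count. A likely explanation for the five-element result, which the report itself does not state, is that a Java `String` stores the 4-byte character as two UTF-16 `char` values (a surrogate pair). The class name `Utf8Bits` and the helper `utf8Bits` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bits {
    // Hypothetical helper: renders each UTF-8 byte of s as 8 binary digits,
    // separated by single spaces.
    static String utf8Bits(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros to a full 8 digits.
            for (int i = bits.length(); i < 8; i++) {
                sb.append('0');
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "$¢€𐍈" written with escapes so the source file encoding does not matter.
        // U+10348 (𐍈) is the surrogate pair \uD800 \uDF48 in UTF-16.
        String str = "$\u00A2\u20AC\uD800\uDF48";

        System.out.println(utf8Bits("$"));            // 00100100
        System.out.println(utf8Bits("\u00A2"));       // 11000010 10100010
        System.out.println(utf8Bits("\u20AC"));       // 11100010 10000010 10101100
        System.out.println(utf8Bits("\uD800\uDF48")); // 11110000 10010000 10001101 10001000

        System.out.println(str.length());                        // 5 char units
        System.out.println(str.codePointCount(0, str.length())); // 4 code points
    }
}
```

Under that assumption, each `?` in the actual output would be one unpaired surrogate that the output encoding cannot represent and replaces with `?` (00111111).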
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
String to split: String str = "$¢€𐍈"
$ -> 00100100
¢ -> 11000010 10100010
€ -> 11100010 10000010 10101100
𐍈 -> 11110000 10010000 10001101 10001000
Execute the following:
str.split("")
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
[$, ¢, €, 𐍈]
ACTUAL -
[$, ¢, €, ?, ?]
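For comparison, the following sketch runs the reported call next to a split that works on Unicode code points and yields the expected four elements. `splitByCodePoint` is a hypothetical helper written for this illustration, not a JDK method and not a proposed fix. It relies only on `String.codePoints()` and `Character.toChars(int)`, both available since Java 8.

```java
import java.util.Arrays;

public class SplitRepro {
    // Hypothetical helper: returns one array element per Unicode code point,
    // so a surrogate pair stays together in a single element.
    static String[] splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // "$¢€𐍈" written with escapes so the source file encoding does not matter.
        String str = "$\u00A2\u20AC\uD800\uDF48";

        // The call from the report.
        String[] actual = str.split("");
        // Code point based split, matching the EXPECTED array.
        String[] expected = splitByCodePoint(str);

        System.out.println(actual.length + " " + Arrays.toString(actual));
        System.out.println(expected.length + " " + Arrays.toString(expected));
    }
}
```

Printing the array lengths alongside the contents makes the difference visible even on a console that cannot display the last character.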
REPRODUCIBILITY :
This bug can always be reproduced.