- Type: Bug
- Resolution: Fixed
- Priority: P4
- Affects Version/s: 8, 11, 17, 18
- Component/s: core-libs

- b03
- generic
- generic
 
| Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build | 
|---|---|---|---|---|---|---|
| JDK-8279734 | 18.0.1 | Naoto Sato | P4 | Resolved | Fixed | b02 | 
| JDK-8278959 | 18 | Naoto Sato | P4 | Resolved | Fixed | b29 | 
A DESCRIPTION OF THE PROBLEM :
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/StringTokenizer.html#%3Cinit%3E(java.lang.String,java.lang.String,boolean) says: "Each delimiter is returned as a string of length one." This is not correct if any of the delimiters is a supplementary character encoded as a valid Unicode surrogate pair: the returned string then has length two, because the delimiter is represented by two code units.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
"Each delimiter is returned as a string of the code unit(s) of the delimiter."
Or remove "Each delimiter is returned as a string of length one." and clarify that "characters" in the StringTokenizer documentation refers to Unicode code points, as in other documentation, e.g., that of String: "The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values)." - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/String.html.
ACTUAL -
"Each delimiter is returned as a string of length one."
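The distinction between code points and code units that the suggested wording relies on can be illustrated with a minimal sketch. The class name `CodeUnitsVsCodePoints` and its helper methods are hypothetical and exist only for this illustration; the calls they wrap (`String.length()`, `String.codePointCount(int, int)`, `Character.charCount(int)`) are standard `java.lang` methods.

```java
public class CodeUnitsVsCodePoints {

    // Number of UTF-16 code units (char values) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points ("characters") in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String grinningFace = "\uD83D\uDE00"; // U+1F600, one supplementary character

        System.out.println(codeUnits(grinningFace));       // prints 2
        System.out.println(codePoints(grinningFace));      // prints 1
        System.out.println(Character.charCount(0x1F600));  // prints 2
    }
}
```

A single delimiter character outside the Basic Multilingual Plane is therefore one code point but a string of length two.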
---------- BEGIN SOURCE ----------
import java.util.StringTokenizer;

public class StringTokenizerPlayground {
    public static void main(String[] args) {
        final var s = "\uD83D\uDE00"; // Grinning Face
        // Use the same supplementary character as input and as the only delimiter,
        // and ask for delimiters to be returned as tokens.
        final var tokenizer = new StringTokenizer(s, s, true);
        final var tokenCount = tokenizer.countTokens();
        if (tokenCount != 1) {
            throw new AssertionError();
        }
        final var token = tokenizer.nextToken();
        // The delimiter token has length two, not one as the documentation states.
        if (token.length() != 2) {
            throw new AssertionError();
        }
        if (!token.equals(s)) {
            throw new AssertionError();
        }
    }
}
---------- END SOURCE ----------
FREQUENCY : always
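
As a further sketch under the same assumptions as the reproducer above, the following shows delimiters from inside and outside the Basic Multilingual Plane mixed in one delimiter set. The class name `MixedDelimiters`, the helper `tokens`, and the sample input are hypothetical and not part of the original report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MixedDelimiters {

    // Collects all tokens, including the delimiters themselves (returnDelims = true).
    static List<String> tokens(String text, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(text, delims, true);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String face = "\uD83D\uDE00"; // U+1F600, a surrogate pair in UTF-16
        // Two delimiters: a comma (one code unit) and the face (two code units).
        for (String token : tokens("a,b" + face + "c", "," + face)) {
            System.out.println(token.codePointCount(0, token.length())
                    + " code point(s), length " + token.length());
        }
    }
}
```

Each delimiter token is one code point, but only the comma is a string of length one; the supplementary delimiter comes back as a string of length two.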
            
- backported by
  - JDK-8278959 StringTokenizer(String, String, boolean) documentation bug (Resolved)
  - JDK-8279734 StringTokenizer(String, String, boolean) documentation bug (Resolved)
- csr for
  - JDK-8278814 StringTokenizer(String, String, boolean) documentation bug (Closed)
- links to
  - Commit: openjdk/jdk18/9cd70906
  - Commit: openjdk/jdk/8f5fdd86
  - Review: openjdk/jdk18/43
  - Review: openjdk/jdk/6836