JDK-8278587

StringTokenizer(String, String, boolean) documentation bug

    • b03
    • generic
    • generic

        A DESCRIPTION OF THE PROBLEM :
        https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/StringTokenizer.html#%3Cinit%3E(java.lang.String,java.lang.String,boolean) states: "Each delimiter is returned as a string of length one." This is incorrect when any of the delimiters is a supplementary character encoded as a valid Unicode surrogate pair: the returned string then has length two, because the delimiter is represented by two code units.

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        "Each delimiter is returned as a string of the code unit(s) of the delimiter."

        Or remove "Each delimiter is returned as a string of length one." and clarify that "characters" in the StringTokenizer documentation refers to Unicode code points, as in other documentation, e.g., that of String: "The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values)." - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/String.html.
        ACTUAL -
        "Each delimiter is returned as a string of length one."
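The distinction between code units and code points that the expected wording relies on can be illustrated with a minimal sketch. The class and helper names (`CodeUnitVsCodePoint`, `codeUnits`, `codePoints`) are hypothetical and exist only for this illustration; `String.length()` and `String.codePointCount(int, int)` are the standard methods.

```java
public class CodeUnitVsCodePoint {

  // Number of UTF-16 code units (char values) in s. Hypothetical helper.
  static int codeUnits(String s) {
    return s.length();
  }

  // Number of Unicode code points in s. Hypothetical helper.
  static int codePoints(String s) {
    return s.codePointCount(0, s.length());
  }

  public static void main(String[] args) {
    final String comma = ",";                 // BMP character
    final String face = "\uD83D\uDE00";       // Grinning Face, a surrogate pair

    System.out.println(codeUnits(comma) + " code unit(s), " + codePoints(comma) + " code point(s)");
    System.out.println(codeUnits(face) + " code unit(s), " + codePoints(face) + " code point(s)");
  }
}
```

For the surrogate pair the two counts differ, which is why "a string of length one" only holds for delimiters in the Basic Multilingual Plane.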

        ---------- BEGIN SOURCE ----------
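        import java.util.StringTokenizer;

        public class StringTokenizerPlayground {

          public static void main(String[] args) {
            final var s = "\uD83D\uDE00"; // Grinning Face
            // The string is used both as the input and as the delimiter set;
            // returnDelims = true makes the tokenizer return delimiters as tokens.
            final var tokenizer = new StringTokenizer(s, s, true);

            final var tokenCount = tokenizer.countTokens();

            // The surrogate pair is treated as a single delimiter, hence one token.
            if (tokenCount != 1) {
              throw new AssertionError();
            }

            final var token = tokenizer.nextToken();

            // The returned delimiter has length two, not one as documented.
            if (token.length() != 2) {
              throw new AssertionError();
            }

            // The token is the whole surrogate pair.
            if (!token.equals(s)) {
              throw new AssertionError();
            }
          }
        }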
        ---------- END SOURCE ----------
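As a complement to the reproducer above, the following sketch tokenizes a mixed input so that a BMP delimiter and a supplementary delimiter can be compared side by side. The class name `DelimiterLengths`, the helper `tokens`, and the sample input are assumptions made for illustration and are not part of the original report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterLengths {

  // Collects all tokens, including delimiters (returnDelims = true). Hypothetical helper.
  static List<String> tokens(String text, String delims) {
    final StringTokenizer tokenizer = new StringTokenizer(text, delims, true);
    final List<String> out = new ArrayList<>();
    while (tokenizer.hasMoreTokens()) {
      out.add(tokenizer.nextToken());
    }
    return out;
  }

  public static void main(String[] args) {
    final String face = "\uD83D\uDE00"; // Grinning Face
    // Delimiters: a comma (one code unit) and the face (two code units).
    for (String token : tokens("a,b" + face + "c", "," + face)) {
      System.out.println(token.length() + " code unit(s), "
          + token.codePointCount(0, token.length()) + " code point(s)");
    }
  }
}
```

The comma delimiter comes back as a string of length one, while the Grinning Face delimiter comes back as a string of length two that still contains a single code point.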

        FREQUENCY : always


              naoto Naoto Sato
              webbuggrp Webbug Group
              Votes: 0
              Watchers: 5
