JDK-8204286 : StringTokenizer not tokenizing based on the given delimiter



      ADDITIONAL SYSTEM INFORMATION :
      This issue was noticed in Java 8, but it is reproducible in Java 10 as well.

      A DESCRIPTION OF THE PROBLEM :
      For certain input strings, StringTokenizer does not tokenize according to the given delimiter. For instance, when delimiter="DELIM" and input="Text1DELIMText2|Text3", it tokenizes correctly as token1="Text1" & token2="Text2|Text3". But when input="142104DELIM500-00004|DUMMY", it tokenizes as token1="142104" & token2="500-00004|". I expect token2="500-00004|DUMMY".

      In the attached source code, I have demonstrated the behavior of org.apache.commons.lang3.StringUtils and com.google.common.base.Splitter. Please note that StringUtils also behaves incorrectly, whereas Splitter behaves correctly.

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the attached source code. All asserts should pass.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      When delimiter="DELIM" and input="142104DELIM500-00004|DUMMY",
      expected tokens: token1="142104" & token2="500-00004|DUMMY"
      ACTUAL -
      actual tokens: token1="142104" & token2="500-00004|"
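
      For reference, a minimal, self-contained sketch of the failing case using only java.util.StringTokenizer (the class name TokenizerRepro is illustrative; delimiter and input are the same as above):

      import java.util.StringTokenizer;

      public class TokenizerRepro {

          public static void main(final String[] args) {
              // Same delimiter and input as reported above.
              final String delim = "DELIM";
              final String input = "142104DELIM500-00004|DUMMY";

              // Print every token produced; the second token is expected to be
              // "500-00004|DUMMY" but is reported as "500-00004|".
              final StringTokenizer tokenizer = new StringTokenizer(input, delim);
              while (tokenizer.hasMoreTokens()) {
                  System.out.println(tokenizer.nextToken());
              }
          }
      }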

      ---------- BEGIN SOURCE ----------
      import java.util.List;
      import java.util.StringTokenizer;

      import org.apache.commons.lang3.StringUtils;
      import org.junit.Assert;

      import com.google.common.base.Splitter;

      public class TestToken {

          public static void main(final String[] args) {
              final String delim = "DELIM";
              String token1 = "Text1";
              String token2 = "Text2|Text3";
              tokenize(token1, token2, delim);

              token1 = "142104";
              token2 = "500-00004|DUMMY";
              tokenize(token1, token2, delim);
          }

          private static void tokenize(final String token1, final String token2, final String delim) {
              final String input = token1 + delim + token2;
              System.out.println("input=" + input);

              // tokenize using guava Splitter
              final List<String> tokens = Splitter.on(delim).trimResults().omitEmptyStrings().splitToList(input);
              System.out.println("Splitter token1=" + tokens.get(0));
              System.out.println("Splitter token2=" + tokens.get(1));
              System.out.println();
              Assert.assertEquals(token1, tokens.get(0));
              Assert.assertEquals(token2, tokens.get(1));

              // tokenize using util.StringTokenizer
              final StringTokenizer tokenizer = new StringTokenizer(input, delim);
              final String text1 = tokenizer.nextToken();
              final String text2 = tokenizer.nextToken();
              System.out.println("StringTokenizer token1=" + text1);
              System.out.println("StringTokenizer token2=" + text2);
              System.out.println();
              Assert.assertEquals(token1, text1);
              Assert.assertEquals(token2, text2);

              // tokenize using apache.commons.lang3.StringUtils
              final String[] split = StringUtils.split(input, delim);
              System.out.println("StringUtils.split token1=" + split[0]);
              System.out.println("StringUtils.split token2=" + split[1]);
              System.out.println();
              Assert.assertEquals(token1, split[0]);
              Assert.assertEquals(token2, split[1]);
          }

      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Use com.google.common.base.Splitter
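
      A minimal sketch of that workaround, assuming Guava is on the classpath (the class name SplitterWorkaround is illustrative):

      import java.util.List;

      import com.google.common.base.Splitter;

      public class SplitterWorkaround {

          public static void main(final String[] args) {
              final String delim = "DELIM";
              final String input = "142104DELIM500-00004|DUMMY";

              // Splitter.on(String) treats "DELIM" as one whole separator string,
              // so the expected tokens are produced.
              final List<String> tokens = Splitter.on(delim)
                      .trimResults()
                      .omitEmptyStrings()
                      .splitToList(input);

              System.out.println(tokens); // [142104, 500-00004|DUMMY]
          }
      }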

      FREQUENCY : always


            Assignee: Pallavi Sonal (Inactive)
            Reporter: Webbug Group