JDK-8278518

String(byte[], int, int, Charset) constructor and String.translateEscapes() miss bounds check elimination

Details

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Fix Version/s: 19
    • Affects Version/s: 17, 18, 19
    • Component/s: hotspot
    • Resolved In Build: b08

    Description

      This was first spotted by Amir Hadadi in https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor

      It looks like in the following code

      while (offset < sl) {
          int b1 = bytes[offset];
          if (b1 >= 0) {
              dst[dp++] = (byte)b1;
              offset++; // <---
              continue;
          }
          if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
                  offset + 1 < sl) {
              int b2 = bytes[offset + 1];
              if (!isNotContinuation(b2)) {
                  dst[dp++] = (byte)decode2(b1, b2);
                  offset += 2;
                  continue;
              }
          }
          // anything not a latin1, including the repl
          // we have to go with the utf16
          break;
      }

      bounds check elimination is not performed when the byte array is accessed via bytes[offset].

      The reason, I guess, is that the offset variable is modified within the loop (marked with an arrow).

      A possible fix would be to change the loop condition:

      while (offset < sl) ---> while (offset >= 0 && offset < sl)

      However, the best approach is to invest in a C2 optimization that handles all such cases.
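The source-level variant of the loop can be sketched as a stand-alone class. This is only an illustration under assumptions, not the actual java.lang.String source: the class name Latin1PrefixSketch and the method decodeLatin1Prefix are hypothetical, and isNotContinuation and decode2 are simplified stand-ins for the JDK-internal helpers of the same names.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-alone sketch; not the actual java.lang.String source.
final class Latin1PrefixSketch {

    // Stand-in helper: true if b is NOT a UTF-8 continuation byte (10xxxxxx).
    static boolean isNotContinuation(int b) {
        return (b & 0xc0) != 0x80;
    }

    // Stand-in helper: combines a 2-byte UTF-8 sequence into one code unit.
    static char decode2(int b1, int b2) {
        return (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
    }

    // Returns the longest prefix of the UTF-8 input that fits into Latin1.
    static byte[] decodeLatin1Prefix(byte[] bytes, int offset, int length) {
        Objects.checkFromIndexSize(offset, length, bytes.length);
        int sl = offset + length;
        byte[] dst = new byte[length];
        int dp = 0;
        // "offset >= 0" is logically redundant after the range check above;
        // it is the explicit lower-bound hint proposed in this issue.
        while (offset >= 0 && offset < sl) {
            int b1 = bytes[offset];
            if (b1 >= 0) {               // plain ASCII byte
                dst[dp++] = (byte) b1;
                offset++;
                continue;
            }
            if ((b1 == (byte) 0xc2 || b1 == (byte) 0xc3) && offset + 1 < sl) {
                int b2 = bytes[offset + 1];
                if (!isNotContinuation(b2)) {
                    dst[dp++] = (byte) decode2(b1, b2);
                    offset += 2;
                    continue;
                }
            }
            break; // not Latin1; the real constructor falls back to UTF-16 here
        }
        return Arrays.copyOf(dst, dp);
    }
}
```

Calling decodeLatin1Prefix on the UTF-8 bytes of a Latin1-only string should return that string's ISO-8859-1 bytes, while input containing a character such as "Я" is cut off just before it. Whether the extra condition actually lets the JIT drop the range check depends on the JVM version, so it would have to be confirmed with a benchmark or by inspecting the generated code.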

      The following benchmark shows roughly 27% lower average time per operation in the patched run:

      import java.nio.charset.StandardCharsets;
      import java.util.concurrent.TimeUnit;

      import org.openjdk.jmh.annotations.*;

      @State(Scope.Thread)
      @BenchmarkMode(Mode.AverageTime)
      @OutputTimeUnit(TimeUnit.NANOSECONDS)
      public class StringConstructorBenchmark {
        private byte[] array;

        @Setup
        public void setup() {
          String str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. Я"; // Latin1 ending with Russian
          array = str.getBytes(StandardCharsets.UTF_8);
        }

        // Decodes the UTF-8 bytes through the String(byte[], int, int, Charset) constructor
        @Benchmark
        public String newString() {
          return new String(array, 0, array.length, StandardCharsets.UTF_8);
        }
      }

      // baseline
      Benchmark                             Mode  Cnt    Score   Error  Units
      StringConstructorBenchmark.newString  avgt   50  173.092 ± 3.048  ns/op

      // patched
      Benchmark                             Mode  Cnt    Score   Error  Units
      StringConstructorBenchmark.newString  avgt   50  126.908 ± 2.355  ns/op

      The same effect is observed in String.translateEscapes(), measured with the same String as in the benchmark above:

      // baseline
      Benchmark                                    Mode  Cnt   Score   Error  Units
      StringConstructorBenchmark.translateEscapes  avgt  100  53.627 ± 0.850  ns/op

      // patched
      Benchmark                                    Mode  Cnt   Score   Error  Units
      StringConstructorBenchmark.translateEscapes  avgt  100  48.087 ± 1.129  ns/op

            People

              Assignee: Roland Westrelin (roland)
              Reporter: Sergey Tsypanov (stsypanov)