JDK-8280124

Reduce branches decoding latin-1 chars from UTF-8 encoded bytes

    • b06

        The String(byte[], int, int, Charset) constructor has this check in its latin-1 fast path when decoding UTF-8:

        if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) && ...

        Since the two constant bytes differ only in the lowest bit, this can be transformed to:

        if ((b1 & 0xfe) == 0xc2 && ...

        This makes the code less branchy and produces a small speed-up on a targeted microbenchmark (baseline first, then patched):

        Benchmark                           (charsetName)  Mode  Cnt     Score    Error  Units
        StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2283.591 ± 12.332  ns/op

        StringDecode.decodeLatin1LongStart          UTF-8  avgt   50  2165.984 ± 13.136  ns/op
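
        The equivalence of the two checks can be confirmed exhaustively, since a byte has only 256 possible values. The following is a minimal standalone sketch, not the JDK source: the class name LeadByteCheck and its method names are made up for illustration. It also checks that every char in U+0080..U+00FF encodes to a two-byte UTF-8 sequence whose lead byte passes the masked check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the JDK.
public class LeadByteCheck {
    // Original form: two comparisons against the sign-extended byte constants.
    public static boolean twoCompares(byte b1) {
        return b1 == (byte) 0xc2 || b1 == (byte) 0xc3;
    }

    // Rewritten form: clear the lowest bit, then compare once.
    // b1 is promoted to int with sign extension; the 0xfe mask keeps only bits 1..7.
    public static boolean maskedCompare(byte b1) {
        return (b1 & 0xfe) == 0xc2;
    }

    // Counts the byte values for which the two forms disagree.
    public static int countMismatches() {
        int mismatches = 0;
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (twoCompares(b) != maskedCompare(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Checks that each char in U+0080..U+00FF encodes to exactly two UTF-8 bytes
    // and that the lead byte passes the masked check.
    public static boolean latin1LeadBytesMatch() {
        for (char c = 0x80; c <= 0xff; c++) {
            byte[] utf8 = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            if (utf8.length != 2 || !maskedCompare(utf8[0])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());                    // mismatches: 0
        System.out.println("latin-1 lead bytes match: " + latin1LeadBytesMatch()); // true
    }
}
```

        The mask works because 0xc2 (1100 0010) and 0xc3 (1100 0011) are identical apart from bit 0, so clearing that bit maps both to 0xc2 and maps no other value there.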

        (While this minor inefficiency appears to have been introduced with JEP 254 in JDK 9, the performance of decoding latin-1 strings was much improved thanks to the compactness of latin-1 encoded Strings, so I've not seen a regression caused by this.)

              redestad Claes Redestad