Enhancement
Resolution: Unresolved
P4
16
x86, x86_64, aarch64
generic
Java stores strings in UTF-16 whenever they cannot be represented as compact (Latin-1) strings, but the most widely used encoding for text outside the JVM is UTF-8. As a result, Java applications frequently convert between these two encodings.
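To make the conversion concrete, here is a minimal round trip through the `java.nio.charset` API. In OpenJDK, `StandardCharsets.UTF_8` is backed by `sun.nio.cs.UTF_8`, so `Charset.encode` and `Charset.decode` exercise that charset's encoder and decoder. The class and method names are hypothetical, chosen for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the JDK.
public class Utf8RoundTrip {
    // UTF-16 (Java chars) -> UTF-8 bytes, using the UTF-8 charset's encoder.
    static byte[] toUtf8(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // UTF-8 bytes -> UTF-16 (Java chars), using the UTF-8 charset's decoder.
    // Charset.decode replaces malformed input with U+FFFD instead of throwing.
    static String fromUtf8(byte[] bytes) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo \u20ac"; // 'é' encodes to 2 UTF-8 bytes, '€' to 3
        byte[] utf8 = toUtf8(text);
        System.out.println(text.length() + " chars -> " + utf8.length + " bytes"); // 7 chars -> 10 bytes
        System.out.println(fromUtf8(utf8).equals(text)); // true
    }
}
```

Every such call is a pass over the whole input, which is why the speed of this conversion matters for text-heavy applications.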
The current implementation of UTF-8/UTF-16 conversion in `sun.nio.cs.UTF_8` can be accelerated with vectorization (see "A Case Study in SIMD Text Processing with Parallel Bit Streams, UTF-8 to UTF-16 Transcoding", https://dl.acm.org/doi/pdf/10.1145/1345206.1345222 for an academic reference). We expect such acceleration to benefit any application that processes UTF-8 text.
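As a rough illustration of why processing many bytes per step helps, the sketch below decodes the leading ASCII run of a UTF-8 buffer by testing eight bytes at once. It is a minimal sketch under stated assumptions: it is not the JDK's implementation and not the parallel bit stream technique from the cited paper. It uses SWAR (eight bytes packed into a `long`) as a portable stand-in for SIMD registers, and the class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: SWAR ASCII fast path for UTF-8 -> UTF-16 decoding.
public class AsciiFastPathDecoder {
    // The high bit of each of the 8 bytes in a 64-bit word.
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Views a byte[] as a sequence of longs (unaligned plain reads are allowed).
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Copies the leading run of ASCII bytes from src into dst (one char per byte)
    // and returns how many bytes were copied. A full decoder would resume with
    // its regular multi-byte logic at the returned index.
    static int decodeAsciiPrefix(byte[] src, char[] dst) {
        int n = Math.min(src.length, dst.length);
        int i = 0;
        // Test 8 bytes per step: if no high bit is set, all 8 bytes are ASCII.
        while (i + 8 <= n) {
            long word = (long) LONG_VIEW.get(src, i);
            if ((word & HIGH_BITS) != 0) {
                break; // this block contains a non-ASCII byte; finish byte by byte
            }
            for (int k = 0; k < 8; k++) {
                dst[i + k] = (char) src[i + k]; // ASCII maps 1:1 to UTF-16 code units
            }
            i += 8;
        }
        // Tail, and the block holding the first non-ASCII byte: one byte at a time.
        // Java bytes are signed, so a byte >= 0 has its high bit clear (ASCII).
        while (i < n && src[i] >= 0) {
            dst[i] = (char) src[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] src = "plain ASCII text, then \u00e9".getBytes(StandardCharsets.UTF_8);
        char[] dst = new char[src.length];
        int done = decodeAsciiPrefix(src, dst);
        System.out.println(done + " ASCII bytes decoded: \"" + new String(dst, 0, done) + "\"");
    }
}
```

The word-level test replaces eight per-byte branches with one, which is the kind of saving that wider SIMD registers multiply further; handling two-, three- and four-byte sequences in parallel, as the cited paper does, is considerably more involved.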