Type: Enhancement
Resolution: Fixed
Priority: P3
Fix Version/s: 17
Affects Version/s: None
Component/s: core-libs
Labels:
- performance

Subcomponent: java.nio.charsets
Resolved In Build: b11

The Compact Strings work in JDK 9 added intrinsic methods to speed up certain common String operations, including testing whether a byte[] range contains negative bytes and inflating a byte[] to a char[]. By using SIMD instructions, these intrinsics can execute much faster than a simple loop such as the one found in UTF_8.Decoder:

            // ASCII only loop
            while (dp < dlASCII && sa[sp] >= 0)
                da[dp++] = (char) sa[sp++];
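For reference, the two operations mentioned above can be written as plain scalar Java loops. The class and method names below (CompactStringOps, hasNegatives, inflate) are illustrative only: the JDK's intrinsified counterparts are internal and cannot be called from application code, so this sketch shows what the intrinsics compute, not how their SIMD versions are implemented.

```java
// Illustrative scalar versions of the two operations; names are hypothetical
// and these loops are NOT the JDK's SIMD intrinsics, only what they compute.
public class CompactStringOps {

    // Returns true if any byte in ba[off, off + len) is negative, i.e. has its
    // high bit set and is therefore not a 7-bit ASCII value.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Widens len Latin-1 bytes into chars. The "& 0xff" maps bytes 0x80..0xFF
    // to U+0080..U+00FF instead of sign-extending them to negative values.
    static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            dst[dstOff + i] = (char) (src[srcOff + i] & 0xff);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {'J', 'D', 'K', (byte) 0xE9}; // 0xE9 is e-acute in Latin-1
        System.out.println(hasNegatives(bytes, 0, 3)); // false
        System.out.println(hasNegatives(bytes, 0, 4)); // true
        char[] chars = new char[4];
        inflate(bytes, 0, chars, 0, 4);
        System.out.println((int) chars[3]); // 233
    }
}
```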

However, these Compact Strings optimizations are not used in some places, such as in any of the built-in CharsetDecoders. On my Haswell-based workstation (AVX=2), converting a byte[] to a String by reading it through new InputStreamReader(new ByteArrayInputStream(bytes)) into a char[] and then constructing a String from those chars can be 14x slower than calling new String(bytes) directly for UTF-8, and as much as 24x slower for US-ASCII. This is obviously not a conversion anyone would write on purpose, but decoding character streams, e.g., when reading files or doing network I/O, is common.
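A minimal sketch of the two decode paths being compared, assuming an explicit charset instead of the platform default used in the expression above. DecodePaths, direct, and viaReader are illustrative names. The sketch only checks that both paths produce the same String; reproducing the slowdowns quoted above would need a proper benchmark harness such as JMH, and the numbers will depend on hardware and JDK version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodePaths {

    // Direct path: String's own decoding logic.
    static String direct(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Stream path: InputStreamReader decodes via the charset's CharsetDecoder,
    // filling a char[] buffer that is then appended to a StringBuilder.
    static String viaReader(byte[] bytes, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder(bytes.length);
        char[] buf = new char[8192];
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pure ASCII input, so every charset below decodes it to the same text.
        byte[] bytes = "Hello, decoder!".repeat(1000).getBytes(StandardCharsets.US_ASCII);
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            // prints "true" for each charset
            System.out.println(cs + ": " + direct(bytes, cs).equals(viaReader(bytes, cs)));
        }
    }
}
```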

This RFE suggests reusing these intrinsics in Latin-1 and ASCII-compatible CharsetDecoders, which in experiments gives substantial improvements by exploiting SIMD capabilities. I see a 4-5x speed-up on UTF-8 and ~10x on ISO-8859-1, reducing the overhead of going through an InputStreamReader to a more reasonable level (~2.5x) across all charsets.
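The general shape of such a fast path can be sketched in user-level Java, under the assumption that a scalar loop stands in for the SIMD intrinsics. AsciiFastPath, countPositives, and decodeUtf8 are hypothetical names, not the API added by this change: the sketch decodes the leading ASCII run with a 1:1 byte-to-char copy and hands only the remainder to the standard UTF-8 CharsetDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class AsciiFastPath {

    // Hypothetical helper: length of the leading run of non-negative (ASCII)
    // bytes in ba[off, off + len). A SIMD intrinsic would replace this loop.
    static int countPositives(byte[] ba, int off, int len) {
        int i = off;
        int end = off + len;
        while (i < end && ba[i] >= 0) {
            i++;
        }
        return i - off;
    }

    static String decodeUtf8(byte[] src) throws CharacterCodingException {
        int n = countPositives(src, 0, src.length);
        // Fast path: ASCII bytes map 1:1 to chars (a bulk inflate in the JDK).
        char[] ascii = new char[n];
        for (int i = 0; i < n; i++) {
            ascii[i] = (char) src[i];
        }
        if (n == src.length) {
            return new String(ascii);
        }
        // Slow path: the regular decoder handles everything after the ASCII run.
        // A fresh decoder reports malformed input by throwing.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        CharBuffer rest = dec.decode(ByteBuffer.wrap(src, n, src.length - n));
        return new StringBuilder(n + rest.remaining()).append(ascii).append(rest).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] in = "ASCII prefix, then caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(in).equals(new String(in, StandardCharsets.UTF_8))); // true
    }
}
```

Splitting at the end of the ASCII run is safe for UTF-8 because bytes 0x00-0x7F never occur inside a multi-byte sequence. For ISO-8859-1 no scan is needed at all, since every byte maps to the char with the same value, so the entire input can be inflated with (char) (b & 0xff).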

Links to:

Commit openjdk/jdk/433096a4

Review openjdk/jdk/2574

Assignee: Claes Redestad
Reporter: Claes Redestad
Votes: 0
Watchers: 3
Created: 2021-02-15 03:15
Updated: 2025-01-16 12:18
Resolved: 2021-02-19 07:06
