
Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 25
Affects Version/s: 25
Component/s: core-libs
Labels: release-note=yes
Subcomponent: java.lang
Resolved In Build: b22

Summary

I hereby propose adding a getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) method to the CharSequence interface. It provides a bulk-read facility and comes with a default implementation that iterates over charAt(int).

Introduction

For performance reasons, we recently integrated JDK-8341566, which provides the new Reader.of(CharSequence) factory method for non-synchronized reading of character sequences. In the discussions around this new API, participants suggested adding getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) to CharSequence. This would simplify the implementation of Reader.of(CharSequence) and, for performance reasons, allow bulk reading of unknown or future CharSequence implementations.

See also the RFC on the core-libs-dev mailing list: https://mail.openjdk.org/pipermail/core-libs-dev/2024-October/132828.html

Problem

For performance reasons, many CharSequence implementations, in particular String, StringBuilder, StringBuffer and CharBuffer, provide a way to bulk-read a complete region of their character content into a provided char array. Unfortunately, there is no _uniform_ way to do this, and there is no guarantee that _any_ given CharSequence, in particular a custom one, implements bulk reading at all. String, StringBuilder and StringBuffer all share the same getChars(...) method signature for this purpose. CharBuffer, however, performs the very same task through its get(...) methods. Other implementations use other method signatures, or offer no solution to this problem at all. In particular, their _common_ interface, CharSequence, has no method for such a bulk-optimized read. CharSequence only allows reading one character after the other, either by iterating over charAt(int) or by consuming the chars() stream.
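The following small sketch makes the inconsistency concrete. The class and method names are illustrative only. It copies the same region once through String.getChars, once through StringBuilder.getChars, and once through CharBuffer's absolute bulk get(int index, char[] dst, int offset, int length), which is available since Java 13. Note that the CharBuffer variant takes a length where getChars takes an end index.

```java
import java.nio.CharBuffer;

public class BulkReadApis {

    // String (and likewise StringBuffer): getChars(srcBegin, srcEnd, dst, dstBegin)
    static char[] fromString(String s, int begin, int end) {
        char[] dst = new char[end - begin];
        s.getChars(begin, end, dst, 0);
        return dst;
    }

    // StringBuilder: same signature as String
    static char[] fromBuilder(StringBuilder sb, int begin, int end) {
        char[] dst = new char[end - begin];
        sb.getChars(begin, end, dst, 0);
        return dst;
    }

    // CharBuffer: absolute bulk get(index, dst, offset, length), since Java 13;
    // it takes a length rather than an end index
    static char[] fromCharBuffer(CharBuffer cb, int begin, int end) {
        char[] dst = new char[end - begin];
        cb.get(begin, dst, 0, end - begin);
        return dst;
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        System.out.println(new String(fromString(text, 7, 12)));
        System.out.println(new String(fromBuilder(new StringBuilder(text), 7, 12)));
        System.out.println(new String(fromCharBuffer(CharBuffer.wrap(text), 7, 12)));
    }
}
```

All three calls extract the same region. The only reason they cannot share one call site is that their common supertype offers no such method.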

As a result, code that wants to read from a CharSequence in an implementation-agnostic but still bulk-optimized way needs to know the specific method of _each_ possible implementation! In practice this leads to code like the following, a real-world example taken from the implementation of Reader.of(CharSequence) in JDK 24:

```java
switch (cs) {
    case String s -> s.getChars(next, next + n, cbuf, off);
    case StringBuilder sb -> sb.getChars(next, next + n, cbuf, off);
    case StringBuffer sb -> sb.getChars(next, next + n, cbuf, off);
    case CharBuffer cb -> cb.get(next, cbuf, off, n);
    default -> {
        for (int i = 0; i < n; i++)
            cbuf[off + i] = cs.charAt(next + i);
    }
}
```

The problem with this code is that it is bound to exactly that set of CharSequence implementations. Whenever another CharSequence implementation is to be read in a bulk-optimized way, the switch statement has to be extended and recompiled _every time_. If the code receives a custom CharSequence implementation it does not know about, it falls back to the sequential charAt(int) loop, even if that implementation _does_ provide some bulk-read method!
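To illustrate the fallback, the sketch below wraps the excerpt above into a method. It then passes in CountingSequence, a hypothetical custom CharSequence (not part of the JDK) that counts its charAt(int) calls. The switch does not recognize this type, so every character is fetched individually. The sketch assumes Java 21 or later for pattern matching in switch.

```java
import java.nio.CharBuffer;

public class SwitchFallback {

    // Hypothetical custom CharSequence that counts charAt calls. It might offer a
    // bulk-read method of its own, but the switch below cannot know about it.
    static final class CountingSequence implements CharSequence {
        private final String data;
        int charAtCalls;

        CountingSequence(String data) { this.data = data; }

        @Override public int length() { return data.length(); }
        @Override public char charAt(int index) { charAtCalls++; return data.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) { return data.subSequence(start, end); }
        @Override public String toString() { return data; }
    }

    // Same dispatch as the excerpt, wrapped into a method for demonstration
    static void read(CharSequence cs, int next, char[] cbuf, int off, int n) {
        switch (cs) {
            case String s -> s.getChars(next, next + n, cbuf, off);
            case StringBuilder sb -> sb.getChars(next, next + n, cbuf, off);
            case StringBuffer sb -> sb.getChars(next, next + n, cbuf, off);
            case CharBuffer cb -> cb.get(next, cbuf, off, n);
            default -> {
                // Unknown type: one charAt call per character
                for (int i = 0; i < n; i++)
                    cbuf[off + i] = cs.charAt(next + i);
            }
        }
    }

    public static void main(String[] args) {
        CountingSequence seq = new CountingSequence("Hello, world");
        char[] buf = new char[5];
        read(seq, 7, buf, 0, 5);
        System.out.println(new String(buf) + " via " + seq.charAtCalls + " charAt calls");
    }
}
```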

Solution

There are several possible alternative solutions:
* (A) CharSequence.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin): String, StringBuffer and StringBuilder already support this signature. I therefore propose adding it to CharSequence, with a default implementation that iterates over charAt(int).
* (B) Alternatively, the same default method could be implemented using the chars() stream. I assume that might run slower, but correct me if I am wrong.
* (C) Alternatively, we could go with the signature get(char[] dst, int offset, int length). Only CharBuffer implements it already, so more changes would be needed, and more duplicate methods would exist in the end.
* (D) Alternatively, we could come up with a completely different signature. That would be fairest to all existing implementations, but in the end it would imply the most changes and the most duplicate methods.
* (E) We could give up the idea and live with the situation as it is. I assume few people really prefer that outcome.
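Here is a minimal sketch of options (A) and (B). To compile on JDKs without the new method, it assumes a hypothetical sub-interface BulkCharSequence as a stand-in for CharSequence. The bounds checks are modelled on String.getChars, but they are an assumption here, not the specification from the pull request. ArraySequence and StringWrapper are hypothetical implementations. ArraySequence overrides the default with System.arraycopy, while StringWrapper inherits the charAt-based default.

```java
import java.util.Objects;
import java.util.PrimitiveIterator;

public class GetCharsSketch {

    // Hypothetical stand-in for CharSequence carrying the proposed method (option A)
    interface BulkCharSequence extends CharSequence {
        default void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
            // Assumed checks: [srcBegin, srcEnd) lies within this sequence and
            // srcEnd - srcBegin chars fit into dst starting at dstBegin
            Objects.checkFromToIndex(srcBegin, srcEnd, length());
            Objects.checkFromIndexSize(dstBegin, srcEnd - srcBegin, dst.length);
            // Default: sequential copy via charAt(int)
            for (int i = srcBegin; i < srcEnd; i++) {
                dst[dstBegin + (i - srcBegin)] = charAt(i);
            }
        }
    }

    // Option (B): the same contract implemented on top of the chars() stream
    static void getCharsViaStream(CharSequence cs, int srcBegin, int srcEnd, char[] dst, int dstBegin) {
        Objects.checkFromToIndex(srcBegin, srcEnd, cs.length());
        Objects.checkFromIndexSize(dstBegin, srcEnd - srcBegin, dst.length);
        PrimitiveIterator.OfInt it = cs.chars().skip(srcBegin).limit(srcEnd - srcBegin).iterator();
        int pos = dstBegin;
        while (it.hasNext()) {
            dst[pos++] = (char) it.nextInt();
        }
    }

    // Hypothetical implementation that inherits the charAt-based default
    static final class StringWrapper implements BulkCharSequence {
        private final String data;
        StringWrapper(String data) { this.data = data; }
        @Override public int length() { return data.length(); }
        @Override public char charAt(int index) { return data.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) { return data.subSequence(start, end); }
        @Override public String toString() { return data; }
    }

    // Hypothetical array-backed implementation that overrides the default with a bulk copy
    static final class ArraySequence implements BulkCharSequence {
        private final char[] data;
        ArraySequence(String s) { this.data = s.toCharArray(); }
        @Override public int length() { return data.length; }
        @Override public char charAt(int index) { return data[index]; }
        @Override public CharSequence subSequence(int start, int end) {
            return new String(data, start, end - start);
        }
        @Override public String toString() { return new String(data); }
        @Override public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) {
            Objects.checkFromToIndex(srcBegin, srcEnd, data.length);
            // System.arraycopy itself rejects out-of-range destination positions
            System.arraycopy(data, srcBegin, dst, dstBegin, srcEnd - srcBegin);
        }
    }

    // With option (A), the type switch from the Problem section reduces to one call
    static void read(BulkCharSequence cs, int next, char[] cbuf, int off, int n) {
        cs.getChars(next, next + n, cbuf, off);
    }

    public static void main(String[] args) {
        char[] a = new char[5];
        char[] b = new char[5];
        char[] c = new char[5];
        read(new ArraySequence("Hello, world"), 7, a, 0, 5);
        read(new StringWrapper("Hello, world"), 7, b, 0, 5);
        getCharsViaStream("Hello, world", 7, 12, c, 0);
        System.out.println(new String(a) + " " + new String(b) + " " + new String(c));
    }
}
```

In this sketch, the reading code never needs to know the concrete type. Implementations backed by an array can opt into a bulk copy, and all others still work through the default. Whether (B) is actually slower than (A) is left to measurement.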

Specification

The actual specification of the proposed new method CharSequence.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) can be found in the accompanying GitHub pull request, as it is easier to discuss with the actual code change at hand.

Issue Links

causes
* JDK-8361299 (bf) CharBuffer.getChars(int,int,char[],int) violates pre-existing specification (Resolved)

csr for
* JDK-8343111 Add getChars(int, int, char[], int) to CharSequence and CharBuffer (Closed)

relates to
* JDK-8357286 (bf) Remove obsolete instanceof checks in CharBuffer.append (Resolved)
* JDK-8356679 Using CharSequence::getChars internally (Open)

links to
* Commit (master): openjdk/jdk/7642556a
* Review (master): openjdk/jdk/21730
