Summary
Introduce a new method to compute the byte length of a String encoded in a given Charset.
Problem
It is sometimes necessary to compute the byte length of a String encoded in a particular charset. One motivating use case is encoding multiple large strings into a single array. Without an efficient way to get the encoded length, callers must encode into a temporary array and pay the cost of resizing it, potentially multiple times.
Using getBytes(cs).length is correct but inefficient, as it creates an intermediate array.
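As a rough illustration of the cost described above, the following sketch packs several strings into one exactly sized array using only the existing API. The class and method names (PackStrings, pack) are hypothetical and exist only for this example. Each string is encoded twice, and the first encoding is discarded after its length is read.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper for illustration only; not part of the JDK.
final class PackStrings {
    // Encodes every string and concatenates the bytes into one exactly sized array.
    static byte[] pack(List<String> strings, Charset cs) {
        int total = 0;
        for (String s : strings) {
            // Allocates a temporary array only to read its length.
            total = Math.addExact(total, s.getBytes(cs).length);
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (String s : strings) {
            // Second encoding pass: the bytes are produced again, then copied.
            byte[] encoded = s.getBytes(cs);
            System.arraycopy(encoded, 0, out, pos, encoded.length);
            pos += encoded.length;
        }
        return out;
    }
}
```

A method that returns the encoded length directly would remove the need for the temporary array in the first loop.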
Solution
Computing the encoded length without allocating is possible with a non-JDK library method, but a JDK implementation could be more efficient because it can use internal knowledge of the string representation. For certain combinations of string representation and charset, the JDK can compute the encoded length in constant time, for example when the string data is ASCII and the target charset is UTF-8. The JDK can also use intrinsics for some string operations.
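For comparison, a non-JDK approach for UTF-8 might look like the sketch below. The class and method names (Utf8Length, utf8Length) are hypothetical, and this is not the proposed JDK implementation. It assumes the behavior of getBytes(UTF_8), which encodes an unpaired surrogate as the single replacement byte '?'. It always scans every char, whereas the JDK could skip the scan when it already knows the internal representation.

```java
// Hypothetical helper for illustration only; not part of the JDK.
final class Utf8Length {
    // Returns the number of bytes s would occupy in UTF-8, without allocating.
    static int utf8Length(String s) {
        long bytes = 0; // long avoids int overflow for very long strings
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2; // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // valid surrogate pair: one supplementary code point
                i++;        // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1; // unpaired surrogate: assumed to encode as '?'
            } else {
                bytes += 3; // remaining BMP characters
            }
        }
        // Throws ArithmeticException if the result does not fit in an int.
        return Math.toIntExact(bytes);
    }
}
```

This handles a single charset; a general library method would need one such routine per supported charset or fall back to a CharsetEncoder.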
The proposed solution adds the method java.lang.String#getBytesLength(Charset).
Specification
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
...
+ /**
+ * {@return the length in bytes of this String encoded with the given {@link Charset}}
+ *
+ * <p>The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}.
+ *
+ * @apiNote This method provides equivalent or better performance than {@link #getBytes(Charset)
+ * getBytes(cs).length}. It may allocate memory to compute the length for some charsets.
+ *
+ * @param cs The {@link Charset} used to compute the length
+ * @since 27
+ */
+ public int getBytesLength(Charset cs) {
csr of JDK-8372353: API to compute the byte length of a String encoded in a given Charset (New)