Type: CSR
Resolution: Unresolved
Priority: P4
Fix Version/s: 27
Component/s: core-libs
Labels: None
Subcomponent: java.lang.foreign
Compatibility Kind: behavioral
Compatibility Risk: minimal
Compatibility Risk Description: These are new methods; as such, they do not affect existing clients.
Interface Kind: Java API
Scope: SE

Summary

Introduce new methods to support more efficient interoperability between strings and memory segments.

Problem

The existing FFM methods that read and write strings from and to memory segments, as well as those that allocate memory segments from existing Java strings, assume that strings are zero-terminated.

There are cases where clients would like to read strings without having to look for a terminator (as they already know the size), or where they would like to write a Java string (or a portion of it) onto some destination memory segment.
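
To make the first case concrete, the following is a minimal sketch of how a client can read a string of known byte length today, using only the FFM API as finalized in JDK 22. It mirrors the snippet that the existing `getString` javadoc recommends. The class name `KnownLengthRead` and the helper `readFixed` are hypothetical names chosen for illustration; they are not part of any JDK API.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

final class KnownLengthRead {

    // Hypothetical helper: decodes exactly `length` bytes starting at `offset`,
    // without scanning the segment for a terminator.
    static String readFixed(MemorySegment segment, long offset, int length) {
        byte[] bytes = new byte[length];
        // Bulk-copy the bytes out of the segment into a temporary heap array.
        MemorySegment.copy(segment, ValueLayout.JAVA_BYTE, offset, bytes, 0, length);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Two strings packed back to back, with no terminators in between.
            byte[] payload = "helloworld".getBytes(StandardCharsets.UTF_8);
            MemorySegment segment = arena.allocate(payload.length);
            MemorySegment.copy(payload, 0, segment, ValueLayout.JAVA_BYTE, 0, payload.length);

            System.out.println(readFixed(segment, 0, 5)); // prints "hello"
            System.out.println(readFixed(segment, 5, 5)); // prints "world"
        }
    }
}
```

The intermediate `byte[]` and the extra copy are the overhead that a length-aware read method is meant to let clients avoid.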

For more background, see Maurizio's document "Pulling the (foreign) string".

Solution

This change adds three new methods to support efficient handling of strings that are not null-terminated:

MemorySegment#getString(long offset, Charset charset, long length)
MemorySegment#copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, long dstOffset, int numChars)
SegmentAllocator#allocateFrom(String str, Charset charset, int srcIndex, int numChars)
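
As a rough illustration of the intended semantics, the three methods listed above can be approximated with the FFM API available since JDK 22. This is only a sketch under stated assumptions: the class `StringSegmentSketch` and its static helpers are hypothetical stand-ins, they always go through an intermediate `byte[]`, and the exceptions they throw on invalid arguments may differ from those in the specification below. It is not the proposed implementation.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StringSegmentSketch {

    // Approximates MemorySegment#getString(long offset, Charset charset, long length):
    // decodes exactly `length` bytes, with no terminator lookup.
    static String getString(MemorySegment segment, long offset, Charset charset, long length) {
        byte[] bytes = segment.asSlice(offset, length).toArray(ValueLayout.JAVA_BYTE);
        return new String(bytes, charset);
    }

    // Approximates MemorySegment#copy(String, Charset, int, MemorySegment, long, int):
    // encodes `numChars` characters starting at `srcIndex` and writes them at `dstOffset`.
    // No terminator is written.
    static void copy(String src, Charset dstEncoding, int srcIndex,
                     MemorySegment dst, long dstOffset, int numChars) {
        byte[] bytes = src.substring(srcIndex, srcIndex + numChars).getBytes(dstEncoding);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
    }

    // Approximates SegmentAllocator#allocateFrom(String, Charset, int, int):
    // the new segment is exactly as large as the encoded substring.
    static MemorySegment allocateFrom(SegmentAllocator allocator, String str, Charset charset,
                                      int srcIndex, int numChars) {
        byte[] bytes = str.substring(srcIndex, srcIndex + numChars).getBytes(charset);
        MemorySegment segment = allocator.allocate(bytes.length);
        MemorySegment.copy(bytes, 0, segment, ValueLayout.JAVA_BYTE, 0, bytes.length);
        return segment;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Allocate a segment holding only the substring "world" (5 bytes, no terminator).
            MemorySegment world = allocateFrom(arena, "hello, world", StandardCharsets.UTF_8, 7, 5);
            System.out.println(world.byteSize());
            System.out.println(getString(world, 0, StandardCharsets.UTF_8, 5));

            // Write "hello" into the middle of an existing segment.
            MemorySegment dst = arena.allocate(8);
            copy("hello, world", StandardCharsets.UTF_8, 0, dst, 2, 5);
            System.out.println(getString(dst, 2, StandardCharsets.UTF_8, 5));
        }
    }
}
```

The sketch pays for a substring and a temporary array on every call; avoiding that cost for compatible charsets is one motivation for adding dedicated methods.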

For background, see "Pulling the (foreign) string" and this panama-dev@ thread.

Specification

diff --git a/src/java.base/share/classes/java/lang/foreign/MemorySegment.java b/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
index 196f44d1abe..378e9f479a0 100644
--- a/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
+++ b/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
@@ -1296,12 +1296,7 @@ MemorySegment reinterpret(long newSize,
      * over the decoding process is required.
      * <p>
      * Getting a string from a segment with a known byte offset and
-     * known byte length can be done like so:
-     * {@snippet lang=java :
-     *     byte[] bytes = new byte[length];
-     *     MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
-     *     return new String(bytes, charset);
-     * }
+     * known byte length can be done using {@link #getString(long, Charset, long)}.
      *
      * @param offset  offset in bytes (relative to this segment address) at which this
      *                access operation will occur
@@ -1328,6 +1323,34 @@ MemorySegment reinterpret(long newSize,
      */
     String getString(long offset, Charset charset);
 
+    /**
+     * Reads a string from this segment at the given offset, using the provided length
+     * and charset.
+     * <p>
+     * This method always replaces malformed-input and unmappable-character
+     * sequences with this charset's default replacement string. The {@link
+     * java.nio.charset.CharsetDecoder} class should be used when more control
+     * over the decoding process is required.
+     *
+     * @param offset  offset in bytes (relative to this segment address) at which this
+     *                access operation will occur
+     * @param charset the charset used to {@linkplain Charset#newDecoder() decode} the
+     *                string bytes
+     * @param length  length in bytes of the string to read
+     * @return a Java string constructed from the bytes read from the given starting
+     *         address reading the given length of bytes
+     * @throws IllegalArgumentException  if the size of the string is greater than the
+     *         largest string supported by the platform
+     * @throws IndexOutOfBoundsException if {@code offset < 0}
+     * @throws IndexOutOfBoundsException if {@code offset > byteSize() - length}
+     * @throws IllegalStateException if the {@linkplain #scope() scope} associated with
+     *         this segment is not {@linkplain Scope#isAlive() alive}
+     * @throws WrongThreadException if this method is called from a thread {@code T},
+     *         such that {@code isAccessibleBy(T) == false}
+     * @throws IllegalArgumentException if {@code length < 0}
+     */
+    String getString(long offset, Charset charset, long length);
+
     /**
      * Writes the given string into this segment at the given offset, converting it to
      * a null-terminated byte sequence using the {@linkplain StandardCharsets#UTF_8 UTF-8}
@@ -2606,6 +2629,48 @@ static void copy(Object srcArray, int srcIndex,
                 elementCount);
     }
 
+    /**
+     * Copies the byte sequence of the given string encoded using the provided charset
+     * to the destination segment.
+     * <p>
+     * This method always replaces malformed-input and unmappable-character
+     * sequences with this charset's default replacement string. The {@link
+     * java.nio.charset.CharsetDecoder} class should be used when more control
+     * over the decoding process is required.
+     * <p>
+     * If the given string contains any {@code '\0'} characters, they will be
+     * copied as well. This means that, depending on the method used to read
+     * the string, such as {@link MemorySegment#getString(long)}, the string
+     * will appear truncated when read again.
+     *
+     * @param src      the Java string to be written into this segment
+     * @param dstEncoding the charset used to {@linkplain Charset#newEncoder() encode}
+     *                 the string bytes.
+     * @param srcIndex the starting index of the source string
+     * @param dst      the destination segment
+     * @param dstOffset the starting offset, in bytes, of the destination segment
+     * @param numChars the number of characters to be copied
+     * @throws IllegalStateException if the {@linkplain #scope() scope} associated with
+     *         {@code dst} is not {@linkplain Scope#isAlive() alive}
+     * @throws WrongThreadException if this method is called from a thread {@code T},
+     *         such that {@code dst.isAccessibleBy(T) == false}
+     * @throws IndexOutOfBoundsException if either {@code srcIndex}, {@code numChars}, or {@code dstOffset}
+     *         are {@code < 0}
+     * @throws IndexOutOfBoundsException if the {@code numChars + srcIndex} is larger than the length of
+     *         this {@code String} object.
+     * @throws IllegalArgumentException if {@code dst} is {@linkplain #isReadOnly() read-only}
+     * @throws IndexOutOfBoundsException if {@code dstOffset > dstSegment.byteSize() - B} where {@code B} is the size,
+     *         in bytes, of the string encoded using the given charset.
+     */
+    @ForceInline
+    static void copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, long dstOffset, int numChars) {
+        Objects.requireNonNull(src);
+        Objects.requireNonNull(dstEncoding);
+        Objects.requireNonNull(dst);
+
+        AbstractMemorySegmentImpl.copy(src, dstEncoding, srcIndex, dst, dstOffset, numChars);
+    }
+
     /**
      * Finds and returns the relative offset, in bytes, of the first mismatch between the
      * source and the destination segments. More specifically, the bytes at offset
diff --git a/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java b/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
index 1297406dcf1..6d36e265220 100644
--- a/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
+++ b/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
@@ -137,10 +137,10 @@ default MemorySegment allocateFrom(String str, Charset charset) {
         int termCharSize = StringSupport.CharsetKind.of(charset).terminatorCharSize();
         MemorySegment segment;
         int length;
-        if (StringSupport.bytesCompatible(str, charset)) {
+        if (StringSupport.bytesCompatible(str, charset, 0, str.length())) {
             length = str.length();
             segment = allocateNoInit((long) length + termCharSize);
-            StringSupport.copyToSegmentRaw(str, segment, 0);
+            StringSupport.copyToSegmentRaw(str, segment, 0, 0, str.length());
         } else {
             byte[] bytes = str.getBytes(charset);
             length = bytes.length;
@@ -153,6 +153,52 @@ default MemorySegment allocateFrom(String str, Charset charset) {
         return segment;
     }
 
+    /**
+     * Converts a Java string into a C string using the provided charset,
+     * and storing the result into a memory segment.
+     * <p>
+     * This method always replaces malformed-input and unmappable-character
+     * sequences with this charset's default replacement byte array. The
+     * {@link java.nio.charset.CharsetEncoder} class should be used when more
+     * control over the encoding process is required.
+     * <p>
+     * If the given string contains any {@code '\0'} characters, they will be
+     * copied as well. This means that, depending on the method used to read
+     * the string, such as {@link MemorySegment#getString(long)}, the string
+     * will appear truncated when read again.
+     *
+     * @param str      the Java string to be converted into a C string
+     * @param charset  the charset used to {@linkplain Charset#newEncoder() encode} the
+     *                 string bytes
+     * @param srcIndex the starting index of the source string
+     * @param numChars the number of characters to be copied
+     * @return a new native segment containing the converted C string
+     * @throws IndexOutOfBoundsException if either {@code srcIndex} or {@code numChars} are {@code < 0}
+     * @throws IndexOutOfBoundsException if the {@code numChars + srcIndex} is larger than the length of
+     *         this {@code String} object.
+     *
+     * @implSpec The default implementation for this method copies the contents of the
+     *           provided Java string into a new memory segment obtained by calling
+     *           {@code this.allocate(B)}, where {@code B} is the size, in bytes, of
+     *           the string encoded using the provided charset
+     *           (e.g. {@code str.getBytes(charset).length});
+     */
+    @ForceInline
+    default MemorySegment allocateFrom(String str, Charset charset, int srcIndex, int numChars) {
+        Objects.requireNonNull(charset);
+        Objects.requireNonNull(str);
+        MemorySegment segment;
+        if (StringSupport.bytesCompatible(str, charset, srcIndex, numChars)) {
+            segment = allocateNoInit(numChars);
+            StringSupport.copyToSegmentRaw(str, segment, 0, srcIndex, numChars);
+        } else {
+            byte[] bytes = str.substring(srcIndex, srcIndex + numChars).getBytes(charset);
+            segment = allocateNoInit(bytes.length);
+            MemorySegment.copy(bytes, 0, segment, ValueLayout.JAVA_BYTE, 0, bytes.length);
+        }
+        return segment;
+    }
+
     /**
      * {@return a new memory segment initialized with the provided byte value}
      * <p>
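
The javadoc in the diff above notes that embedded `'\0'` characters are copied as-is, so a string may appear truncated when read back with a terminator-scanning method such as `getString(long)`. The sketch below demonstrates that effect with the existing JDK 22 methods `SegmentAllocator#allocateFrom(String)` and `MemorySegment#getString(long)`; the class name `EmbeddedNulDemo` and the helper `roundTrip` are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

final class EmbeddedNulDemo {

    // Writes the string as a null-terminated UTF-8 sequence, then reads it
    // back by scanning for the first terminator.
    static String roundTrip(String s) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocateFrom(s);
            return segment.getString(0);
        }
    }

    public static void main(String[] args) {
        // The embedded '\0' is copied into the segment, so the read stops there.
        System.out.println(roundTrip("ab\0cd").length()); // prints 2
        System.out.println(roundTrip("abcd").length());   // prints 4
    }
}
```

Reading with an explicit byte length does not depend on the position of any `'\0'` in the data, which is why the length-based methods do not have this truncation problem.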

CSR of: JDK-8369564 Provide a MemorySegment API to read strings with known lengths
