JDK-8289693

MemorySegment.allocateUtf8String(String str) should be clarified for strings containing \0

    • Type: CSR
    • Resolution: Withdrawn
    • Priority: P3
    • None
    • core-libs
    • None
    • behavioral
    • minimal
    • Change to a preview API. The change also addresses a corner case for which the current behavior seems unlikely to be relied upon.
    • Java API

      Summary

      Amend the documentation of SegmentAllocator::allocateUtf8String to describe what happens if the argument string contains null/0 characters.

      Problem

      If a Java string containing null/0 characters is converted to a C string, it might appear truncated, depending on the format expected by the native code, because a null/0 character also marks the end of a C string.

      Similarly, when reading a string in Java through MemorySegment/MemoryAddress.getUtf8String, the first \0/null character is treated as the terminator, so any characters after it are not returned.
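
      The truncation described above can be reproduced without the preview API. The following is a minimal sketch that only simulates the behavior with plain byte arrays: the class name and the helpers toCString and fromCString are hypothetical and are not part of java.lang.foreign. It assumes the string is encoded as UTF-8, followed by a single 0 terminator, and read back up to the first 0 byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NulTruncationSketch {

    // Hypothetical helper: encode the string as UTF-8 and append one 0 byte,
    // mimicking a C string layout. Embedded '\0' characters are copied as-is.
    static byte[] toCString(String str) {
        byte[] utf8 = str.getBytes(StandardCharsets.UTF_8);
        // copyOf pads with zeros, so the extra trailing byte is the terminator.
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    // Hypothetical helper: decode the bytes that precede the first 0 byte,
    // the way a reader that expects a 0-terminated string would.
    static String fromCString(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "abc\0def";
        byte[] encoded = toCString(original);
        String readBack = fromCString(encoded);

        System.out.println(encoded.length);            // 8: all 7 characters plus the terminator
        System.out.println(readBack);                  // abc
        System.out.println(readBack.equals(original)); // false
    }
}
```

      All of the characters, including the embedded \0, are present in the encoded bytes; only the reader that stops at the first 0 byte sees a shorter string.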

      Solution

      Amend the documentation to describe this behavior explicitly.

      Specification

      The javadoc has the following diff:

      diff --git a/src/java.base/share/classes/java/lang/foreign/MemorySegment.java b/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
      index 3b29756fb23..f2f9dd973ce 100644
      --- a/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
      +++ b/src/java.base/share/classes/java/lang/foreign/MemorySegment.java
      @@ -737,6 +737,12 @@ default String getUtf8String(long offset) {
            * sequences with this charset's default replacement string.  The {@link
            * java.nio.charset.CharsetDecoder} class should be used when more control
            * over the decoding process is required.
      +     * <p>
      +     * If the given string contains any {@code '\0'} characters, they will be
      +     * copied as well. This means that, depending on the method used to read
      +     * the string, such as {@link MemorySegment#getUtf8String(long)}, the string
      +     * will appear truncated when read again.
      +     *
            * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@linkplain #isNative() native} segment,
            *               the final address of this write operation can be expressed as {@code address().toRowLongValue() + offset}.
            * @param str the Java string to be written into this segment.
      diff --git a/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java b/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
      index 095f360e97e..6687936d48c 100644
      --- a/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
      +++ b/src/java.base/share/classes/java/lang/foreign/SegmentAllocator.java
      @@ -71,6 +71,11 @@ public interface SegmentAllocator {
            * sequences with this charset's default replacement byte array.  The
            * {@link java.nio.charset.CharsetEncoder} class should be used when more
            * control over the encoding process is required.
      +     * <p>
      +     * If the given string contains any {@code '\0'} characters, they will be
      +     * copied as well. This means that, depending on the method used to read
      +     * the string, such as {@link MemorySegment#getUtf8String(long)}, the string
      +     * will appear truncated when read again.
            *
            * @implSpec the default implementation for this method copies the contents of the provided Java string
            * into a new memory segment obtained by calling {@code this.allocate(str.length() + 1)}.
      

            jvernee Jorn Vernee
            lkuskov Leonid Kuskov