Type: Bug
Resolution: Fixed
Priority: P3
The behavior of the FFM API is inconsistent when it comes to strings. There are a few functions in the FFM API which manipulate user-defined strings:
* MemorySegment::getUtf8String
* MemorySegment::setUtf8String
* SegmentAllocator::allocateUtf8String
* SymbolLookup::libraryLookup(String)
* SymbolLookup::find(String)
The first three cases have to do with converting a native string into a Java string and back. These methods can only support encodings in which a string is terminated by a single NUL byte, such as single-byte encodings (otherwise strlen doesn't work). Other frameworks, such as JNR and JNA, seem to provide more general conversion methods, but the essence of the problem is the same, as this bug demonstrates:
https://github.com/java-native-access/jna/issues/759
This problem also seems to affect JNU_NewStringPlatform, which computes the length of a string held in a char* using strlen (again, this seems to assume a single-byte encoding).
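To make the strlen limitation concrete, here is a minimal sketch in plain Java. It does not call the FFM API; the class name `StrlenDemo` and the helpers `strlen` and `toNative` are hypothetical and only emulate what a C-style scan for the first zero byte would see for the same text in two encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StrlenDemo {
    // Emulates C strlen: counts bytes up to (not including) the first zero byte.
    static int strlen(byte[] buf) {
        int i = 0;
        while (i < buf.length && buf[i] != 0) {
            i++;
        }
        return i;
    }

    // Encodes s with the given charset and appends terminatorSize zero bytes.
    static byte[] toNative(String s, Charset cs, int terminatorSize) {
        byte[] encoded = s.getBytes(cs);
        return Arrays.copyOf(encoded, encoded.length + terminatorSize);
    }

    public static void main(String[] args) {
        byte[] utf8 = toNative("hello", StandardCharsets.UTF_8, 1);
        byte[] utf16 = toNative("hello", StandardCharsets.UTF_16LE, 2);
        // UTF-8: the scan stops at the terminator, after all 5 bytes.
        System.out.println("UTF-8 strlen:    " + strlen(utf8));
        // UTF-16LE: 'h' is encoded as 0x68 0x00, so the scan stops after 1 byte.
        System.out.println("UTF-16LE strlen: " + strlen(utf16));
    }
}
```

With UTF-16LE the high byte of every ASCII character is zero, so a byte-wise scan truncates the string at the first character. Any encoding that can contain zero bytes inside a character would need a terminator search that is aware of the code unit size.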
As for symbol lookups, the issues are more obscure. When a library name string is passed to dlopen (SymbolLookup::libraryLookup), we convert it into a char* using the GetStringPlatformChars function. This copies the string array into a new char buffer and appends a NUL terminator. There seem to be reports that this behavior is not always correct, depending on the contents of the string (see https://bugs.openjdk.org/browse/JDK-8195129).
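One way a platform-encoding conversion can depend on the contents of the string is when the target charset cannot represent every character. The sketch below is only an illustration of that general effect, not a reproduction of GetStringPlatformChars or of the linked report: it uses US-ASCII as a stand-in for a restrictive platform encoding, and the class name `PlatformCharsDemo`, the helper `roundTrips` and the library name are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PlatformCharsDemo {
    // Returns true if s survives an encode/decode round trip through cs.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String name = "libcaf\u00e9.so"; // hypothetical library name containing 'é'

        // UTF-8 can represent every character, so the name is preserved.
        System.out.println("UTF-8 round trip:    " + roundTrips(name, StandardCharsets.UTF_8));

        // US-ASCII (assumed stand-in for a restrictive platform encoding)
        // cannot represent 'é'; String.getBytes substitutes '?' for it.
        System.out.println("US-ASCII round trip: " + roundTrips(name, StandardCharsets.US_ASCII));
        System.out.println("US-ASCII result:     "
                + new String(name.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII));
    }
}
```

A name altered this way would no longer match the file on disk, so the lookup would fail even though the Java string was correct.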
Finally, the string passed to dlsym (SymbolLookup::find) is converted using yet another JNI function, namely GetStringUTFChars, which returns the string encoded in the modified UTF-8 encoding used by the classfile format. This is probably a choice that has to do with the fact that dlsym was only used to look up JNI methods, and the name of a JNI method has to be a valid Java method name, encoded in the classfile (so, using modified UTF-8).
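The difference between modified UTF-8 and standard UTF-8 can be observed from plain Java. The sketch below uses DataOutputStream.writeUTF, which is documented to write modified UTF-8, as a stand-in for what GetStringUTFChars produces; the class name `ModifiedUtf8Demo` and the helper `modifiedUtf8` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encodes s as modified UTF-8 via writeUTF, dropping the 2-byte length prefix.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        String ascii = "strlen";
        String supplementary = "f\uD83D\uDE00"; // 'f' followed by U+1F600

        // For plain ASCII symbol names the two encodings are identical.
        System.out.println("ASCII identical: "
                + Arrays.equals(modifiedUtf8(ascii), ascii.getBytes(StandardCharsets.UTF_8)));

        // A supplementary character takes 4 bytes in standard UTF-8, but
        // modified UTF-8 encodes each surrogate separately (3 bytes each).
        System.out.println("standard UTF-8 bytes: "
                + supplementary.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("modified UTF-8 bytes: " + modifiedUtf8(supplementary).length);
    }
}
```

For typical C symbol names, which are ASCII, the two encodings agree, which may explain why the choice has gone unnoticed. The encodings diverge for U+0000 and for characters outside the Basic Multilingual Plane.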
All these choices seem rather ad-hoc and/or biased towards what made sense for JNI. We should take a look at this again, and see if some more principled option exists.