For storing certain kinds of native data in the Java heap, we need a low-level "block data" object type, similar to an array of primitives, but with opaque storage and the option of alignment stricter than 64 bits.
An API like this would work:
/** Marker for block data. Non-public in jdk.internal.block. */
interface BlockData { }
/** Method in Unsafe or similar API. */
BlockData allocateBlockData(long sizeInBytes, int alignInBytes);
// alignInBytes must be a power of two
/** Base address of a particular block, for Unsafe addressing. */
Object blockDataBase(BlockData block);
/** Data offset of a particular block, for Unsafe addressing. */
long blockDataOffset(BlockData block);
/** Original sizeInBytes of a particular block, for range checks. */
long blockDataSizeInBytes(BlockData block);
/** Original alignInBytes of a particular block, for alignment checks. */
int blockDataAlignInBytes(BlockData block);
The JVM or runtime may create multiple classes to implement blocks.
Perhaps blocks of some sizes or alignments will be placed off-heap.
No integration with the Java language or public APIs is necessary. The type of a block data object does not need to be advertised in any public API. It is a building block for wrapper objects which hide block data in their internals.
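As a rough illustration of the wrapper idea, the following sketch hides a block behind a small checked view. Nothing here is the proposed implementation: LongArrayBlock, LongVector, and the bodies of allocateBlockData and blockDataSizeInBytes are hypothetical stand-ins, backed by a plain long[] so that the example can run on an unmodified JVM. The stand-in rejects alignments above 8 bytes, which is exactly the case the real feature is meant to handle.

```java
class BlockWrapperSketch {
    /** Marker type, as in the proposed API. */
    interface BlockData { }

    /** Hypothetical stand-in; real block storage would be opaque and JVM-managed. */
    static final class LongArrayBlock implements BlockData {
        final long[] words;
        final long sizeInBytes;
        final int alignInBytes;

        LongArrayBlock(long sizeInBytes, int alignInBytes) {
            // Round the byte size up to whole 64-bit words.
            this.words = new long[Math.toIntExact((sizeInBytes + 7) / 8)];
            this.sizeInBytes = sizeInBytes;
            this.alignInBytes = alignInBytes;
        }
    }

    /** Stand-in for the proposed allocator; only handles alignment of 8 bytes or less. */
    static BlockData allocateBlockData(long sizeInBytes, int alignInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        if (alignInBytes <= 0 || Integer.bitCount(alignInBytes) != 1) {
            throw new IllegalArgumentException("alignInBytes must be a power of two");
        }
        if (alignInBytes > 8) {
            throw new UnsupportedOperationException("stand-in cannot hyperalign");
        }
        return new LongArrayBlock(sizeInBytes, alignInBytes);
    }

    /** Stand-in for the proposed size query, used for range checks. */
    static long blockDataSizeInBytes(BlockData block) {
        return ((LongArrayBlock) block).sizeInBytes;
    }

    /** Wrapper object: callers see checked element access, never the block itself. */
    static final class LongVector {
        private final BlockData block;

        LongVector(long length) {
            this.block = allocateBlockData(Math.multiplyExact(length, 8L), 8);
        }

        long get(long index) {
            return ((LongArrayBlock) block).words[checkIndex(index)];
        }

        void set(long index, long value) {
            ((LongArrayBlock) block).words[checkIndex(index)] = value;
        }

        /** Range check against the block's recorded size in bytes. */
        private int checkIndex(long index) {
            long size = blockDataSizeInBytes(block);
            if (index < 0 || index >= size / 8) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return (int) index;
        }
    }

    public static void main(String[] args) {
        LongVector v = new LongVector(4);
        v.set(3, 42L);
        System.out.println(v.get(3));
    }
}
```

A real wrapper would presumably pass blockDataBase and blockDataOffset to Unsafe accessors instead of indexing an array; the range check against blockDataSizeInBytes would stay the same.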
The current workaround is to store block data in Java long arrays when alignment requirements are 8 bytes or less. For hyperaligned objects, and for objects larger than 2 billion longs, off-heap storage is required. Off-heap storage makes deallocation much more difficult.
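The limits of the long-array workaround can be expressed as a simple predicate. This is only a sketch: the helper names longsNeeded and fitsInLongArray are invented for illustration, and Integer.MAX_VALUE is used as an upper bound on array length (a particular JVM may allow slightly fewer elements).

```java
class LongArrayWorkaround {
    /** Strongest alignment the long[] workaround is assumed to provide. */
    static final int MAX_ARRAY_ALIGN = 8;

    /** Number of 64-bit words needed to hold sizeInBytes, rounding up. */
    static long longsNeeded(long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        return (sizeInBytes >>> 3) + ((sizeInBytes & 7) != 0 ? 1 : 0);
    }

    /** True if a request could be served by a long[]; otherwise off-heap storage is needed. */
    static boolean fitsInLongArray(long sizeInBytes, int alignInBytes) {
        return alignInBytes <= MAX_ARRAY_ALIGN
            && longsNeeded(sizeInBytes) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInLongArray(64, 8));           // small, 8-byte aligned
        System.out.println(fitsInLongArray(64, 32));          // hyperaligned
        System.out.println(fitsInLongArray(1L << 40, 8));     // too large for one array
    }
}
```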
Alignment requests of 8 bytes or less are trivial to satisfy, since the JVM already knows how to pad out an object so that its fields get up to 8-byte alignment. Larger "hyperalignment" requests probably require the GC to insert dynamically calculated padding objects before hyperaligned blocks, or to place such blocks in special regions where all objects are aligned, or (in some cases of large size or alignment) to move the storage out of the heap, leaving just a "long" pointer in the heap object. (The GC would be responsible for calling "free" when the object becomes unreachable, or perhaps a reference queue could do this.)
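The "dynamically calculated padding" mentioned above comes down to standard power-of-two rounding. The sketch below shows that arithmetic only; it says nothing about how a particular GC would materialize the padding object. The names alignUp and paddingBefore are illustrative, and addresses are assumed to be non-negative and far enough from Long.MAX_VALUE that the addition cannot overflow.

```java
class AlignmentPadding {
    static boolean isPowerOfTwo(long x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    /** Smallest address >= address that is a multiple of align (a power of two). */
    static long alignUp(long address, long align) {
        if (!isPowerOfTwo(align)) {
            throw new IllegalArgumentException("align must be a power of two");
        }
        // -align has all bits set above the low log2(align) bits, so it acts as a mask.
        return (address + align - 1) & -align;
    }

    /** Bytes of padding to insert before a block that would otherwise start at address. */
    static long paddingBefore(long address, long align) {
        return alignUp(address, align) - address;
    }

    public static void main(String[] args) {
        long top = 0x1008;  // hypothetical allocation pointer, 8-byte aligned
        System.out.println(paddingBefore(top, 64));
    }
}
```

Because the padding depends on the block's address, a moving collector would have to recompute it each time the block is relocated, which is why the text calls it dynamically calculated.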
Use cases for such data blocks include:
- Small objects containing one or more aligned vectors (128 to 2048 bits in size).
- Large Fortran-like arrays.
- Large Fortran-like arrays of vectors.
- Images of large objects such as mapped files.
The small and large use cases are different enough that they could be handled with different techniques. Note also that the hyperalignment requirement affects small objects differently from large ones.
There may be another use case for associating page-aligned block data objects with OS-level mappings, such as mapped files and shared memory segments. In that case, provision would have to be made for closing the OS mapping when the object becomes unreachable (perhaps via a reference queue), and for keeping the OS mapping pointed at the object (if the GC moves it) or not moving it at all.
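One way to picture the reference-queue approach is with java.lang.ref.Cleaner, which is built on phantom references and a reference queue. The sketch below is an assumption about how such cleanup could be wired up, not part of the proposal: FakeMapping and MappedBlock are hypothetical names, and the "unmap" step is simulated with a flag where a real implementation would release the OS mapping.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class MappingCleanupSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Hypothetical stand-in for an OS-level mapping; run() plays the role of unmapping. */
    static final class FakeMapping implements Runnable {
        final AtomicBoolean open = new AtomicBoolean(true);

        @Override
        public void run() {
            open.set(false);
        }
    }

    /** Wrapper whose mapping is released on close() or after it becomes unreachable. */
    static final class MappedBlock implements AutoCloseable {
        private final FakeMapping mapping;
        private final Cleaner.Cleanable cleanable;

        MappedBlock(FakeMapping mapping) {
            this.mapping = mapping;
            // The cleanup action must not reference this wrapper, or the wrapper
            // would never become phantom reachable.
            this.cleanable = CLEANER.register(this, mapping);
        }

        boolean isOpen() {
            return mapping.open.get();
        }

        @Override
        public void close() {
            cleanable.clean();  // runs the action at most once
        }
    }

    public static void main(String[] args) {
        FakeMapping mapping = new FakeMapping();
        try (MappedBlock block = new MappedBlock(mapping)) {
            System.out.println(block.isOpen());
        }
        System.out.println(mapping.open.get());
    }
}
```

This covers only the closing half of the problem. Keeping the OS mapping consistent with an object that the GC may move is a separate concern that a Cleaner does not address.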