JDK-8180628

(bf) Retrofit direct buffer support for size beyond gigabyte scales



    • Type: Enhancement
    • Status: Resolved
    • Priority: P2
    • Resolution: Won't Fix
    • Affects Version/s: 10
    • Fix Version/s: tbd
    • Component/s: core-libs
    • Labels:


      Direct buffers, like Java native arrays, are limited in size by the dynamic range of the Java `int` type.

      Many users need a Java handle to a native data block (especially a DirectByteBuffer) that may exceed 2 GB in size (the limit of what an `int` can index).

      Future alternatives to direct buffers (DBs) may include (1) a new replacement for DBs that uses `long` instead of `int`, (2) retrofitting DBs to a specializable generic type whose index type is a type parameter that can be either `int` or `long` (and, as a bonus, other types such as 2D mesh coordinates), and (3) relying on forthcoming Project Panama types like MemoryRegion.

      Still, it is worth considering a retrofit of today's buffer types that forces them to process data under long indexes, even though this does some violence to the current API contract.

      By using method overloading, we can define methods which take a `long` index wherever an `int` index or size is currently taken. Return values cannot (alas) be lengthened by overloading tricks, but a convention can be used for new API points which deliver index or size values.

      Here are suggested API points which would make a retrofit work:

      public abstract class MappedByteBuffer extends ByteBuffer {
        public final MappedByteBuffer position(long i);
        public final MappedByteBuffer limit(long i);
      }
      public abstract class ByteBuffer extends Buffer implements Comparable<ByteBuffer> {
        public static ByteBuffer allocateDirect(long i);
        public static ByteBuffer allocate(long i);
        public abstract byte get(long i);
        public abstract ByteBuffer put(long i, byte x);
        public ByteBuffer position(long i);
        public ByteBuffer limit(long i);
        public final int alignmentOffset(long i, int j);
        public final ByteBuffer alignedSlice(long i);
        public abstract char getChar(long i);
        public abstract ByteBuffer putChar(long i, char x);
        public abstract short getShort(long i);
        public abstract ByteBuffer putShort(long i, short x);
        public abstract int getInt(long i);
        public abstract ByteBuffer putInt(long i, int j);
        public abstract long getLong(long i);
        public abstract ByteBuffer putLong(long i, long x);
        public abstract float getFloat(long i);
        public abstract ByteBuffer putFloat(long i, float x);
        public abstract double getDouble(long i);
        public abstract ByteBuffer putDouble(long i, double x);
      }
      public abstract class Buffer {
        public final long capacityAsLong();
        public final long positionAsLong();
        public Buffer position(long i);
        public final long limitAsLong();
        public Buffer limit(long i);
        public final long remainingAsLong();
      }

      A buffer would have to record dynamically whether it was in `int` mode or `long` mode. This could be simply a function of the buffer's size, or an explicit creation parameter.

      If a buffer is in `long` mode, then calling one of the `int`-mode query functions (instead of their `AsLong` siblings) would have to throw an error, at least if the value were outside the dynamic range of an `int`.
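
      The mode rule above can be sketched as follows. This is a minimal illustration, not the JDK's Buffer class: the class name WideBufferSketch, the isLongMode() query, and the choice of ArithmeticException (via Math.toIntExact) as the "error" are all assumptions made for the example. The mode is derived from the capacity; an explicit creation parameter would work equally well.

```java
/** Hypothetical sketch of int/long mode bookkeeping; not part of java.nio. */
final class WideBufferSketch {
    private final long capacity;
    private long position;
    private long limit;

    WideBufferSketch(long capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("negative capacity: " + capacity);
        }
        this.capacity = capacity;
        this.limit = capacity;
    }

    /** Assumption: the mode is simply a function of the buffer's size. */
    boolean isLongMode() {
        return capacity > Integer.MAX_VALUE;
    }

    // Wide queries always succeed.
    long capacityAsLong()  { return capacity; }
    long positionAsLong()  { return position; }
    long limitAsLong()     { return limit; }
    long remainingAsLong() { return limit - position; }

    // Legacy int queries fail only when the value does not fit in an int.
    int capacity()  { return narrow(capacity); }
    int position()  { return narrow(position); }
    int limit()     { return narrow(limit); }
    int remaining() { return narrow(limit - position); }

    /** Overload taking a long where the current API takes an int. */
    WideBufferSketch position(long newPosition) {
        if (newPosition < 0 || newPosition > limit) {
            throw new IllegalArgumentException("bad position: " + newPosition);
        }
        position = newPosition;
        return this;
    }

    /** Overload taking a long where the current API takes an int. */
    WideBufferSketch limit(long newLimit) {
        if (newLimit < 0 || newLimit > capacity) {
            throw new IllegalArgumentException("bad limit: " + newLimit);
        }
        limit = newLimit;
        if (position > limit) {
            position = limit;
        }
        return this;
    }

    /** Assumed error policy: Math.toIntExact throws ArithmeticException on overflow. */
    private static int narrow(long value) {
        return Math.toIntExact(value);
    }
}
```

      With a 3 GB capacity and the position set to 1 GB, position() still returns an `int`, while capacity() and remaining() throw because their values no longer fit.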

      This retrofit is similar to the one that allowed Unix file systems to work with files greater than 2 GB in size. Many of the API points were unchanged, especially the streaming ones, but others had to be joined by new API points with wider index or size types.

      On-heap byte buffers could be given a similar treatment, by using a blocked-array data structure. The design of this is problematic: for some applications a single fixed block size is workable, though the right size varies from application to application, while for others a variable block size is necessary.
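
      For the fixed-block case, a blocked array might look like the sketch below. The class name BlockedByteStore and the power-of-two block size passed as a shift are assumptions chosen for the example; the issue does not prescribe either. The variable-block case is not covered here.

```java
/** Hypothetical on-heap store indexed by long, built from fixed-size byte[] blocks. */
final class BlockedByteStore {
    private final int blockShift;   // block size is 1 << blockShift (an assumption)
    private final int blockMask;    // low bits select the offset inside a block
    private final byte[][] blocks;
    private final long size;

    BlockedByteStore(long size, int blockShift) {
        if (size < 0 || blockShift < 1 || blockShift > 30) {
            throw new IllegalArgumentException("bad size or block shift");
        }
        this.size = size;
        this.blockShift = blockShift;
        this.blockMask = (1 << blockShift) - 1;

        long blockSize = 1L << blockShift;
        long count = (size >>> blockShift) + ((size & blockMask) != 0 ? 1 : 0);
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many blocks for this block size");
        }
        blocks = new byte[(int) count][];
        long remaining = size;
        for (int b = 0; b < blocks.length; b++) {
            // The last block may be shorter than the others.
            blocks[b] = new byte[(int) Math.min(blockSize, remaining)];
            remaining -= blocks[b].length;
        }
    }

    long size() {
        return size;
    }

    byte get(long index) {
        check(index);
        return blocks[(int) (index >>> blockShift)][(int) (index & blockMask)];
    }

    void put(long index, byte x) {
        check(index);
        blocks[(int) (index >>> blockShift)][(int) (index & blockMask)] = x;
    }

    private void check(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

      Index decoding is a shift and a mask, which is what makes the single-block-size case attractive; the difficulty noted above is that no one value of blockShift suits every application.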

      A better move would be to support scatter/gather directly, by allowing a third kind of byte buffer which logically clusters a group of subsidiary byte buffers. This kind of byte buffer (GatheredByteBuffer or ByteBufferGroup) would probably need an internal indexing structure to quickly map absolute indexes down to the selected member of the group. Important use cases are homogeneous power-of-two member sizes (where indexes are easy to decode) and the general case (which can use a binary-search array). This aspect is developed in more detail in JDK-8181704.
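
      The general case can be sketched with a sorted array of member start offsets and a binary search. The class name ByteBufferGroupSketch and its methods are hypothetical and are not taken from JDK-8181704; the sketch also assumes every member is non-empty, so that start offsets are strictly increasing.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical gathered view over several ordinary (int-indexed) ByteBuffers. */
final class ByteBufferGroupSketch {
    private final ByteBuffer[] members;
    private final long[] starts;   // starts[k] = absolute index of member k's first byte
    private final long total;

    ByteBufferGroupSketch(ByteBuffer... members) {
        this.members = new ByteBuffer[members.length];
        this.starts = new long[members.length];
        long offset = 0;
        for (int k = 0; k < members.length; k++) {
            if (members[k].capacity() == 0) {
                // Assumption: empty members are rejected to keep starts strictly increasing.
                throw new IllegalArgumentException("empty member at " + k);
            }
            ByteBuffer view = members[k].duplicate(); // shares content, own position/limit
            view.clear();                             // expose the full capacity
            this.members[k] = view;
            this.starts[k] = offset;
            offset += view.capacity();
        }
        this.total = offset;
    }

    long capacityAsLong() {
        return total;
    }

    byte get(long index) {
        int k = memberFor(index);
        return members[k].get((int) (index - starts[k]));
    }

    ByteBufferGroupSketch put(long index, byte x) {
        int k = memberFor(index);
        members[k].put((int) (index - starts[k]), x);
        return this;
    }

    /** Maps an absolute index to the member that contains it. */
    private int memberFor(long index) {
        if (index < 0 || index >= total) {
            throw new IndexOutOfBoundsException("index " + index + ", capacity " + total);
        }
        int k = Arrays.binarySearch(starts, index);
        // A miss returns -(insertionPoint) - 1; the containing member precedes that point.
        return k >= 0 ? k : -k - 2;
    }
}
```

      When all members share one power-of-two size, the binary search could be replaced by a shift and a mask on the absolute index.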


              bpb Brian Burkhalter
              jrose John Rose