
Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 17
Affects Version/s: 17
Component/s: core-libs
Subcomponent: java.nio
Resolved In Build: b21
CPU: x86_64
OS: windows_10

ADDITIONAL SYSTEM INFORMATION :
Java 17-ea+13 (and Java 15+36), Windows 10 x64

A DESCRIPTION OF THE PROBLEM :
The task is to serialize an array of floats into a byte array as fast as possible, this time in big-endian byte order (BIG_ENDIAN); see https://www.reddit.com/r/java/comments/m4b9f6/ for the full context. The fastest available options use a ByteBuffer and/or a VarHandle. However, one specific shape is unexpectedly slow:

```java
// Fast! Good!
@Benchmark
public byte[] byteBufferBigEndian() {
    ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize);
    byteBuffer.asFloatBuffer().put(floats);
    return byteBuffer.array();
}

// Slow!
@Benchmark
public byte[] byteBufferBigEndianSwapMemoryCopy() {
    ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize);
    // The wrap() forces usage of Unsafe.copySwapMemory(), which is about twice as slow as the other variant:
    byteBuffer.asFloatBuffer().put(FloatBuffer.wrap(floats));
    return byteBuffer.array();
}
```

The problem is that although the more natural call, `.put(float[])`, is fast, the alternative `put(FloatBuffer.wrap(floats))` is much slower. The latter uses `Unsafe.copySwapMemory` under the hood, which on my system performs much worse than the former even though the source is the exact same array of floats.
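
To illustrate that both big-endian shapes start from the same array and produce identical bytes, here is a minimal stand-alone sketch (plain `java.nio` and `java.lang.invoke`, no JMH). It serializes one array three ways: the bulk `put(float[])`, the wrapped `put(FloatBuffer.wrap(...))`, and a per-element loop over a `VarHandle` obtained from `MethodHandles.byteArrayViewVarHandle`. The class and method names are hypothetical, and the `VarHandle` loop is only an assumption about what a `varHandleBigEndian`-style benchmark looks like, not the reporter's code. It checks the output only; it does not measure speed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

// Hypothetical helper class, not part of the reporter's benchmark.
public class BigEndianFloatSerialization {

    // Views a byte[] as big-endian floats; the int coordinate is a BYTE offset.
    private static final VarHandle FLOAT_BE =
            MethodHandles.byteArrayViewVarHandle(float[].class, ByteOrder.BIG_ENDIAN);

    // The fast shape from the report: bulk put of the float[] itself.
    public static byte[] viaArrayPut(float[] floats) {
        ByteBuffer buffer = ByteBuffer.allocate(floats.length * Float.BYTES); // BIG_ENDIAN by default
        buffer.asFloatBuffer().put(floats);
        return buffer.array();
    }

    // The slow shape from the report: bulk put of a wrapped FloatBuffer.
    public static byte[] viaWrappedPut(float[] floats) {
        ByteBuffer buffer = ByteBuffer.allocate(floats.length * Float.BYTES);
        buffer.asFloatBuffer().put(FloatBuffer.wrap(floats));
        return buffer.array();
    }

    // Assumed VarHandle variant: one big-endian store per element.
    public static byte[] viaVarHandle(float[] floats) {
        byte[] bytes = new byte[floats.length * Float.BYTES];
        for (int i = 0; i < floats.length; i++) {
            FLOAT_BE.set(bytes, i * Float.BYTES, floats[i]);
        }
        return bytes;
    }

    public static void main(String[] args) {
        float[] floats = {1.0f, -2.5f, Float.MIN_VALUE, 3.14159f};
        byte[] expected = viaArrayPut(floats);
        System.out.println("wrapped put matches: " + Arrays.equals(expected, viaWrappedPut(floats)));
        System.out.println("VarHandle matches:   " + Arrays.equals(expected, viaVarHandle(floats)));
    }
}
```

Since all three routes yield the same bytes, the gap in the report is purely a matter of which internal copy path each shape takes.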

```java
// Unusable for now: MemorySegment belongs to the incubating jdk.incubator.foreign API.
@Benchmark
public byte[] memorySegment() {
    try (MemorySegment segment = MemorySegment.ofArray(floats)) {
        return segment.toByteArray();
    }
}

// Slow!
@Benchmark
public byte[] byteBufferNativeOrder() {
    ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize).order(ByteOrder.nativeOrder());
    byteBuffer.asFloatBuffer().put(floats);
    return byteBuffer.array();
}

// Fast!
@Benchmark
public byte[] byteBufferNativeOrderMemoryCopy() {
    ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize).order(ByteOrder.nativeOrder());
    // The wrap() forces usage of Unsafe.copyMemory() which is twice as fast as the other variant:
    byteBuffer.asFloatBuffer().put(FloatBuffer.wrap(floats));
    return byteBuffer.array();
}
```

Here the problem is reversed: the more natural `.put(float[])` call is much slower than the less obvious alternative `put(FloatBuffer.wrap(floats))`, because the former lacks a bulk-copy path while the latter uses `Unsafe.copyMemory` under the hood.
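
The native-order case can be sketched the same way, assuming nothing beyond the public `java.nio` API: the workaround is simply to wrap the array before the bulk put, and a per-element `putFloat` loop on a native-order buffer serves as the reference result. Class and method names are hypothetical. On a little-endian host such as the reporter's x86_64 machine, `ByteOrder.nativeOrder()` is `LITTLE_ENDIAN`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

// Hypothetical helper class, not part of the reporter's benchmark.
public class NativeOrderFloatSerialization {

    // The order must be set BEFORE asFloatBuffer(): the view takes the order current at creation.
    private static ByteBuffer newNativeBuffer(int floatCount) {
        return ByteBuffer.allocate(floatCount * Float.BYTES).order(ByteOrder.nativeOrder());
    }

    // Natural form: the slower native-order variant in the report.
    public static byte[] viaArrayPut(float[] floats) {
        ByteBuffer buffer = newNativeBuffer(floats.length);
        buffer.asFloatBuffer().put(floats);
        return buffer.array();
    }

    // Workaround: wrap first; the faster native-order variant in the report.
    public static byte[] viaWrappedPut(float[] floats) {
        ByteBuffer buffer = newNativeBuffer(floats.length);
        buffer.asFloatBuffer().put(FloatBuffer.wrap(floats));
        return buffer.array();
    }

    // Reference result: one putFloat per element, honoring the buffer's byte order.
    public static byte[] viaPutFloatLoop(float[] floats) {
        ByteBuffer buffer = newNativeBuffer(floats.length);
        for (float f : floats) {
            buffer.putFloat(f);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        float[] floats = {1.0f, -2.5f, 0.1f};
        byte[] expected = viaPutFloatLoop(floats);
        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println("array put matches:   " + Arrays.equals(expected, viaArrayPut(floats)));
        System.out.println("wrapped put matches: " + Arrays.equals(expected, viaWrappedPut(floats)));
    }
}
```

Because both forms produce the same bytes, wrapping is a drop-in workaround in native order; in big-endian order the report shows the opposite ranking, so the better choice depends on the byte order.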

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run https://gitlab.com/janecekpetr/benchmarks/-/blob/master/src/main/java/com/gitlab/janecekpetr/benchmarks/FloatSerializationBenchmark.java by:
1. cloning the repo
2. `mvn verify`
3. `java -jar target/benchmarks.jar FloatSerializationBenchmark`

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect `byteBuffer.asFloatBuffer().put(FloatBuffer.wrap(floats))` to be just as fast as `byteBuffer.asFloatBuffer().put(floats)`.
ACTUAL -

| Benchmark | (size) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|
| byteBufferBigEndian | 2048 | thrpt | 5 | 800522,219 | ± 69499,093 | ops/s |
| byteBufferBigEndianSwapMemoryCopy | 2048 | thrpt | 5 | 371907,046 | ± 9270,812 | ops/s |
| byteBufferNativeOrder | 2048 | thrpt | 5 | 756516,722 | ± 33633,399 | ops/s |
| byteBufferNativeOrderMemoryCopy | 2048 | thrpt | 5 | 1208847,781 | ± 67935,938 | ops/s |
| dataOutputStream | 2048 | thrpt | 5 | 99949,822 | ± 17233,752 | ops/s |
| kryoLikeUnsafe | 2048 | thrpt | 5 | 1248879,311 | ± 26843,663 | ops/s |
| manualUnpacking | 2048 | thrpt | 5 | 181612,250 | ± 21232,457 | ops/s |
| objectOutputStream | 2048 | thrpt | 5 | 102348,095 | ± 4135,803 | ops/s |
| varHandleBigEndian | 2048 | thrpt | 5 | 726448,503 | ± 13138,903 | ops/s |
| varHandleNativeOrder | 2048 | thrpt | 5 | 698638,620 | ± 20742,939 | ops/s |
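
As a rough reading of the table, these are the two ratios that matter for this report, using scores rounded to whole ops/s and ignoring the error bars:

$$
\frac{\text{byteBufferBigEndianSwapMemoryCopy}}{\text{byteBufferBigEndian}} = \frac{371\,907}{800\,522} \approx 0.46
\qquad
\frac{\text{byteBufferNativeOrderMemoryCopy}}{\text{byteBufferNativeOrder}} = \frac{1\,208\,848}{756\,517} \approx 1.60
$$

So the wrapped put is roughly 2.15x slower in big-endian order and roughly 1.6x faster in native order. Both gaps are far larger than the reported errors (at most ± 69 499 ops/s), though the native-order gap is somewhat smaller than the "twice as fast" stated in the benchmark comment.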

---------- BEGIN SOURCE ----------
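import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@Fork(1)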
@Warmup(iterations = 3, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 6, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread)
public class FloatSerializationBenchmark {

    @Param({/*"8", "32", "128", "512",*/ "2048"})
    private int size;
    private float[] floats;
    private int byteSize;

    @Setup
    public void setup() {
        floats = new float[size];
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < floats.length; i++) {
            floats[i] = random.nextFloat();
        }

        byteSize = size * Float.BYTES;
    }

    @Benchmark
    public byte[] byteBufferBigEndian() {
        ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize).order(ByteOrder.BIG_ENDIAN);
        byteBuffer.asFloatBuffer().put(floats);
        return byteBuffer.array();
    }

    @Benchmark
    public byte[] byteBufferBigEndianSwapMemoryCopy() {
        ByteBuffer byteBuffer = ByteBuffer.allocate(byteSize).order(ByteOrder.BIG_ENDIAN);
        // The wrap() forces usage of Unsafe.copySwapMemory(), which is about twice as slow as the other variant:
        byteBuffer.asFloatBuffer().put(FloatBuffer.wrap(floats));
        return byteBuffer.array();
    }

}
---------- END SOURCE ----------

FREQUENCY : always


Attachment: Main.java (7 kB, 2021-04-06 03:44)

Links to: Commit openjdk/jdk/6bb71d9e, Review openjdk/jdk/3660
