The performance of load and store operations on a heap `MemorySegment` varies substantially with the combination of vector type and the type of the array backing the segment.
Generally, load and store operations on vector type X perform well if the `MemorySegment` is backed by an array of X or by a byte array.
Here are some benchmark results from the micro-benchmark class `TestLoadSegmentVarious`.
```
Benchmark                                                   (size)  Mode  Cnt     Score     Error  Units
TestLoadSegmentVarious.byteVectorFromByteBackedSegment        1024  avgt   10   280.008 ±   7.251  ns/op
TestLoadSegmentVarious.byteVectorFromDoubleBackedSegment      1024  avgt   10  1304.008 ±  98.901  ns/op
TestLoadSegmentVarious.byteVectorFromIntBackedSegment         1024  avgt   10  1279.621 ± 100.008  ns/op
TestLoadSegmentVarious.doubleVectorFromByteBackedSegment      1024  avgt   10    37.281 ±   1.360  ns/op
TestLoadSegmentVarious.doubleVectorFromDoubleBackedSegment    1024  avgt   10    36.847 ±   0.130  ns/op
TestLoadSegmentVarious.doubleVectorFromIntBackedSegment       1024  avgt   10   194.195 ±  31.096  ns/op
TestLoadSegmentVarious.intVectorFromByteBackedSegment         1024  avgt   10    72.602 ±   1.768  ns/op
TestLoadSegmentVarious.intVectorFromDoubleBackedSegment       1024  avgt   10   166.851 ±   9.528  ns/op
TestLoadSegmentVarious.intVectorFromIntBackedSegment          1024  avgt   10    71.283 ±   0.507  ns/op
TestLoadSegmentVarious.scalarByteVectorFromByteSegment        1024  avgt   10  4790.084 ±  45.882  ns/op
TestLoadSegmentVarious.scalarByteVectorFromDoubleSegment      1024  avgt   10  4841.273 ± 291.962  ns/op
TestLoadSegmentVarious.scalarByteVectorFromIntSegment         1024  avgt   10  4794.028 ± 101.282  ns/op
TestLoadSegmentVarious.scalarDoubleVectorFromByteSegment      1024  avgt   10  1241.117 ±  11.603  ns/op
TestLoadSegmentVarious.scalarDoubleVectorFromDoubleSegment    1024  avgt   10  1245.752 ±  15.516  ns/op
TestLoadSegmentVarious.scalarDoubleVectorFromIntSegment       1024  avgt   10  1232.216 ±   8.365  ns/op
TestLoadSegmentVarious.scalarIntVectorFromByteSegment         1024  avgt   10  1239.146 ±  14.582  ns/op
TestLoadSegmentVarious.scalarIntVectorFromDoubleSegment       1024  avgt   10  1236.712 ±   8.063  ns/op
TestLoadSegmentVarious.scalarIntVectorFromIntSegment          1024  avgt   10  1228.656 ±   3.329  ns/op
```
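For orientation, the kind of access being measured can be sketched as follows. This is a minimal illustration and not the code of `TestLoadSegmentVarious`: the class name `HeapSegmentVectorLoad` and the method `sumAsInts` are hypothetical. It assumes JDK 22 or later (where `java.lang.foreign` is final and heap segments backed by arrays other than `byte[]` are accepted by the Vector API) and that the JVM is started with `--add-modules jdk.incubator.vector`.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical illustration class; not part of the JDK or its tests.
public class HeapSegmentVectorLoad {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Sums the segment's bytes interpreted as native-order ints,
    // loading one IntVector per loop iteration.
    public static long sumAsInts(MemorySegment segment) {
        long byteSize = segment.byteSize();
        long step = SPECIES.vectorByteSize();
        long sum = 0;
        long offset = 0;
        for (; offset + step <= byteSize; offset += step) {
            IntVector v = IntVector.fromMemorySegment(
                    SPECIES, segment, offset, ByteOrder.nativeOrder());
            sum += v.reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for any bytes that do not fill a whole vector.
        // The unaligned layout is used because a byte[]-backed heap
        // segment does not guarantee int alignment.
        for (; offset + Integer.BYTES <= byteSize; offset += Integer.BYTES) {
            sum += segment.get(ValueLayout.JAVA_INT_UNALIGNED, offset);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ints = new int[1024];
        Arrays.fill(ints, 1);

        // Three heap segments of equal byte size with different backing arrays.
        MemorySegment intBacked = MemorySegment.ofArray(ints);
        MemorySegment byteBacked = MemorySegment.ofArray(new byte[ints.length * Integer.BYTES]);
        MemorySegment doubleBacked = MemorySegment.ofArray(new double[ints.length / 2]);

        // Give all three segments identical contents.
        byteBacked.copyFrom(intBacked);
        doubleBacked.copyFrom(intBacked);

        System.out.println("int[]-backed:    " + sumAsInts(intBacked));
        System.out.println("byte[]-backed:   " + sumAsInts(byteBacked));
        System.out.println("double[]-backed: " + sumAsInts(doubleBacked));
    }
}
```

All three calls compute the same result. What differs, according to the benchmark figures above, is how fast the loads from each kind of backing array run.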
A log showing which methods are and are not intrinsified can be obtained by running the test `IntrinsicHeapTest` and observing the standard output. Excerpt:
```
** not supported: arity=0 op=load vlen=1 etype=long ismask=no
@ 28 jdk.internal.vm.vector.VectorSupport::load (38 bytes) failed to inline: failed to inline (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 28 jdk.internal.vm.vector.VectorSupport::load (38 bytes) (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
** not supported: arity=0 op=load vlen=4 etype=long ismask=no
@ 28 jdk.internal.vm.vector.VectorSupport::load (38 bytes) failed to inline: failed to inline (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
** not supported: arity=0 op=load vlen=8 etype=long ismask=no
@ 28 jdk.internal.vm.vector.VectorSupport::load (38 bytes) failed to inline: failed to inline (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 28 jdk.internal.vm.vector.VectorSupport::load (38 bytes) (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
```
- is blocked by: JDK-8318678 Vector access on heap MemorySegments only works for byte[] (Closed)
- relates to: JDK-8329555 Crash in intrinsifying heap-based MemorySegment Vector store/loads (Resolved)
- links to: Commit openjdk/jdk/2678e4cd
- links to: Review(master) openjdk/jdk/16888