Details
- Enhancement
- Resolution: Not an Issue
- P3
- 15
Description
Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:
static final int ELEM_SIZE = 1_000_000;
static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;
static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            MemoryAddress address = segment.baseAddress();
            if (i % 2 == 0) {
                VH_int.set(address, (long)i, i + 1);
            } else {
                VH_int.set(address, (long)i, i - 1);
            }
        }
    }
}
This performs well, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark so that segment.baseAddress() is called separately in each branch:
@Benchmark
public void segment_loop() {
    try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
        for (int i = 0; i < ELEM_SIZE; i++) {
            if (i % 2 == 0) {
                VH_int.set(segment.baseAddress(), (long)i, i + 1);
            } else {
                VH_int.set(segment.baseAddress(), (long)i, i - 1);
            }
        }
    }
}
With this version the loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which makes the benchmark much slower. I suspect a failure in escape analysis or scalarization.
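For comparison, here is a minimal sketch of the loop with the address computed once, before the loop, so that no per-iteration address object is left for the JIT to eliminate. It is not the Panama API: `Segment`, `Address`, and `setInt` are hypothetical stand-ins built on `java.nio.ByteBuffer` so the example runs on a stock JDK, and it assumes `baseAddress()` returns an equivalent value on every call. It only illustrates the shape of the code and makes no claim about how C2 compiles the real benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class HoistedAddressSketch {

    // Hypothetical stand-in for MemoryAddress: an immutable (buffer, offset) pair.
    static final class Address {
        final ByteBuffer buffer;
        final int offset;

        Address(ByteBuffer buffer, int offset) {
            this.buffer = buffer;
            this.offset = offset;
        }
    }

    // Hypothetical stand-in for MemorySegment. As assumed for the real API,
    // baseAddress() creates a new small object on every call.
    static final class Segment {
        final ByteBuffer buffer;

        Segment(int bytes) {
            this.buffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
        }

        Address baseAddress() {
            return new Address(buffer, 0);
        }
    }

    // Hypothetical stand-in for VH_int.set(address, index, value).
    static void setInt(Address address, long index, int value) {
        int byteOffset = Math.toIntExact(address.offset + index * Integer.BYTES);
        address.buffer.putInt(byteOffset, value);
    }

    // Same access pattern as the benchmark, with baseAddress() hoisted by hand.
    static Segment fill(int elemCount) {
        Segment segment = new Segment(elemCount * Integer.BYTES);
        Address address = segment.baseAddress(); // computed once, outside the loop
        for (int i = 0; i < elemCount; i++) {
            if (i % 2 == 0) {
                setInt(address, i, i + 1);
            } else {
                setInt(address, i, i - 1);
            }
        }
        return segment;
    }

    public static void main(String[] args) {
        Segment segment = fill(1_000_000);
        System.out.println(segment.buffer.getInt(0) + " " + segment.buffer.getInt(Integer.BYTES));
    }
}
```

Manual hoisting like this sidesteps the question of whether the compiler can scalarize the per-iteration address; whether it recovers the unrolled loop in the real benchmark would need to be confirmed with a profiler.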
Issue Links
- relates to: JDK-8237082 Workaround C2 limitations when working with long loops (Closed)