-
Enhancement
-
Resolution: Unresolved
-
P4
-
9, 10
If you run a simple benchmark like this:
@Benchmark
public int stream_unsafe_char() {
    int s = 0;
    for (int c = 0; c < size; c++) {
        s += U.getChar(charArr, CHAR_ARR_OFFSET + CHAR_ARR_SCALE * c);
    }
    return s;
}
...then you will notice that the generated code has a stray "movslq":
0x00007ffad8741180: movslq %r8d,%r9 <--- unnecessary sign extension
0x00007ffad8741183: movzwl 0x10(%r10,%r9,2),%ecx
0x00007ffad8741189: add %ecx,%edx
0x00007ffad874118b: inc %r8d
0x00007ffad874118e: cmp %r11d,%r8d
0x00007ffad8741191: jl 0x00007ffad8741180
Indeed, the baseline test that does the plain Java access:
@Benchmark
public int plain() {
    int s = 0;
    for (int c = 0; c < size; c++) {
        s += charArr[c];
    }
    return s;
}
...does without sign extension:
0x00007fc3d0f84ea0: movzwl 0x10(%r11,%r8,2),%ecx
0x00007fc3d0f84ea6: add %ecx,%edx
0x00007fc3d0f84ea8: inc %r8d
0x00007fc3d0f84eab: cmp %r10d,%r8d
0x00007fc3d0f84eae: jl 0x00007fc3d0f84ea0
This conversion costs about 19 ns/op (roughly 5%) at size 1000 on my 1x4x2 4 GHz Haswell, running Linux x86_64, 8u40 EA:
Benchmark (size) Mode Cnt Score Error Units
UnsafeMovslq.plain 1000 avgt 5 343.719 ± 0.694 ns/op
UnsafeMovslq.unsafe 1000 avgt 5 362.350 ± 5.302 ns/op
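The benchmark snippets above omit the field setup. The following is a minimal standalone sketch of what that setup might look like, not the actual benchmark source: the class name UnsafeCharSum, the reflective lookup of theUnsafe, the field types, and the helper methods sumUnsafe and sumPlain are all assumptions made for illustration. It also marks where the int-to-long conversion enters at the Java level: CHAR_ARR_SCALE * c is an int product that has to be widened to long before it is added to the long base offset, which is presumably what surfaces as the movslq in the loop.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeCharSum {
    // Assumed setup: the issue text does not show how these fields are initialized.
    static final Unsafe U;
    static final long CHAR_ARR_OFFSET;
    static final int CHAR_ARR_SCALE;

    static {
        try {
            // Common reflective way to obtain the Unsafe instance outside the JDK.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        CHAR_ARR_OFFSET = U.arrayBaseOffset(char[].class);
        CHAR_ARR_SCALE = U.arrayIndexScale(char[].class);
    }

    // Same loop shape as stream_unsafe_char() in the issue.
    static int sumUnsafe(char[] charArr, int size) {
        int s = 0;
        for (int c = 0; c < size; c++) {
            // CHAR_ARR_SCALE * c is computed as an int, then widened to long
            // for the addition with CHAR_ARR_OFFSET.
            s += U.getChar(charArr, CHAR_ARR_OFFSET + CHAR_ARR_SCALE * c);
        }
        return s;
    }

    // Same loop shape as plain() in the issue.
    static int sumPlain(char[] charArr, int size) {
        int s = 0;
        for (int c = 0; c < size; c++) {
            s += charArr[c];
        }
        return s;
    }

    public static void main(String[] args) {
        char[] arr = new char[1000];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = (char) i;
        }
        // Both variants read the same elements, so the sums must agree.
        System.out.println(sumUnsafe(arr, arr.length) == sumPlain(arr, arr.length));
    }
}
```

This only checks that the two access paths compute the same result; reproducing the timing difference still needs a JMH harness and a disassembler to inspect the generated loop.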
- relates to
-
JDK-8136924 Vectorized support for array equals/compare/mismatch using Unsafe
- Resolved
-
JDK-8075136 Unnecessary sign extension for byte array access
- Resolved
-
JDK-8136820 Generate better code for some Unsafe addressing patterns
- Resolved
-
JDK-8145322 Code generated from unsafe loops can be slightly improved
- Resolved
-
JDK-8136757 C1 and C2 intrinsics for StringUTF16.(get|set)Char
- Resolved