-
Enhancement
-
Resolution: Unresolved
-
P4
-
21
-
I am testing on Linux on Ampere Altra, but other aarch64 platforms might also benefit from the change.
-
aarch64
-
generic
In src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp, MacroAssembler::lookup_interface_method currently loops over the itable list with
bind(search);
// Check that the previous entry is non-null. A null entry means that
// the receiver class doesn't implement the interface, and wasn't the
// same as when the caller was compiled.
cbz(method_result, L_no_such_interface);
if (itableOffsetEntry::interface_offset_in_bytes() != 0) {
add(scan_temp, scan_temp, scan_step);
ldr(method_result, Address(scan_temp, itableOffsetEntry::interface_offset_in_bytes()));
} else {
ldr(method_result, Address(pre(scan_temp, scan_step)));
}
cmp(intf_klass, method_result);
br(Assembler::NE, search);
using two branches. That could be replaced with
bind(search);
if (itableOffsetEntry::interface_offset_in_bytes() != 0) {
add(scan_temp, scan_temp, scan_step);
ldr(method_result, Address(scan_temp, itableOffsetEntry::interface_offset_in_bytes()));
} else {
ldr(method_result, Address(pre(scan_temp, scan_step)));
}
cmp(intf_klass, method_result);
// Bits are: N Z C V. 0b0100 sets Z, so the loop exits when the klasses matched.
ccmp(method_result, 0, 0b0100, Assembler::NE);
br(Assembler::NE, search);
with only one branch, at the cost of having to test, after the loop, why it exited.
That change trades a branch for an integer operation in each iteration of the loop. On platforms with spare integer execution units but limited branch execution units (and branch predictors, and instruction prefetchers), the trade-off is a win.
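To make the flag logic easier to check, here is a minimal C++ model of the two loop shapes. It is only a sketch, not HotSpot code: the itable is modelled as a vector of non-zero klass ids ending in a 0 terminator, only the Z flag is tracked, and the helper names (cmp_z, ccmp_z, scan_two_branches, scan_one_branch) are hypothetical. It also assumes that the first entry is loaded and checked before the loop is entered, and that intf_klass is never 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Scan { Found, NoSuchInterface };

// Z flag left by "cmp a, b".
static bool cmp_z(std::uint64_t a, std::uint64_t b) { return a == b; }

// Z flag left by "ccmp a, imm, nzcv, cond": if cond held on the incoming
// flags, compare a with imm; otherwise load the flags from nzcv, where
// bit 2 of the 4-bit N Z C V immediate is Z.
static bool ccmp_z(bool cond_held, std::uint64_t a, std::uint64_t imm,
                   unsigned nzcv) {
  return cond_held ? (a == imm) : ((nzcv & 0b0100u) != 0);
}

// Model of the current loop: a null check (cbz) plus a compare-and-branch
// in every iteration.
static Scan scan_two_branches(const std::vector<std::uint64_t>& itable,
                              std::uint64_t intf_klass) {
  std::size_t i = 0;
  std::uint64_t method_result = itable[i];  // assumed pre-loop load
  if (method_result == intf_klass) return Scan::Found;
  for (;;) {
    if (method_result == 0) return Scan::NoSuchInterface;  // cbz
    method_result = itable[++i];                           // ldr
    if (cmp_z(intf_klass, method_result)) return Scan::Found;
  }
}

// Model of the proposed loop: one branch per iteration, and one test after
// the loop to find out why it exited.
static Scan scan_one_branch(const std::vector<std::uint64_t>& itable,
                            std::uint64_t intf_klass) {
  std::size_t i = 0;
  std::uint64_t method_result = itable[i];  // assumed pre-loop load
  if (method_result == intf_klass) return Scan::Found;
  if (method_result == 0) return Scan::NoSuchInterface;  // assumed pre-check
  bool z;
  do {
    method_result = itable[++i];                 // ldr
    z = cmp_z(intf_klass, method_result);        // cmp
    z = ccmp_z(!z, method_result, 0, 0b0100u);   // ccmp ..., NE
  } while (!z);                                  // br NE, search
  // Z is set both for a match and for the null terminator, so test which.
  return method_result == 0 ? Scan::NoSuchInterface : Scan::Found;
}
```

In the single-branch model the loop no longer tests the entry it starts from, so a null first entry has to be rejected before the loop; the real stub would need an equivalent check wherever the first load happens.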
- relates to
-
JDK-8331341 secondary_super_cache does not scale well: C1 and interpreter
- Resolved
-
JDK-8307352 AARCH64: Improve itable_stub
- Resolved