- Bug
- Resolution: Unresolved
- P3
- repo-valhalla
- generic
- generic
Slow method invocation and acmp when operating on elements of a flattened array.
Invoking a method on a value class instance loaded from a flattened array element is more than 10x slower than invoking it on a heap-allocated value class instance.
Compare:
Benchmark                     Mode  Cnt   Score   Error  Units
invoke.array.Value.target3_i  avgt    5   1.600 ± 0.020  ns/op  (array of references to value classes)
invoke.array.Value.target3_v  avgt    5  17.930 ± 0.310  ns/op  (flattened array)
Comparing array elements (acmp) is about 2x slower when the arrays are flat.
In both cases, every value loaded from the flattened array is allocated on the heap:
invoke.array.Value.target3_i:gc.alloc.rate.norm avgt 5 ≈ 10⁻⁵ B/op
invoke.array.Value.target3_v:gc.alloc.rate.norm avgt 5 24.000 ± 0.001 B/op
acmp.array.Value032.branch_obj_equals000:gc.alloc.rate.norm avgt 5 ≈ 10⁻⁴ B/op
acmp.array.Value032.branch_val_equals000:gc.alloc.rate.norm avgt 5 47.520 ± 0.001 B/op
Note: for "acmp", both values (left and right) are allocated.
The largest contributor to the slowdown is the runtime method "OptoRuntime::load_unknown_inline_C", which takes up to 90% of CPU time.
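As a quick check of the figures above, the following sketch derives the slowdown factor and the relative allocation volume from the reported scores. The class and method names are hypothetical and exist only for this illustration. The allocation ratio assumes that each buffered value in the acmp benchmark costs the same 24 B as in the invoke benchmark, which the report does not state.

```java
import java.util.Locale;

// Hypothetical helper: reproduces the ratios implied by the reported JMH scores.
final class FlatArrayNumbers {
    // Values copied verbatim from the benchmark output in this report.
    static final double REF_NS = 1.600;          // invoke.array.Value.target3_i (ns/op)
    static final double FLAT_NS = 17.930;        // invoke.array.Value.target3_v (ns/op)
    static final double INVOKE_ALLOC_B = 24.000; // target3_v gc.alloc.rate.norm (B/op)
    static final double ACMP_ALLOC_B = 47.520;   // branch_val_equals000 gc.alloc.rate.norm (B/op)

    // How many times slower the flattened-array invocation is.
    static double slowdown() {
        return FLAT_NS / REF_NS;
    }

    // acmp allocation expressed in units of one 24 B buffered value (assumed size).
    static double acmpAllocationsPerOp() {
        return ACMP_ALLOC_B / INVOKE_ALLOC_B;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "invoke slowdown: %.1fx%n", slowdown());
        System.out.printf(Locale.ROOT, "acmp allocation in 24 B units: %.2f%n",
                acmpAllocationsPerOp());
    }
}
```

Under that assumption, the invocation slowdown is about 11.2x, and the acmp allocation rate is close to two 24 B values per operation. That is consistent with both the left and right operands being allocated.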