[lworld] Use profile information from (bimorphic) call site to optimize previous array load


      When loading from a (potentially flat) array and calling a method on the loaded element, we could use the profile information from the (bimorphic) call site to "go back" and optimize the array load.
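
The code shape in question can be sketched as follows. This is only an illustration, not the benchmark from this issue: `Shape`, `Square`, `Rect` and `sumAreas` are hypothetical names, and ordinary identity classes stand in for value classes so that the sketch compiles on a stock JDK. On a Valhalla build the two implementations would be value classes and the arrays passed in could be flat.

```java
public class ArrayLoadCallSite {
    // Hypothetical stand-ins; under Valhalla these would be value classes.
    interface Shape {
        int area();
    }

    static final class Square implements Shape {
        final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    static final class Rect implements Shape {
        final int w;
        final int h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        public int area() { return w * h; }
    }

    static int sumAreas(Shape[] shapes) {
        int sum = 0;
        for (int i = 0; i < shapes.length; i++) {
            // Array load: the static type Shape[] does not reveal whether
            // the array is flat or which element type it holds.
            Shape s = shapes[i];
            // Call site: if only Square and Rect receivers are ever seen
            // here, its type profile is bimorphic.
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two homogeneous arrays reach the same load and the same call site.
        Shape[] squares = new Square[] { new Square(2), new Square(3) };
        Shape[] rects = new Rect[] { new Rect(2, 3), new Rect(1, 5) };
        System.out.println(sumAreas(squares)); // prints 13
        System.out.println(sumAreas(rects));   // prints 11
    }
}
```

In this shape, the receiver types recorded at `s.area()` are exactly the element types that the preceding load `shapes[i]` can produce, which is the information the proposal would reuse for the load.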

      Below is the original bug description. The bug was split into multiple issues (see comments for details).

      ------------------------------------------

      Method invocations and acmp are slow when they operate on elements of a flattened array.

      Invoking a method on a value class instance loaded from a flattened array element is more than 10x slower than invoking it on a heap-allocated value class instance.
      Compare:
      Benchmark                     Mode  Cnt   Score    Error  Units
      invoke.array.Value.target3_i  avgt    5   1.600 ±  0.020  ns/op  (array of references to value classes)
      invoke.array.Value.target3_v  avgt    5  17.930 ±  0.310  ns/op  (flattened array)

      Comparison of array elements (acmp) is ~2x slower when the arrays are flat.

      In both cases, every value loaded from the flattened array is allocated on the heap:

      invoke.array.Value.target3_i:gc.alloc.rate.norm avgt 5 ≈ 10⁻⁵ B/op
      invoke.array.Value.target3_v:gc.alloc.rate.norm avgt 5 24.000 ± 0.001 B/op

      acmp.array.Value032.branch_obj_equals000:gc.alloc.rate.norm avgt 5 ≈ 10⁻⁴ B/op
      acmp.array.Value032.branch_val_equals000:gc.alloc.rate.norm avgt 5 47.520 ± 0.001 B/op

      Note: for "acmp", both values (left and right) are allocated.

      The largest contributor to the slowdown is the method
      "OptoRuntime::load_unknown_inline_C", which takes up to 90% of CPU time.
        

            Assignee:
            Unassigned
            Reporter:
            Sergey Kuksenko