JDK-8365926

RISCV: Performance regression in renaissance (chi-square)

    • riscv
    • linux

      When running e.g. chi-square, a large performance regression can be seen on some hardware (in this case P550).
      These renaissance benchmarks are highly compiler dependent, meaning results can vary by 30% from run to run due to differences in the code cache (both due to profiling and due to placement of code).

      One major factor is that, before JDK 24, rv64 used trampoline calls:
      ##############
        0x00007ff43025ee8c: jal ra,0x00007ff43025f16c // if the target was reachable we did a direct call here, otherwise we went via the trampoline
      ...
        0x00007ff43025f16c: auipc t1,0x0 ; {trampoline_stub}
        0x00007ff43025f170: ld t1,12(t1) # 0x00007ff43025f178
        0x00007ff43025f174: jalr zero,0(t1)
        0x00007ff43025f178: <8-byte address> // atomically patchable
      #################
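The "reachable" condition in the comment above can be sketched as follows. This is only an illustration based on the RISC-V `jal` encoding (a signed 21-bit, 2-byte-aligned offset, so roughly +/-1 MiB); the names `jal_reachable` and `choose_jal_destination` are hypothetical and are not HotSpot functions.

```python
# Sketch only: models the range of a RISC-V `jal`, not HotSpot's actual code.
# `jal` encodes a signed 21-bit offset whose lowest bit is always zero.
JAL_MIN_OFFSET = -(1 << 20)      # -1 MiB
JAL_MAX_OFFSET = (1 << 20) - 2   # +1 MiB - 2 bytes


def jal_reachable(pc: int, target: int) -> bool:
    """Return True if a `jal` located at `pc` can jump straight to `target`."""
    offset = target - pc
    return offset % 2 == 0 and JAL_MIN_OFFSET <= offset <= JAL_MAX_OFFSET


def choose_jal_destination(pc: int, target: int, trampoline: int) -> int:
    """Hypothetical helper: direct call if reachable, else go via the trampoline stub."""
    return target if jal_reachable(pc, target) else trampoline


# The call site from the listing reaches its trampoline stub (offset 0x2e0).
print(jal_reachable(0x00007ff43025ee8c, 0x00007ff43025f16c))  # True
```

In this model the trampoline stub itself has to sit within `jal` range of the call site, which is the case in the listing above.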

      Due to issues with loading intra-cache and an unneeded jump, this was changed in: "8332689: RISC-V: Use load instead of trampolines"

      ##################
        0x00007ff3b4342c30: auipc t1,0x0
        0x00007ff3b4342c34: ld t1,832(t1) # 0x00007ff3b4342f70
        0x00007ff3b4342c38: jalr ra,0(t1)
      ...
        0x00007ff3b4342f70: <8-byte address> // atomically patchable
      #################

      However, this implementation did not include direct calls, since in practice they are rare.
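As a cross-check of the two listings, the address of the patchable 8-byte slot follows from the `auipc`/`ld` pair: `auipc` adds its 20-bit immediate (shifted left by 12) to the pc, and `ld` adds a sign-extended 12-bit offset. The sketch below assumes only that standard RISC-V semantics; `sign_extend` and `auipc_ld_slot` are illustrative names, not part of the JDK.

```python
# Sketch only: recomputes the literal-slot address used by an auipc + ld pair.
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def auipc_ld_slot(pc: int, auipc_imm20: int, ld_imm12: int) -> int:
    """Address that `ld` reads after `auipc` executed at `pc`."""
    base = (pc + (sign_extend(auipc_imm20, 20) << 12)) & MASK64
    return (base + sign_extend(ld_imm12, 12)) & MASK64


# Load-based call site: auipc t1,0x0 ; ld t1,832(t1)
print(hex(auipc_ld_slot(0x00007ff3b4342c30, 0, 832)))  # 0x7ff3b4342f70
# Trampoline stub: auipc t1,0x0 ; ld t1,12(t1)
print(hex(auipc_ld_slot(0x00007ff43025f16c, 0, 12)))   # 0x7ff43025f178
```

Both results match the slot addresses annotated in the disassembly. With the load-based sequence, every call executes the `auipc`, `ld` and `jalr` instructions, even when the target would have been within `jal` range.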

            rehn Robbin Ehn