JDK-8281309

x86: Select better stack banging instruction


      During the JDK-8072070 review, Xin noticed that we might use loads instead of stores for stack banging. This should alleviate some of the cost of banging large stacks in interpreter mode, and may reduce the overhead for compiled calls too.

      We can probably do this:
      - movl(Address(rsp, (-offset)), rax);
      + testptr(rax, Address(rsp, (-offset)));

      This still kills flags, but unlike a plain load it does not need a scratch register as a destination. Safepoint polls already do the same.

      The downside of banging with loads is that OSes with delayed commit would probably satisfy the read from the zero page, and so would not extend the real RSS until the stack is actually used. In the worst case, this means the shadow zone stays unbacked by physical pages until its first actual use.
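
The delayed-commit effect can be observed outside the JVM with a small experiment. This is a sketch under assumptions: it is Linux-specific (it reads `/proc/self/statm`), it uses a fresh anonymous mapping as a stand-in for an untouched shadow zone, and all function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

// Resident set size in pages, taken from the second field of
// /proc/self/statm. Returns -1 if the file cannot be read.
long resident_pages() {
    long size = 0, resident = 0;
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (f == nullptr) return -1;
    int n = std::fscanf(f, "%ld %ld", &size, &resident);
    std::fclose(f);
    return n == 2 ? resident : -1;
}

// Map `pages` fresh anonymous pages; returns nullptr on failure.
std::uint8_t* map_zone(std::size_t pages, std::size_t page_size) {
    void* p = mmap(nullptr, pages * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<std::uint8_t*>(p);
}

// Read one byte per page and return the sum of the bytes read.
unsigned long touch_with_loads(const std::uint8_t* zone, std::size_t pages,
                               std::size_t page_size) {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < pages; ++i) {
        sum += *static_cast<const volatile std::uint8_t*>(zone + i * page_size);
    }
    return sum;
}

// Write one byte per page, which forces the OS to commit each page.
void touch_with_stores(std::uint8_t* zone, std::size_t pages,
                       std::size_t page_size) {
    for (std::size_t i = 0; i < pages; ++i) {
        *static_cast<volatile std::uint8_t*>(zone + i * page_size) = 1;
    }
}

// Print the current resident page count with a label.
void report(const char* label) {
    std::printf("%s: %ld resident pages\n", label, resident_pages());
}
```

Calling `report` after `map_zone`, again after `touch_with_loads`, and again after `touch_with_stores` should typically show the resident count growing by roughly the zone size only after the stores. The exact numbers depend on the kernel and on settings such as transparent huge pages, so treat them as indicative only.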

      Draft PR:
        https://github.com/openjdk/jdk/pull/7361

      Remarkably, AArch64 went the other way around with JDK-8075045 -- moved from loads back to stores for performance reasons. (Might be that "loading" into zr went bad?)

      On Zen 2, this seems to improve -Xint SPECjvm2008:compress by about 50%. Given the AArch64 example, this deserves testing on a wider variety of x86 implementations. Point performance runs show that in -Xint mode these improvements are completely subsumed by JDK-8072070.

      There might still be a slight benefit for stack banging at compiled method entry, however. SPECjvm2008 runs with different stack banging schemes show a small benefit from changing to "movb(Address(rsp, (-offset)), 0);", possibly because storing an immediate removes the dependency on the register.

            shade Aleksey Shipilev