-
Enhancement
-
Resolution: Won't Fix
-
P4
-
8, 11, 17, 18, 19
During JDK-8072070 review, Xin noticed that we might use loads instead of stores during stack banging. This should alleviate some of the costs associated with large stack banging in interpreter mode, and maybe drop the overhead for compiled calls too.
We can probably do this:
- movl(Address(rsp, (-offset)), rax);
+ testptr(rax, Address(rsp, (-offset)));
This still kills flags, but does not need a scratch register. Safepoint polls are doing the same.
The downside of banging with loads is that OSes with delayed commit would probably satisfy the read from the zero page, thus not extending real RSS until the stack is actually used. In the worst case, that means we keep the shadow zone unwired to physical pages until the first actual use.
Draft PR:
https://github.com/openjdk/jdk/pull/7361
Remarkably, AArch64 went the other way around with JDK-8075045: it moved from loads back to stores for performance reasons. (Might be that "loading" into zr went bad?)
On Zen 2, this seems to improve -Xint SPECjvm2008:compress by about 50%. Wary of the AArch64 example, this deserves testing on a wider variety of x86 implementations. Point performance runs show that these improvements are completely subsumed in -Xint mode by JDK-8072070.
There might still be a slight benefit for stack banging at compiled method entry, however. SPECjvm2008 runs with different stack banging schemes show little benefit in changing to "movb(Address(rsp, (-offset)), 0);", possibly because we untie the dependency on the register.
Relates to: JDK-8072070 Improve interpreter stack banging (Resolved)