-
Enhancement
-
Resolution: Won't Fix
-
P4
-
None
-
5.0, 9, 10
-
x86
-
generic
On the x86 and amd64 platforms, but especially x86, register pressure is
much higher than on RISC platforms. The result is that many values spend
a good part of their lifetimes on the stack, having been spilled there
because they could not be allocated to a register for those parts of
their lifetimes.
The register allocator runs after the matcher, which optimistically
assumes that most values live in registers most of the time. This is rarely
the case on x86 in a method of any size, so c2 currently implements a
hacky mechanism called cisc-spilling to avoid allocating point-lifetime
registers (registers held only long enough to reload a value for one
instruction) every time stack values are accessed. That mechanism only
allows the stack operand to be the right operand: if it's the left
operand, we're out of luck. As a result we generate suboptimal
code for methods with high register pressure, e.g., longStaticBTree::getObject
in specjbb, which accounts for 20% of specjbb's run time.
For patterns that have two read operands, or whose target is a spillcopy,
it'd be nice to have two cisc-spill patterns. In the latter case,
the target register holding the result of the spillcopy would
have to be dead after the memory write.
The former case is important for compares. Currently, the cisc-spill
version of
instruct compI_eReg(eFlagsReg cr, eRegI op1, eRegI op2) %{
  match(Set cr (CmpI op1 op2));
  format %{ "CMP $op1,$op2" %}
  opcode(0x3B); /* Opcode 3B /r */
  ins_encode( OpcP, RegReg( op1, op2) );
  ins_pipe( ialu_cr_reg_reg );
%}
is
instruct compI_eReg_mem(eFlagsReg cr, eRegI op1, memory op2) %{
  match(Set cr (CmpI op1 (LoadI op2)));
  format %{ "CMP $op1,$op2" %}
  opcode(0x3B); /* Opcode 3B /r */
  ins_encode( OpcP, RegMem( op1, op2) );
  ins_pipe( ialu_cr_reg_mem );
%}
This covers only half the spill cases. This pattern
instruct compI_mem_eReg(eFlagsReg cr, memory op1, eRegI op2) %{
  match(Set cr (CmpI (LoadI op1) op2));
  format %{ "CMP $op1,$op2" %}
  opcode(0x39); /* Opcode 39 /r */
  ins_encode( OpcP, RegMem( op2, op1) );
  ins_pipe( ialu_cr_mem_reg );
%}
covers the other half. adlc and the register allocator could be changed
to handle more than one possible cisc-spill pattern. The sticking point
appears to be that the register mask assigned to the cisc-spill operand for
compI_eReg_mem (op2) in gather_lrg_masks includes stack locations. A naive
implementation would also allow stack locations for op1 of compI_mem_eReg.
That is incorrect: if both operands ended up on the stack, I think we
would need a mem_mem pattern, and x86 CMP has no form with two memory
operands. To make it work, the register allocator would
have to allocate point-lifetime registers for potential mem-mem patterns to
turn them into matchable reg-mem or mem-reg patterns. More generally,
we might create 'conditional virtual registers', i.e., ones that
get allocated for point lifetimes if some other virtual register
gets spilled.
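To make the difference concrete, here is a minimal C++ sketch of the selection logic, comparing a single cisc-spill alternative with two. It is a simplified model and not HotSpot code: the names Loc, CmpForm, Selection, pattern_name, select_single_cisc and select_dual_cisc are all hypothetical, and the choice to reload op1 when both operands are on the stack is an arbitrary assumption.

```cpp
#include <cassert>

// Simplified, hypothetical model of the idea above; none of these names
// exist in HotSpot.

// Where the register allocator ended up placing an operand.
enum class Loc { Reg, Stack };

// The three compare forms quoted in this RFE.
enum class CmpForm { RegReg, RegMem, MemReg };

struct Selection {
    CmpForm form;   // pattern to emit
    int reload_op;  // operand (1 or 2) needing a point-lifetime register, 0 if none
};

// Name of the instruct pattern from the text that corresponds to a form.
inline const char* pattern_name(CmpForm f) {
    switch (f) {
        case CmpForm::RegReg: return "compI_eReg";
        case CmpForm::RegMem: return "compI_eReg_mem";
        case CmpForm::MemReg: return "compI_mem_eReg";
    }
    assert(false && "unknown CmpForm");
    return "";
}

// One cisc-spill alternative: only op2 may be read from the stack.
inline Selection select_single_cisc(Loc op1, Loc op2) {
    // op1 must be in a register, so a spilled op1 is reloaded first.
    const int reload = (op1 == Loc::Stack) ? 1 : 0;
    const CmpForm form = (op2 == Loc::Stack) ? CmpForm::RegMem : CmpForm::RegReg;
    return {form, reload};
}

// Two cisc-spill alternatives: either operand may be read from the stack,
// but not both, because there is no mem-mem compare.
inline Selection select_dual_cisc(Loc op1, Loc op2) {
    if (op1 == Loc::Stack && op2 == Loc::Stack) {
        // Assumption: reload op1 and leave op2 in memory.
        return {CmpForm::RegMem, 1};
    }
    if (op1 == Loc::Stack) return {CmpForm::MemReg, 0};
    if (op2 == Loc::Stack) return {CmpForm::RegMem, 0};
    return {CmpForm::RegReg, 0};
}
```

In this model, select_single_cisc(Loc::Stack, Loc::Reg) needs a reload of op1, while select_dual_cisc(Loc::Stack, Loc::Reg) picks the mem-reg form with no reload. Only the case with both operands on the stack still needs a point-lifetime register.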
We might also experiment with matching again after register allocation,
taking into account spills, rather than rewriting incrementally during
register allocation.
Mike Paleczny says that the original plan for c2 was to have multiple
cisc-spill patterns, but that it was never implemented. See him for
details, or perhaps he'll append a comment to this RFE.