RA extension to demote EEVEX-encoded NDD instructions to REX/REX2-encoded instructions:
Currently, C2 register allocation support for two-address instructions is split across the following five stages:
1. Selection pattern
match addI_reg (rRegI dst, rRegI src) {
Set dst (AddI dst, src)
}
- Two-address instructions are those where the dst operand is the same as the first source operand.
- The Matcher generates a MachNode with 3 MachOperands in the above case.
2. RA-IFG construction.
a. Inserts an interference edge between the two_add operand and all other inputs.
b. Inserts an interference edge between the instruction and the defining node of the two_add operand if the two_add operand is live beyond the instruction, i.e. it was part of the live-out set when interferences were created for the add machine node.
3. RA-PhaseAggressive coalescing
a. Injects a two_add MachSpillCopyNode before the add to break the live range of the two_add operand.
4. RA-Split
a. Unifies the live range of the def operand of the MachSpillCopyNode with the def operand of the Add MachNode.
5. RA-PhaseCoalescing
a. Coalesces the live range of the src operand of the MachSpillCopyNode with the def of the MachNode if the two live ranges do not intersect.
b. This happens if the src operand is not used beyond the Add IR.
Thus, the compiler explicitly unifies the live ranges of the two_add operand and its def if they do not interfere, and does not rely on colour biasing alone.
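The coalescing decision in stage 5 can be illustrated with a toy model. This is a minimal sketch and not HotSpot code: the names ToyIFG, add_edge and try_coalesce are hypothetical, live ranges are plain integers, and the interference graph is a flat edge list. It only shows that the copy's source is merged with the two-address def when no interference edge separates them.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Toy model (hypothetical, not the HotSpot PhaseIFG): live ranges are
// integers, merged live ranges are tracked with a union-find.
struct ToyIFG {
    std::vector<int> parent;                 // union-find over live ranges
    std::vector<std::pair<int, int>> edges;  // recorded interference edges

    explicit ToyIFG(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    // Representative of the (possibly merged) live range containing x.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    void add_edge(int a, int b) { edges.emplace_back(a, b); }

    // True if any recorded edge connects the two merged live ranges.
    bool interferes(int a, int b) {
        a = find(a);
        b = find(b);
        for (const auto& e : edges) {
            int x = find(e.first);
            int y = find(e.second);
            if ((x == a && y == b) || (x == b && y == a)) return true;
        }
        return false;
    }

    // Merge the copy's source live range into the two-address def only
    // when the two do not interfere; otherwise the copy has to stay.
    bool try_coalesce(int copy_src, int def) {
        int a = find(copy_src);
        int b = find(def);
        if (a == b) return true;             // already one live range
        if (interferes(a, b)) return false;  // src is live past the add
        parent[a] = b;
        return true;
    }
};
```

In this model, live range 0 stands for the src of the spill copy, 1 for the unified def of the copy and the add, and 2 for the other input. If 0 is not live beyond the add, no edge (0, 1) exists and try_coalesce(0, 1) succeeds; if it is live out, the edge blocks the merge and the copy remains.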
Q. How is an NDD instruction different from a TwoAddr instruction?
A. While the number of MachOperands associated with the MachNode is the same for both variants, for a TwoAddr instruction RA explicitly unifies the live range of the TwoAddr operand with the def operand. This ensures that the same register is allocated to both, thereby removing the redundant spill copy before the two-address instruction.
Q. What is the need for EVEX to REX / REX2 demotion?
A. An NDD instruction uses a bulky 4-byte prefix. Thus, even though we save a GPR2GPR spill copy, which is generally absorbed during the register renaming stage and is never issued to an OOO execution port, we increase the overall code size, which, as a side effect, may impact inlining decisions.
Thus, EVEX to REX / REX2 demotion is a technique to counter the unnecessary code size increase when one of the two source operands is not live beyond the instruction. There are multiple approaches to address this:
1. Once RA determines that the live ranges of a source and the definition operand of an NDD MachNode do not intersect, it can bias the colour towards the register mask of the legacy register class (register masks are statically associated with the operands of a matcher pattern). This ensures emission of the REX encoding, rather than giving the allocator the freedom to pick any colour from the EGPR register class, which would need an additional byte for the REX2 encoding.
2. Add an explicit pass in allocation before PhaseCoalescing to unify the live ranges of the non-intersecting source and definition operands of an NDD MachNode. RA is then free to assign any colour to this live range, and the assembler layer is changed to emit the REX/REX2 flavour of such instructions during code emission.
3. Design an RA-only solution that replaces an NDD MachNode whose source and definition live ranges do not intersect with a compatible two-address MachNode. This is similar to an existing RA optimization that replaces a MEM2REG spill + operation with a compatible CISC instruction, i.e. uses the memory-operand flavour of an instruction.
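The code-size trade-off behind these approaches can be sketched as a prefix choice made at emission time. This is a minimal sketch under stated assumptions, not the HotSpot assembler: the names Prefix, choose_prefix and encoded_bytes are hypothetical, registers are numbered 0-31 with 16-31 being EGPRs, the operation is a 32-bit register-register form with a one-byte opcode plus ModRM, and the operand swap for commutative operations is an assumption added for illustration.

```cpp
#include <cassert>

// Hypothetical prefix kinds for a reg-reg integer instruction.
enum class Prefix { None, REX, REX2, EVEX };

// Prefix length in bytes: REX is 1, REX2 is 2, the NDD (EVEX) prefix is 4.
inline int prefix_bytes(Prefix p) {
    switch (p) {
        case Prefix::None: return 0;
        case Prefix::REX:  return 1;
        case Prefix::REX2: return 2;
        case Prefix::EVEX: return 4;
    }
    return 0;
}

// Assumed numbering: 0-7 need no prefix (32-bit operation),
// 8-15 need REX, 16-31 (EGPRs) need REX2.
inline Prefix legacy_prefix(int r1, int r2) {
    int hi = r1 > r2 ? r1 : r2;
    if (hi >= 16) return Prefix::REX2;
    if (hi >= 8)  return Prefix::REX;
    return Prefix::None;
}

// For `dst = src1 op src2`: demote to the two-address encoding when the
// allocator gave dst the same register as a source, else keep NDD.
inline Prefix choose_prefix(int dst, int src1, int src2, bool commutative) {
    if (dst == src1) return legacy_prefix(dst, src2);
    if (commutative && dst == src2) return legacy_prefix(dst, src1);
    return Prefix::EVEX;
}

// Total length assuming a one-byte opcode plus a ModRM byte.
inline int encoded_bytes(Prefix p) { return prefix_bytes(p) + 2; }
```

Under these assumptions, an add whose dst and first source both land in register 0 encodes in 2 bytes, the same add with dst in an EGPR takes 4 bytes with REX2, and the non-demoted NDD form takes 6 bytes. This is why approach 1 prefers colours from the legacy register class.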
Relates to: JDK-8329030 Intel Advanced Performance Extension support