- Enhancement
- Resolution: Fixed
- P4
- 21
- b06
- riscv
- linux

Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8311745 | 17.0.9 | Fei Yang | P4 | Resolved | Fixed | b01 |
Code generation for MinI/MaxI nodes could be improved when one of the source registers is the same as the destination register.
For example, consider this snippet of C2 PrintOptoAssembly output:
/////////////
0aa + slliw R28, R8, (#24 & 0x1f) #@lShiftI_reg_imm
0ae + bge R7, R10, Lsrc1 #@maxI_rReg
mv R29, R10
j Ldone
bind Lsrc1
mv R29, R7
bind #@maxI_rReg
0ba + sraiw R31, R28, (#24 & 0x1f) #@rShiftI_reg_imm
0be B5: # out( B30 B6 ) <- in( B4 B8 ) Loop( B5-B8 inner pre of N255) Freq: 1.96481
0be + addw R7, R18, zr #@convI2L_reg_reg
0c2 + add R7, R9, R7 # ptr, #@addP_reg_reg
0c4 + bgeu R18, R11, B30 #@cmpU_branch P=0.000001 C=-1.000000
0c8 B6: # out( B37 B7 ) <- in( B5 ) Freq: 1.96481
0c8 + lb R28, [R7, #16] # byte, #@loadB
0cc + beq R28, R31, B37 #@cmpI_branch P=0.000000 C=24017.000000
0d0 B7: # out( B9 B8 ) <- in( B6 ) Freq: 1.96481
0d0 + addiw R7, R18, #-1 #@addI_reg_imm
0d4 + ble R7, R29, B9 #@cmpI_loop P=0.500000 C=24447.000000
/////////////
This snippet could be optimized into the following snippet:
/////////////
0aa + slliw R28, R8, (#24 & 0x1f) #@lShiftI_reg_imm
0ae + sraiw R12, R28, (#24 & 0x1f) #@rShiftI_reg_imm
0b2 + bge R7, R10, skip #@maxI_reg_reg
mv R7, R10
skip:
0b8 B5: # out( B30 B6 ) <- in( B4 B8 ) Loop( B5-B8 inner pre of N255) Freq: 1.96481
0b8 addw R28, R18, zr #@convI2L_reg_reg
0bc + add R28, R9, R28 # ptr, #@addP_reg_reg
0be + bgeu R18, R11, B30 #@cmpU_branch P=0.000001 C=-1.000000
0c2 B6: # out( B37 B7 ) <- in( B5 ) Freq: 1.96481
0c2 + lb R28, [R28, #16] # byte, #@loadB
0c6 + beq R28, R12, B37 #@cmpI_branch P=0.000000 C=24017.000000
0ca B7: # out( B9 B8 ) <- in( B6 ) Freq: 1.96481
0ca + addiw R29, R18, #-1 #@addI_reg_imm
0ce + ble R29, R7, B9 #@cmpI_loop P=0.500000 C=24447.000000
/////////////
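The difference between the two listings can be modeled in plain Java. This is only an illustrative sketch, not the actual C2 code-generation change: the class and method names (`MaxISketch`, `maxGeneric`, `maxInPlace`) are hypothetical, and the comments map each statement to the instructions shown in the listings above. In the first shape the destination (R29) differs from both sources, so both paths write it and a jump is needed. In the second shape the destination is also the first source (R7), so a single branch around one `mv` is enough.

```java
public class MaxISketch {
    // Models the first listing: dst is a separate register from both sources,
    // so each path writes dst and the fall-through path jumps over the other.
    static int maxGeneric(int src1, int src2) {
        int dst;
        if (src1 >= src2) {   // bge src1, src2, Lsrc1
            dst = src1;       // Lsrc1: mv dst, src1
        } else {
            dst = src2;       // mv dst, src2 ; j Ldone
        }
        return dst;           // Ldone:
    }

    // Models the second listing: dst aliases src1, so the value is already
    // in place when src1 >= src2 and only one conditional move remains.
    static int maxInPlace(int dstAndSrc1, int src2) {
        if (!(dstAndSrc1 >= src2)) { // bge dst, src2, skip
            dstAndSrc1 = src2;       // mv dst, src2
        }
        return dstAndSrc1;           // skip:
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 24017, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                int expected = Math.max(a, b);
                if (maxGeneric(a, b) != expected || maxInPlace(a, b) != expected) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("both shapes agree with Math.max");
    }
}
```

The same reasoning applies to MinI with the comparison reversed.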