JDK-8300109

RISC-V: Improve code generation for MinI/MaxI nodes

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Affects Version/s: 21
    • Fix Version/s: 21
    • Component/s: hotspot
    • Resolved In Build: b06
    • CPU: riscv
    • OS: linux

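        Code generation for MinI/MaxI nodes could be improved when one of the source registers is the same as the destination register. In that case the result can be computed in place with one conditional branch and one move, instead of the current sequence of a branch, two moves, and a jump.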

        For example, consider this snippet of C2 PrintOptoAssembly output:
        /////////////
        0aa + slliw R28, R8, (#24 & 0x1f) #@lShiftI_reg_imm
        0ae + bge R7, R10, Lsrc1 #@maxI_rReg
                mv R29, R10
                j Ldone
                bind Lsrc1
                mv R29, R7
                bind #@maxI_rReg
        0ba + sraiw R31, R28, (#24 & 0x1f) #@rShiftI_reg_imm

        0be B5: # out( B30 B6 ) <- in( B4 B8 ) Loop( B5-B8 inner pre of N255) Freq: 1.96481
        0be + addw R7, R18, zr #@convI2L_reg_reg
        0c2 + add R7, R9, R7 # ptr, #@addP_reg_reg
        0c4 + bgeu R18, R11, B30 #@cmpU_branch P=0.000001 C=-1.000000

        0c8 B6: # out( B37 B7 ) <- in( B5 ) Freq: 1.96481
        0c8 + lb R28, [R7, #16] # byte, #@loadB
        0cc + beq R28, R31, B37 #@cmpI_branch P=0.000000 C=24017.000000

        0d0 B7: # out( B9 B8 ) <- in( B6 ) Freq: 1.96481
        0d0 + addiw R7, R18, #-1 #@addI_reg_imm
        0d4 + ble R7, R29, B9 #@cmpI_loop P=0.500000 C=24447.000000
        /////////////

        This snippet could be optimized into the following:
        /////////////
        0aa + slliw R28, R8, (#24 & 0x1f) #@lShiftI_reg_imm
        0ae + sraiw R12, R28, (#24 & 0x1f) #@rShiftI_reg_imm
        0b2 + bge R7, R10, skip #@maxI_reg_reg
                mv R7, R10
                skip:

        0b8 B5: # out( B30 B6 ) <- in( B4 B8 ) Loop( B5-B8 inner pre of N255) Freq: 1.96481
        0b8 addw R28, R18, zr #@convI2L_reg_reg
        0bc + add R28, R9, R28 # ptr, #@addP_reg_reg
        0be + bgeu R18, R11, B30 #@cmpU_branch P=0.000001 C=-1.000000

        0c2 B6: # out( B37 B7 ) <- in( B5 ) Freq: 1.96481
        0c2 + lb R28, [R28, #16] # byte, #@loadB
        0c6 + beq R28, R12, B37 #@cmpI_branch P=0.000000 C=24017.000000

        0ca B7: # out( B9 B8 ) <- in( B6 ) Freq: 1.96481
        0ca + addiw R29, R18, #-1 #@addI_reg_imm
        0ce + ble R29, R7, B9 #@cmpI_loop P=0.500000 C=24447.000000
        /////////////
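
        To make the intended equivalence concrete, here is a minimal Java sketch that models both instruction sequences for MaxI and checks them against Math.max. It is only an illustration, not the actual change to the RISC-V backend: the class and method names are hypothetical, and the MinI variant assumes a mirrored sequence using ble, which the output above does not show.

```java
// Hypothetical model of the two code shapes; not HotSpot code.
final class MinMaxSketch {

    // Models the current maxI_rReg expansion with a separate destination register.
    public static int maxThreeRegister(int src1, int src2) {
        int dst;
        if (src1 >= src2) {      // bge src1, src2, Lsrc1
            dst = src1;          // Lsrc1: mv dst, src1
        } else {
            dst = src2;          // mv dst, src2 ; j Ldone
        }
        return dst;              // Ldone
    }

    // Models the proposed shape where the destination is also the first source.
    public static int maxInPlace(int dstSrc, int src) {
        if (!(dstSrc >= src)) {  // bge dst_src, src, skip
            dstSrc = src;        // mv dst_src, src
        }
        return dstSrc;           // skip:
    }

    // Assumed mirror image for MinI (ble instead of bge).
    public static int minInPlace(int dstSrc, int src) {
        if (!(dstSrc <= src)) {  // ble dst_src, src, skip
            dstSrc = src;        // mv dst_src, src
        }
        return dstSrc;           // skip:
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -24, -1, 0, 1, 24, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (maxThreeRegister(a, b) != Math.max(a, b)
                        || maxInPlace(a, b) != Math.max(a, b)
                        || minInPlace(a, b) != Math.min(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
    }
}
```

        The in-place form only applies when the register allocator assigns the destination and one source to the same register. In the optimized output above that register is R7, which also frees R29 for the later addiw.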

              Assignee: Fei Yang (fyang)
              Reporter: Fei Yang (fyang)