- Enhancement
- Resolution: Unresolved
- P4
- 24
- generic
- linux
When compiling the attached test case with
`java -XX:MaxVectorSize=16 -XX:CompileCommand=compileonly,Test::* -XX:-TieredCompilation -Xbatch -XX:+DebugNonSafepoints Test` on aarch64,
the assembly code of the post loop looks like this:
```
[2024-11-12T17:23:16.549Z] 454 B43: # out( B48 B44 ) <- in( B42 B45 ) Loop( B43-B45 inner post of N954) Freq: 75935.6
[2024-11-12T17:23:16.549Z] 454 sxtw R7, R10 # i2l
[2024-11-12T17:23:16.549Z] 458 add R20, R24, R7, LShiftL #3 # ptr
[2024-11-12T17:23:16.549Z] 45c ldrw R11, [R19, #28] # int
[2024-11-12T17:23:16.549Z] 460 addw R12, R11, R10
[2024-11-12T17:23:16.549Z] 464 strw R12, [R19, #28] # int
[2024-11-12T17:23:16.549Z] 468 cmpw R10, R16 # unsigned
[2024-11-12T17:23:16.549Z] 46c bhs B48 # unsigned P=0.000001 C=-1.000000
[2024-11-12T17:23:16.549Z]
[2024-11-12T17:23:16.549Z] 470 B44: # out( B50 B45 ) <- in( B43 ) Freq: 75935.5
[2024-11-12T17:23:16.549Z] 470 add R11, R14, R7, LShiftL #1 # ptr
[2024-11-12T17:23:16.549Z] 474 strh R0, [R11, #16] # short
[2024-11-12T17:23:16.549Z] 478 cmpw R10, R15 # unsigned
[2024-11-12T17:23:16.549Z] 47c bhs B50 # unsigned P=0.000001 C=-1.000000
[2024-11-12T17:23:16.549Z]
[2024-11-12T17:23:16.549Z] 480 B45: # out( B43 B46 ) <- in( B44 ) Freq: 75935.5
[2024-11-12T17:23:16.549Z] 480 ldr R11, [R20, #16] # int
[2024-11-12T17:23:16.549Z] 484 sub R11, R11, R25
[2024-11-12T17:23:16.549Z] 488 str R11, [R20, #16] # int
[2024-11-12T17:23:16.549Z] 48c addw R10, R10, #1
[2024-11-12T17:23:16.549Z] 490 cmpw R10, #66
[2024-11-12T17:23:16.549Z] 494 blt B43 // counted loop end P=0.500000 C=5172.000000
```
We can see that the range checks
```
[2024-11-12T17:23:16.549Z] 468 cmpw R10, R16 # unsigned
[2024-11-12T17:23:16.549Z] 46c bhs B48 # unsigned P=0.000001 C=-1.000000
...
[2024-11-12T17:23:16.549Z] 478 cmpw R10, R15 # unsigned
[2024-11-12T17:23:16.549Z] 47c bhs B50 # unsigned P=0.000001 C=-1.000000
```
are included in the post loop body. The same can be observed on x86:
```
[2024-11-12T17:23:54.759Z] 414 B44: # out( B49 B45 ) <- in( B43 B46 ) Loop( B44-B46 inner post of N954) Freq: 75935.6
[2024-11-12T17:23:54.759Z] 414 addl [R14 + #28 (8-bit)], R11 # int
[2024-11-12T17:23:54.759Z] 418 cmpl R11, [RSP + #0 (32-bit)] # unsigned
[2024-11-12T17:23:54.759Z] 41c jae,us B49 P=0.000001 C=-1.000000
[2024-11-12T17:23:54.759Z]
[2024-11-12T17:23:54.759Z] 41e B45: # out( B51 B46 ) <- in( B44 ) Freq: 75935.5
[2024-11-12T17:23:54.759Z] 41e movl R13, [rsp + #8] # spill
[2024-11-12T17:23:54.759Z] 423 movw [RCX + #16 + R11 << #1], R13 # char/short
[2024-11-12T17:23:54.760Z] 429 cmpl R11, [RSP + #4 (32-bit)] # unsigned
[2024-11-12T17:23:54.760Z] 42e jae,us B51 P=0.000001 C=-1.000000
[2024-11-12T17:23:54.760Z]
[2024-11-12T17:23:54.760Z] 430 B46: # out( B44 B47 ) <- in( B45 ) Freq: 75935.5
[2024-11-12T17:23:54.760Z] 430 subq [R9 + #16 + R11 << #3], RDX # long
[2024-11-12T17:23:54.760Z] 435 incl R11 # int
[2024-11-12T17:23:54.760Z] 438 cmpl R11, #66
[2024-11-12T17:23:54.760Z] 43c jl,s B44 # loop end P=0.500000 C=5172.000000
```
Logs are attached.
I checked the ideal graph on the aarch64 platform and found that RangeCheck nodes 1378 and 1382 are wrongly inserted after CountedLoop node 1377.
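For readers without access to the attachment, here is a rough sketch of a loop shape that would match the instructions in the post loop above: an int field update, a short array store, and a long array subtraction, with an upper bound of 66. This is a reconstruction from the assembly only, not the attached `Test`. The class name `PostLoopSketch`, the method `kernel`, the stored constant, and the warm-up count are all assumptions, and it may not compile to exactly the same code.

```java
public class PostLoopSketch {
    // Hypothetical int field; corresponds to the ldrw/addw/strw sequence.
    int iFld;

    // Hypothetical kernel. Each array access carries an implicit bounds
    // check (the index is compared, unsigned, against the array length).
    void kernel(short[] sArr, long[] lArr, long x) {
        for (int i = 0; i < 66; i++) {
            iFld += i;       // int field update
            sArr[i] = 7;     // short store (strh), preceded by a range check
            lArr[i] -= x;    // long load/sub/store, preceded by a range check
        }
    }

    public static void main(String[] args) {
        PostLoopSketch t = new PostLoopSketch();
        short[] sArr = new short[66];
        long[] lArr = new long[66];
        // Warm-up iterations so the JIT compiles kernel(); the count is arbitrary.
        for (int n = 0; n < 10_000; n++) {
            t.kernel(sArr, lArr, 3L);
        }
        System.out.println(t.iFld + " " + sArr[65] + " " + lArr[0]);
    }
}
```

To inspect the generated code for this sketch, the command line from the report can be reused with `compileonly,PostLoopSketch::*` and `PostLoopSketch` as the main class.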