Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 8, 9, 10
OS: generic
CPU: generic
While investigating a Lucene bug, I found that C2 generates terrible code for low-frequency loops.
For a simple sum-of-elements loop, the register allocator (RA) generates a lot of stack spills because it did not clone the zeroing of the index variable before the loop.
    private static int sum(int[] arr) {
        int sum = 0;
        for (int el : arr) {
            sum += el;
        }
        return sum;
    }
25e4 B413: # B414 <- B412 Freq: 0.0229312
25e4 xorl R10, R10 # int
nop # 9 bytes pad for loops and calls
25f0 B414: # B414 B415 <- B413 B414 Loop: B414-B414 inner Freq: 0.229312
25f0 movq R11, [rsp + #112] # spill
25f5 movl R9, [rsp + #36] # spill
25fa addl R10, [R11 + #24 + R9 << #2] # int
25ff movl R11, [rsp + #36] # spill
2604 incl R11 # int
2607 movl [rsp + #36], R11 # spill
260c cmpl R11, [RSP + #120 (32-bit)]
2611 jl,s B414 # loop end P=0.900000 C=-1.000000
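For readers mapping the listing back to the source: the enhanced for loop has a hidden index variable, which is the one the report refers to. The sketch below writes that index out explicitly, following the array form of the enhanced for statement in the Java Language Specification. The class name SumDesugared and the locals a and i are illustrative, not taken from the original report. In the listing above, [rsp + #36] appears to hold this index (it is loaded, incremented, and stored back on every iteration), while R10 holds the running sum.

```java
public class SumDesugared {
    // Hypothetical rewrite of sum() with the hidden index of the
    // enhanced for loop made explicit (names a and i are illustrative).
    static int sum(int[] arr) {
        int sum = 0;
        int[] a = arr;                      // array reference is evaluated once
        for (int i = 0; i < a.length; i++) { // i = 0 is the "zeroing of the index variable"
            int el = a[i];
            sum += el;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sum(data)); // prints 10
    }
}
```

This form behaves the same as the original method; it is only meant to make visible which value the loop body reloads from the stack, and it is not a workaround for the spilling.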