Currently, when Zicboz is not available (it is still very hard to find in hardware),
zero_blocks generates this code:
StubRoutines::zero_blocks [0x0000003fb1033f00, 0x0000003fb1033f58] (88 bytes)
0x0000003fb1033f00: addi t4,t4,-8
0.54% ? 0x0000003fb1033f04: bltz t4,Stub::zero_blocks+80 0x0000003fb1033f50
0.65% ?? 0x0000003fb1033f08: sd zero,0(t3)
5.88% ?? 0x0000003fb1033f0c: addi t3,t3,8
0.73% ?? 0x0000003fb1033f10: sd zero,0(t3)
3.72% ?? 0x0000003fb1033f14: addi t3,t3,8
1.83% ?? 0x0000003fb1033f18: sd zero,0(t3)
4.47% ?? 0x0000003fb1033f1c: addi t3,t3,8
1.31% ?? 0x0000003fb1033f20: sd zero,0(t3)
24.18% ?? 0x0000003fb1033f24: addi t3,t3,8
1.43% ?? 0x0000003fb1033f28: sd zero,0(t3)
16.56% ?? 0x0000003fb1033f2c: addi t3,t3,8
1.97% ?? 0x0000003fb1033f30: sd zero,0(t3)
3.44% ?? 0x0000003fb1033f34: addi t3,t3,8
1.46% ?? 0x0000003fb1033f38: sd zero,0(t3)
3.33% ?? 0x0000003fb1033f3c: addi t3,t3,8
1.68% ?? 0x0000003fb1033f40: sd zero,0(t3)
2.46% ?? 0x0000003fb1033f44: addi t3,t3,8
2.00% ?? 0x0000003fb1033f48: addi t4,t4,-8
0.85% ?? 0x0000003fb1033f4c: bgez t4,Stub::zero_blocks+8 0x0000003fb1033f08
0.04% ? 0x0000003fb1033f50: addi t4,t4,8
0.55% 0x0000003fb1033f54: ret
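It can be optimized to produce the code below, which shrinks the stub from 88 to 60 bytes. Each store now uses an immediate offset from t3 instead of waiting on a preceding addi t3,t3,8, so the dependencies between instructions are reduced as well: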
StubRoutines::zero_blocks [0x0000003fa8e88b00, 0x0000003fa8e88b3c] (60 bytes)
0x0000003fa8e88b00: addi t4,t4,-8
0.50% ? 0x0000003fa8e88b04: bltz t4,Stub::zero_blocks+52 0x0000003fa8e88b34
0.82% ?? 0x0000003fa8e88b08: sd zero,0(t3)
5.50% ?? 0x0000003fa8e88b0c: sd zero,8(t3)
9.43% ?? 0x0000003fa8e88b10: sd zero,16(t3)
4.96% ?? 0x0000003fa8e88b14: sd zero,24(t3)
21.03% ?? 0x0000003fa8e88b18: sd zero,32(t3)
20.00% ?? 0x0000003fa8e88b1c: sd zero,40(t3)
6.57% ?? 0x0000003fa8e88b20: sd zero,48(t3)
3.91% ?? 0x0000003fa8e88b24: sd zero,56(t3)
4.37% ?? 0x0000003fa8e88b28: addi t3,t3,64
0.32% ?? 0x0000003fa8e88b2c: addi t4,t4,-8
0.88% ?? 0x0000003fa8e88b30: bgez t4,Stub::zero_blocks+8 0x0000003fa8e88b08
0.03% ? 0x0000003fa8e88b34: addi t4,t4,8
0.50% 0x0000003fa8e88b38: ret
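For readers less familiar with RISC-V assembly, the optimized loop can be modelled in C roughly as follows. This is only an illustrative sketch, not the HotSpot stub generator code: the function name zero_blocks_model and its signature are hypothetical, with base standing in for t3 and cnt for t4 (a count of 8-byte words).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the optimized zero_blocks stub.
 * *base plays the role of t3, cnt the role of t4 (number of 8-byte words).
 * Zeroes whole blocks of 8 words, advances *base past them and returns
 * the leftover word count (0..7), as t4 holds on return from the stub. */
static long zero_blocks_model(uint64_t **base, long cnt) {
    assert(cnt >= 0);
    uint64_t *p = *base;
    cnt -= 8;                          /* addi t4,t4,-8            */
    while (cnt >= 0) {                 /* bltz on entry, bgez back */
        for (int i = 0; i < 8; i++) {
            p[i] = 0;                  /* sd zero,8*i(t3)          */
        }
        p += 8;                        /* one addi t3,t3,64        */
        cnt -= 8;                      /* addi t4,t4,-8            */
    }
    *base = p;
    return cnt + 8;                    /* addi t4,t4,8             */
}
```

The point of the model is that the pointer is bumped once per block of eight stores, whereas the original stub bumped it after every store.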
store_words can also be improved in a similar way, from:
0.07% 0x0000003fa940a8dc: andi t0,t4,4
1.24% 0x0000003fa940a8e0: beqz t0,0x0000003fa940a904
0x0000003fa940a8e4: sd zero,0(t3)
0x0000003fa940a8e8: addi t3,t3,8
0x0000003fa940a8ec: sd zero,0(t3)
0x0000003fa940a8f0: addi t3,t3,8
0x0000003fa940a8f4: sd zero,0(t3)
0x0000003fa940a8f8: addi t3,t3,8
0x0000003fa940a8fc: sd zero,0(t3)
0x0000003fa940a900: addi t3,t3,8
0.58% 0x0000003fa940a904: andi t0,t4,2
0x0000003fa940a908: beqz t0,0x0000003fa940a91c
0x0000003fa940a90c: sd zero,0(t3)
0x0000003fa940a910: addi t3,t3,8
0x0000003fa940a914: sd zero,0(t3)
0x0000003fa940a918: addi t3,t3,8
0x0000003fa940a91c: andi t0,t4,1
0.17% 0x0000003fa940a920: beqz t0,0x0000003fa940a928
0x0000003fa940a924: sd zero,0(t3)
to:
0.27% 0x0000003fd9407acc: andi t0,t4,4
1.06% 0x0000003fd9407ad0: beqz t0,0x0000003fd9407ae8
0x0000003fd9407ad4: sd zero,0(t3)
0x0000003fd9407ad8: sd zero,8(t3)
0x0000003fd9407adc: sd zero,16(t3)
0x0000003fd9407ae0: sd zero,24(t3)
0x0000003fd9407ae4: addi t3,t3,32
0.53% 0x0000003fd9407ae8: andi t0,t4,2
0x0000003fd9407aec: beqz t0,0x0000003fd9407afc
0x0000003fd9407af0: sd zero,0(t3)
0x0000003fd9407af4: sd zero,8(t3)
0x0000003fd9407af8: addi t3,t3,16
0x0000003fd9407afc: andi t0,t4,1
0.15% 0x0000003fd9407b00: beqz t0,0x0000003fd9407b08
0x0000003fd9407b04: sd zero,0(t3)
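The improved tail handling shown above can be sketched in C in the same spirit. Again this is an assumption-laden illustration rather than the actual stub source: zero_tail_model is a hypothetical name, p corresponds to t3 and cnt to t4, and only the low three bits of cnt are examined, as in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the improved tail: zeroes the remaining
 * (cnt & 7) words at p, using fixed offsets and one pointer bump
 * per group instead of one bump per store. */
static void zero_tail_model(uint64_t *p, long cnt) {
    assert(p != 0);
    if (cnt & 4) {                     /* andi t0,t4,4 ; beqz      */
        p[0] = 0; p[1] = 0;            /* sd zero,0(t3) .. 24(t3)  */
        p[2] = 0; p[3] = 0;
        p += 4;                        /* addi t3,t3,32            */
    }
    if (cnt & 2) {                     /* andi t0,t4,2 ; beqz      */
        p[0] = 0; p[1] = 0;            /* sd zero,0(t3) ; 8(t3)    */
        p += 2;                        /* addi t3,t3,16            */
    }
    if (cnt & 1) {                     /* andi t0,t4,1 ; beqz      */
        p[0] = 0;                      /* sd zero,0(t3)            */
    }
}
```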
duplicates: JDK-8295282 Use Zicboz/cbo.zero to zero-out memory on RISC-V (Resolved)