In low-level benchmarking, we sometimes resort to non-inlineable "sink" methods to escape dead-code elimination, like this:
    @Benchmark
    public void test() {
        doNothing(obj);
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void doNothing(Object obj) {
        // deliberately do nothing
    }
The performance of this method is very important, since we usually deal with nanosecond-scale benchmarks. Ideally, the generated code should contain a "ret" right away.
However, the generated code for doNothing contains a prolog followed immediately by an epilog:
[Verified Entry Point]
10.93% 6.25% 0x00007f39f415fd80: mov %eax,-0x14000(%rsp)
3.76% 3.03% 0x00007f39f415fd87: push %rbp
1.92% 1.97% 0x00007f39f415fd88: sub $0x30,%rsp
10.42% 10.64% 0x00007f39f415fd8c: add $0x30,%rsp
2.88% 3.03% 0x00007f39f415fd90: pop %rbp
25.45% 31.68% 0x00007f39f415fd91: test %eax,0x15df8369(%rip) # 0x00007f3a09f58100
; {poll_return}
0.57% 0.47% 0x00007f39f415fd97: retq
It seems that at least the RSP adjustments (sub/add) are redundant, as is saving/restoring RBP (push/pop).
It would be interesting to see if we can remove these redundant ops, e.g.:
*) Peephole MachPrologNode -> MachEpilogNode out completely;
*) Macro-expand MachProlog/EpilogNode into the individual ops, and then peephole (sub $const, %reg) -> (add $const, %reg) and (push %reg) -> (pop %reg);
*) Massage frame_size_in_bytes() so that it is zero for an empty method.
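To make the second option concrete, here is a minimal toy sketch in plain Java. It is not HotSpot code: the Insn class, the cancels() rule and emptyMethodBody() are hypothetical names invented for illustration, and the model ignores condition flags, stack banging requirements and anything else a real C2 peephole would have to prove. It only shows the idea of cancelling an adjacent (sub $const, %reg) / (add $const, %reg) pair and a (push %reg) / (pop %reg) pair once the prolog and epilog are expanded into individual ops.

```java
import java.util.ArrayList;
import java.util.List;

class PeepholeSketch {
    // Hypothetical toy instruction: an opcode and its operand text.
    static final class Insn {
        final String op;
        final String operand;

        Insn(String op, String operand) {
            this.op = op;
            this.operand = operand;
        }

        @Override
        public String toString() {
            return operand.isEmpty() ? op : op + " " + operand;
        }
    }

    // True when 'second' exactly undoes 'first' (toy rule: same operand text,
    // and the pair is sub/add or push/pop). Flags are deliberately ignored.
    static boolean cancels(Insn first, Insn second) {
        if (!first.operand.equals(second.operand)) {
            return false;
        }
        return (first.op.equals("sub") && second.op.equals("add"))
            || (first.op.equals("push") && second.op.equals("pop"));
    }

    // Single pass using the output list as a stack: whenever the incoming
    // instruction cancels the last emitted one, drop both. This also catches
    // pairs that only become adjacent after an inner pair was removed.
    static List<Insn> peephole(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        for (Insn insn : code) {
            int last = out.size() - 1;
            if (last >= 0 && cancels(out.get(last), insn)) {
                out.remove(last);
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    // The doNothing body from the disassembly in this report, as toy instructions.
    static List<Insn> emptyMethodBody() {
        List<Insn> code = new ArrayList<>();
        code.add(new Insn("mov", "%eax,-0x14000(%rsp)"));   // stack bang
        code.add(new Insn("push", "%rbp"));
        code.add(new Insn("sub", "$0x30,%rsp"));
        code.add(new Insn("add", "$0x30,%rsp"));
        code.add(new Insn("pop", "%rbp"));
        code.add(new Insn("test", "%eax,0x15df8369(%rip)")); // return poll
        code.add(new Insn("retq", ""));
        return code;
    }

    public static void main(String[] args) {
        for (Insn insn : peephole(emptyMethodBody())) {
            System.out.println(insn);
        }
    }
}
```

In this toy model the sub/add pair cancels first, which makes push/pop adjacent so they cancel too. What remains is the stack bang, the return poll and the retq. Whether the stack bang can also go for a frameless method is a separate question that this sketch does not address.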
Benchmark:
http://cr.openjdk.java.net/~shade/8130398/EmptyMethod.java
Runnable JAR:
http://cr.openjdk.java.net/~shade/8130398/benchmarks.jar
Output and disassembly:
http://cr.openjdk.java.net/~shade/8130398/perfasm.out
Relates to:
*) JDK-8145579 SimpleThresholdPolicy assumes non-trivial methods to be trivial (Resolved)
*) JDK-8133348 Reference.reachabilityFence (Resolved)
*) JDK-8134293 VH.(get|set)Opaque implementations (Resolved)