# Description
C2 currently emits a separate trampoline stub for each static call, even when multiple call sites target the same static method. This is unlike the treatment of runtime calls, which do share trampoline stubs across different call sites. The lack of sharing in the static call case leads to redundant stub generation and unnecessary code cache bloat.
There is an existing test in the repository, test/hotspot/jtreg/compiler/sharedstubs/SharedTrampolineTest.java, which appears to be intended to verify trampoline sharing. However, this test currently passes as if static call trampolines are being shared, even though they are not. I plan to update this test to reflect the actual state of trampoline generation more accurately.
## Example
The following Java class illustrates the inefficiency:
```java
public class StaticCallTest {
    private static final int WARMUP_ITERATIONS = 20_000;
    private static volatile int sink = 0;

    public static void main(String[] args) {
        StaticCallTest test = new StaticCallTest();
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            test.testStaticCall(i);
        }
    }

    private void testStaticCall(int iteration) {
        foo(); // first call site targeting foo()
        foo(); // second call site targeting foo()
        bar(); // single call site targeting bar()
    }

    public static void foo() {
        sink += 1;
    }

    public static void bar() {
        sink += 2;
    }
}
```
## Compilation and Observation
Compiled and inspected with:
```sh
java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=compileonly,StaticCallTest.testStaticCall \
-XX:CompileCommand=print,StaticCallTest.testStaticCall -XX:+PrintRelocations \
-XX:-TieredCompilation -XX:-Inline StaticCallTest
```
The relocation section for the compiled testStaticCall method shows three static call relocations, each accompanied by its own trampoline_stub, even though two of them call foo():
```
relocInfo@0x0000e4c42c09eb92 [type=4(static_call) addr=0x0000e4c47bc48300 offset=36] | [destination=0x0000e4c47bcd1b40 metadata=0x0000000000000000] Blob::Shared Runtime resolve_static_call_blob
relocInfo@0x0000e4c42c09eb96 [type=4(static_call) addr=0x0000e4c47bc48310 offset=12] | [destination=0x0000e4c47bcd1b40 metadata=0x0000000000000000] Blob::Shared Runtime resolve_static_call_blob
relocInfo@0x0000e4c42c09eb9a [type=4(static_call) addr=0x0000e4c47bc48320 offset=12] | [destination=0x0000e4c47bcd1b40 metadata=0x0000000000000000] Blob::Shared Runtime resolve_static_call_blob
...
relocInfo@0x0000e4c42c09ebb2 [type=13(trampoline_stub) addr=0x0000e4c47bc483a0 offset=0 data=-16] | [trampoline owner=0x0000e4c47bc48300]
relocInfo@0x0000e4c42c09ebb8 [type=13(trampoline_stub) addr=0x0000e4c47bc483b0 offset=16 data=-20] | [trampoline owner=0x0000e4c47bc48310]
relocInfo@0x0000e4c42c09ebbe [type=13(trampoline_stub) addr=0x0000e4c47bc483c0 offset=16 data=-24] | [trampoline owner=0x0000e4c47bc48320]
```
This confirms that trampolines are not being shared between multiple call sites to the same static method.
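The same observation could be automated by counting relocation entries per type in the printed output. The following is only a sketch: the class name `RelocCounter` and its `count` method are hypothetical, the sample lines are abbreviated copies of the output above, and the regular expression assumes the `type=N(name)` format shown there.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelocCounter {
    // Matches the "type=4(static_call)" fragment of a relocInfo line.
    private static final Pattern TYPE = Pattern.compile("type=\\d+\\((\\w+)\\)");

    /** Counts the lines whose relocation type name equals relocType. */
    public static long count(List<String> lines, String relocType) {
        long n = 0;
        for (String line : lines) {
            Matcher m = TYPE.matcher(line);
            if (m.find() && m.group(1).equals(relocType)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Abbreviated from the PrintRelocations output shown above.
        List<String> lines = List.of(
            "relocInfo@0x0000e4c42c09eb92 [type=4(static_call) addr=0x0000e4c47bc48300 offset=36]",
            "relocInfo@0x0000e4c42c09eb96 [type=4(static_call) addr=0x0000e4c47bc48310 offset=12]",
            "relocInfo@0x0000e4c42c09eb9a [type=4(static_call) addr=0x0000e4c47bc48320 offset=12]",
            "relocInfo@0x0000e4c42c09ebb2 [type=13(trampoline_stub) addr=0x0000e4c47bc483a0 offset=0 data=-16]",
            "relocInfo@0x0000e4c42c09ebb8 [type=13(trampoline_stub) addr=0x0000e4c47bc483b0 offset=16 data=-20]",
            "relocInfo@0x0000e4c42c09ebbe [type=13(trampoline_stub) addr=0x0000e4c47bc483c0 offset=16 data=-24]");

        System.out.println("static_call: " + count(lines, "static_call"));
        System.out.println("trampoline_stub: " + count(lines, "trampoline_stub"));
    }
}
```

With sharing in place, the `trampoline_stub` count for this method would be expected to drop below the `static_call` count.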
# Suggested Improvement
Modify the C2 compiler to share trampoline stubs between static calls that resolve to the same callee method. Since the relocation target for all static calls is initially set to the static call resolver stub, the call's target alone cannot be used to distinguish between different static method calls. Instead, trampoline stubs should be shared based on the actual callee.
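The effect of the keying choice can be illustrated with a small model. This is a sketch in Java, not HotSpot code: the class `TrampolineSharingSketch`, its `requestStub` method, and the use of strings as keys are hypothetical stand-ins, and the actual change would be made in C2's C++ code with whatever callee identity is available at code emission time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TrampolineSharingSketch {
    // Hypothetical model: one stub index per distinct sharing key.
    private final Map<String, Integer> stubsByKey = new LinkedHashMap<>();

    /** Returns the stub index for the key, allocating a new stub on first request. */
    public int requestStub(String key) {
        Integer existing = stubsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int index = stubsByKey.size();
        stubsByKey.put(key, index);
        return index;
    }

    public int stubCount() {
        return stubsByKey.size();
    }

    public static void main(String[] args) {
        // Every static call initially targets the same resolver stub.
        String resolver = "resolve_static_call_blob";
        String[] callees = {"StaticCallTest.foo", "StaticCallTest.foo", "StaticCallTest.bar"};

        TrampolineSharingSketch byDestination = new TrampolineSharingSketch();
        TrampolineSharingSketch byCallee = new TrampolineSharingSketch();
        for (String callee : callees) {
            byDestination.requestStub(resolver); // cannot tell foo() from bar()
            byCallee.requestStub(callee);        // one stub per distinct callee
        }

        System.out.println("keyed by destination: " + byDestination.stubCount()); // prints 1
        System.out.println("keyed by callee: " + byCallee.stubCount());           // prints 2
    }
}
```

In this model, keying by the initial destination collapses the calls to foo() and bar() into a single stub, which is the ambiguity described above, while keying by callee yields two stubs for the three call sites.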
## On optimized virtual calls
There may also be an opportunity to generalize this optimization to optimized virtual calls. However, this requires further investigation, particularly into how such calls are resolved and what metadata is available at compile time to facilitate safe trampoline sharing.