After the quick fix [JDK-8297763](https://bugs.openjdk.org/browse/JDK-8297763), the shared trampoline logic has become a bit verbose. If we switch to batch emission of trampoline stubs, pre-calculating the total size and pre-allocating the space, we can drop the per-stub CodeBuffer expansion checks and clean up the surrounding code.
```
[Stub Code]
...
<shared trampoline stub1, (A):>
__ align() // emits nothing or a 4-byte padding
<-- (B) multiple relocations at the pc: __ relocate(<the pc here>, trampoline_stub_Relocation::spec())
__ ldr()
__ br()
__ emit_int64()
<shared trampoline stub2, (C):>
__ align() // emits nothing or a 4-byte padding
<-- multiple relocations at the pc: __ relocate(<the pc here>, trampoline_stub_Relocation::spec())
__ ldr()
__ br()
__ emit_int64()
<shared trampoline stub3:>
__ align() // emits nothing or a 4-byte padding
<-- multiple relocations at the pc: __ relocate(<the pc here>, trampoline_stub_Relocation::spec())
__ ldr()
__ br()
__ emit_int64()
```
Here, `pc_at_(C) - pc_at_(B)` is the fixed length `NativeCallTrampolineStub::instruction_size`, but `pc_at_(B) - pc_at_(A)` may be either 0 or 4, so it is not a fixed-length value.
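This is also what makes the total size easy to pre-calculate for pre-allocation: each shared trampoline needs at most the 4-byte alignment padding plus the fixed stub body. A minimal, self-contained sketch of that arithmetic (the constants and helper names below are illustrative placeholders, not the actual HotSpot definitions):
```
#include <cstddef>

// Illustrative stand-ins for the real HotSpot values: an AArch64 trampoline
// stub is roughly ldr + br + an 8-byte target word (see the layout above).
constexpr size_t kTrampolineStubSize = 4 + 4 + 8; // ~NativeCallTrampolineStub::instruction_size
constexpr size_t kMaxAlignPadding    = 4;         // align() emits 0 or 4 bytes

// Worst-case footprint of one shared trampoline: optional padding + fixed body.
constexpr size_t max_shared_trampoline_size() {
  return kMaxAlignPadding + kTrampolineStubSize;
}

// Pre-allocating this many bytes up front is what lets us drop the
// per-stub CodeBuffer expansion check during emission.
constexpr size_t total_shared_trampoline_size(size_t stub_count) {
  return stub_count * max_shared_trampoline_size();
}
```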
Originally, the logic of the lambda `emit` inside `emit_shared_trampolines()` when emitting a shared trampoline was:
```
We are at (A) ->
do an align() ->
We are at (B) ->
emit lots of relocations bound to this shared trampoline at (B) ->
do an emit_trampoline_stub() ->
We are at (C)
```
After this patch, the flow becomes:
```
We are at (A) ->
do an emit_trampoline_stub(), which already contains an align() ->
We are at (C) directly ->
compute (B) backwards from (C), since `pc_at_(C) - pc_at_(B)` is a fixed-length value ->
emit lots of relocations bound to this shared trampoline at (B)
```
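A matching sketch of the refactored flow, with the same placeholder names; the key difference is that the relocations are emitted after the stub, at an address recovered from (C) by subtracting the fixed stub size:
```
// Sketch only, not the actual lambda in emit_shared_trampolines().
// (A): current position in the stub section.
emit_trampoline_stub(/* call offset, destination */);       // aligns internally, (A) -> (C)
address c = __ pc();                                        // we are at (C)
address b = c - NativeCallTrampolineStub::instruction_size; // (B): fixed distance below (C)
for (address caller : callers) {                            // same relocations, now bound to (B)
  __ relocate(b, trampoline_stub_Relocation::spec(caller));
}
```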
The two flows are theoretically equivalent; this is just a code refactoring that lets us remove some checks and keeps the code clean.
Tested AArch64 hotspot tier1\~4 with a fastdebug build twice; tested RISC-V hotspot tier1\~4 with a fastdebug build on hardware once.