In a profile of a sample application that uses VM_ThreadDump, 24.52% of the VM thread's CPU cycles were spent copying RegisterMap.
https://github.com/openjdk/jdk/blob/5dc9723c8172e288872f744bac5fd2342475767a/src/hotspot/share/runtime/vframe.cpp#L97
RegisterMap is huge as well.
/* size: 4984, cachelines: 78, members: 10 */
/* sum members: 4976, holes: 1, sum holes: 7 */
/* padding: 1 */
/* paddings: 1, sum paddings: 1 */
/* last cacheline: 56 bytes */
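To make the layout numbers above concrete, here is a minimal sketch that checks the arithmetic and contrasts a by-value parameter with a by-reference one. `BigMap`, `cachelines`, `last_line_bytes`, `first_by_value` and `first_by_ref` are hypothetical names used only for illustration; `BigMap` is a plain 4984-byte stand-in, not HotSpot's RegisterMap, and the 64-byte cache line is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in with the reported size; NOT HotSpot's RegisterMap.
struct BigMap {
    unsigned char bytes[4984];
};

// Assumed cache line size in bytes.
constexpr std::size_t kCacheLine = 64;

// Cache lines spanned by an object of `size` bytes that starts on a line boundary.
constexpr std::size_t cachelines(std::size_t size) {
    return (size + kCacheLine - 1) / kCacheLine;
}

// Bytes used in the last cache line (a full line if size is a multiple of 64).
constexpr std::size_t last_line_bytes(std::size_t size) {
    return size % kCacheLine == 0 ? kCacheLine : size % kCacheLine;
}

// The reported figures are mutually consistent:
static_assert(4976 + 7 + 1 == 4984, "sum members + sum holes + padding == size");
static_assert(sizeof(BigMap) == 4984, "stand-in matches the reported size");
static_assert(cachelines(sizeof(BigMap)) == 78, "78 cache lines");
static_assert(last_line_bytes(sizeof(BigMap)) == 56, "56 bytes in the last line");

// By value: the caller must materialize a full 4984-byte copy of the argument.
inline unsigned char first_by_value(BigMap m) { return m.bytes[0]; }

// By const reference: only an address is passed, nothing is copied.
inline unsigned char first_by_ref(const BigMap& m) { return m.bytes[0]; }
```

Both functions return the same result; the difference is that each call to `first_by_value` with an lvalue argument copies all 78 cache lines' worth of data, which is the kind of cost that shows up when an object of this size is copied on a hot path.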
Perf report attached.
Passing by value seems to perform better, and the tier 1 and tier 2 tests passed. However, it is unclear whether this affects other VM operations.