Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: None
Component/s: hotspot
Noticed this while looking at the Late Barrier Expansion work.
The Shenandoah C2 clone barrier is inserted before the call into the arraycopy stub:
void ShenandoahBarrierSetC2::clone_at_expansion(PhaseMacroExpand* phase, ArrayCopyNode* ac) const {
  ...
  // Heap is unstable, call into clone barrier stub
  Node* call = phase->make_leaf_call(unstable_ctrl, mem,
                  ShenandoahBarrierSetC2::clone_barrier_Type(),
                  CAST_FROM_FN_PTR(address, ShenandoahRuntime::clone_barrier),
                  "shenandoah_clone",
                  TypeRawPtr::BOTTOM,
                  src_base);
  call = phase->transform_later(call);
  ...
  // Wire up the actual arraycopy stub now
  ctrl = phase->transform_later(region);
  mem = phase->transform_later(mem_phi);
  const char* name = "arraycopy";
  call = phase->make_leaf_call(ctrl, mem,
                  OptoRuntime::fast_arraycopy_Type(),
                  phase->basictype2arraycopy(T_LONG, nullptr, nullptr, true, name, true),
                  name, TypeRawPtr::BOTTOM,
                  src, dest, length
                  LP64_ONLY(COMMA phase->top()));
  call = phase->transform_later(call);
The arraycopy call that follows the barrier does a T_LONG copy. But this dance looks unnecessary, because the arraycopy stub *itself* calls into BarrierSetAssembler::arraycopy_prologue, which Shenandoah handles! See:
address StubGenerator::generate_conjoint_copy_avx3_masked(StubId stub_id, address* entry, address nooverlap_target) {
  ...
  BarrierSetAssembler *bs = BarrierSet::barrier_set()->barrier_set_assembler();
  bs->arraycopy_prologue(_masm, decorators, type, from, to, count);
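To make the suspected redundancy concrete, here is a toy model, not HotSpot code. Every name in it (ToyBarrierSet, toy_arraycopy_stub, toy_clone_with_explicit_barrier, toy_clone_stub_only) is hypothetical, and it assumes the stub prologue does the same barrier work as the explicit clone barrier, which is the claim being made here rather than something this sketch verifies. It also leaves out the "heap is unstable" check that guards the real clone barrier call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the GC barrier set: it only counts how
// often barrier work is requested, it performs no real GC work.
struct ToyBarrierSet {
    int barrier_calls = 0;
    void apply_barrier(const long* /*src*/, std::size_t /*count*/) {
        ++barrier_calls;
    }
};

// Toy "arraycopy stub": runs its own prologue barrier, then copies.
// This models the arraycopy_prologue call made inside the stub.
void toy_arraycopy_stub(ToyBarrierSet& bs, const long* src, long* dst,
                        std::size_t count) {
    bs.apply_barrier(src, count);
    std::memcpy(dst, src, count * sizeof(long));
}

// Current shape (as modeled): explicit clone barrier, then the stub.
void toy_clone_with_explicit_barrier(ToyBarrierSet& bs, const long* src,
                                     long* dst, std::size_t count) {
    bs.apply_barrier(src, count);            // explicit clone barrier
    toy_arraycopy_stub(bs, src, dst, count); // stub repeats the work
}

// Suggested shape (as modeled): rely on the stub prologue alone.
void toy_clone_stub_only(ToyBarrierSet& bs, const long* src, long* dst,
                         std::size_t count) {
    toy_arraycopy_stub(bs, src, dst, count);
}
```

Under these assumptions, both toy clone paths produce the same copy, but the first one requests barrier work twice per clone and the second only once.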
There are also other ways GCs handle clones, including a full runtime clone routine that applies both relevant clone barriers and does the copy.
WIP: https://github.com/openjdk/jdk/compare/master...shipilev:JDK-8376749-shenandoah-no-c2-clone