This bug was created on behalf of Kevin <kuaiwei.kw@alibaba-inc.com>.
------
Hi,
Recently I looked at the optimization that eliminates the G1 post barrier for stores into newly allocated objects, and I found that it doesn't work as expected.
I wrote a simple test case that stores an oop inside the constructor and again right after the constructor returns.
public class StoreTest {
    static String val = "x";

    public static Foo testMethod() {
        Foo newfoo = new Foo(val);
        newfoo.b = val; // the store barrier could be reduced
        return newfoo;
    }

    public static void main(String[] args) {
        Foo obj = new Foo(val); // init Foo class
        testMethod();
    }

    static class Foo {
        Object a;
        Object b;

        public Foo(Object val) {
            this.a = val; // the store barrier could be reduced
        }
    }
}
I inline Foo::<init> and Object::<init> when C2 compiles testMethod, so I think the two stores marked with comments above don't need a post barrier. But I still found post barriers in the generated assembly code.
The test command: java -Xcomp -Xbatch -XX:+UseG1GC -XX:CompileCommandFile=compile_command -XX:+PrintCompilation -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining StoreTest
compile_command:
compileonly, StoreTest::testMethod
compileonly, StoreTest$Foo::<init>
inline, StoreTest$Foo::<init>
compileonly, java.lang.Object::<init>
inline, java.lang.Object::<init>
print, StoreTest::testMethod
I checked the node graph during the parsing phase. The optimization depends on GraphKit::just_allocated_object to detect a newly allocated object. The idea is to check whether the control of the store is the control projection of the allocation. But at parse time there is a Region node between the control projection and the control of the store. The Region has only one input edge, so it can be optimized away later. The Region node is generated when C2 inlines the init method of the superclass; I think it is used in the exit map to merge all exit paths.
The change is simple: in just_allocated_object, I check whether the current control is a Region node with only one input and, if so, look through it to its input. With the change, we see a good performance improvement in our stress test.
Could you review the change and comment on it?
graphKit.cpp
 // We use this to determine if an object is so "fresh" that
 // it does not require card marks.
 Node* GraphKit::just_allocated_object(Node* current_control) {
-  if (C->recent_alloc_ctl() == current_control)
+  Node* ctrl = current_control;
+  if (CheckJustAllocatedAggressive) {
+    // Object::<init> is invoked after allocation. Most of the invoke nodes
+    // will be reduced, but a Region node is kept at parse time, so we check
+    // for that pattern and skip the Region node.
+    if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
+      ctrl = ctrl->in(1);
+    }
+  }
+  if (C->recent_alloc_ctl() == ctrl)
     return C->recent_alloc_obj();
   return NULL;
 }
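To make the pattern easier to discuss outside HotSpot, here is a minimal stand-alone sketch of the check. It is only an illustration under stated assumptions: ToyNode and ToyCompile are hypothetical stand-ins, not the real C2 classes, and the boolean parameter plays the role of the CheckJustAllocatedAggressive flag from the patch. The one thing it borrows from C2 is the convention that input slot 0 of a Region is reserved, so a Region with a single control input has req() == 2.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a C2 IR node (NOT the real HotSpot Node class).
struct ToyNode {
    bool is_region = false;
    // Slot 0 is reserved, mirroring C2 where a Region refers to itself there.
    // A Region with one control input therefore has two entries.
    std::vector<ToyNode*> inputs;

    unsigned req() const { return static_cast<unsigned>(inputs.size()); }
    ToyNode* in(unsigned i) const { return inputs[i]; }
};

// Hypothetical stand-in for the compile state that remembers the most
// recent allocation and the control projection that follows it.
struct ToyCompile {
    ToyNode* recent_alloc_ctl = nullptr;
    ToyNode* recent_alloc_obj = nullptr;
};

// Returns the freshly allocated object if current_control is still the
// control projection of the most recent allocation, otherwise nullptr.
// When skip_single_input_region is true, a Region with exactly one
// control input is looked through before the comparison.
ToyNode* just_allocated_object(const ToyCompile& C,
                               ToyNode* current_control,
                               bool skip_single_input_region) {
    ToyNode* ctrl = current_control;
    if (skip_single_input_region &&
        ctrl != nullptr && ctrl->is_region && ctrl->req() == 2) {
        ctrl = ctrl->in(1);  // step over the trivial Region
    }
    return (C.recent_alloc_ctl == ctrl) ? C.recent_alloc_obj : nullptr;
}
```

In this sketch, a store whose control is a one-input Region sitting directly on the allocation's control projection is recognized as fresh only when the skip is enabled. A Region that merges two paths (req() == 3) is never looked through, which is the intent of the req() == 2 test in the patch.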
Thanks,
Kevin