Enhancement
Resolution: Unresolved
P2
hs25, 8
Measurements on SPECjbb2005 (see JDK-8020306) show that G1 object copy times are much larger than Parallel GC's.
One important cause is that G1 pushes many more items onto the object copy task queue than Parallel GC does. Investigation of this behavior showed that Parallel GC has a special fast path for references to already forwarded objects: it does not push them onto the work queue, but fixes them up and does any missing processing (remembered set) inline.
E.g.
inline void PSPromotionManager::claim_or_forward_internal_depth(T* p) {
  if (p != NULL) { // XXX: error if p != NULL here
    oop o = oopDesc::load_decode_heap_oop_not_null(p);
    if (o->is_forwarded()) { // <-- "fast path" here
      o = o->forwardee();
      // Card mark
      if (PSScavenge::is_obj_in_young(o)) {
        PSScavenge::card_table()->inline_write_ref_field_gc(p, o);
      }
      oopDesc::encode_store_heap_oop_not_null(p, o);
    } else {
      push_depth(p); // push into work queue
    }
  }
}
Depending on the number of already forwarded objects, this saves a lot of work, because task queue management is relatively expensive: pushing and popping the element, possibly losing locality, additional memory use in the overflow queue, and more (slow, possibly contended) stealing work.
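To illustrate the effect on queue traffic, here is a minimal standalone sketch. It is not HotSpot code: Obj, Stats, process_slots and run_demo are hypothetical names made up for this example, a plain std::deque stands in for the task queue, and the card mark / remembered set update is left out. It only counts how many reference slots go through the queue with and without the fast path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
    Obj* forwardee = nullptr;  // non-null once the object has been copied
    bool is_forwarded() const { return forwardee != nullptr; }
};

struct Stats {
    std::size_t pushed = 0;        // slots that went through the task queue
    std::size_t fixed_inline = 0;  // slots fixed up on the fast path
};

// Scans all reference slots, then drains the queue.
// With fast_path set, slots that refer to an already forwarded object are
// updated in place and never enter the queue.
inline Stats process_slots(std::vector<Obj*>& slots,
                           std::deque<Obj>& to_space,
                           bool fast_path) {
    Stats stats;
    std::deque<Obj**> queue;  // simplified, single-threaded "task queue"

    for (Obj*& slot : slots) {
        if (slot == nullptr) continue;
        if (fast_path && slot->is_forwarded()) {
            slot = slot->forwardee;  // fix up the reference inline
            ++stats.fixed_inline;
        } else {
            queue.push_back(&slot);  // defer: costs a push and a pop later
            ++stats.pushed;
        }
    }

    while (!queue.empty()) {
        Obj** p = queue.front();
        queue.pop_front();
        Obj* o = *p;
        if (!o->is_forwarded()) {
            // "Copy" the object. std::deque keeps element addresses stable
            // when growing at the end, so forwardee pointers stay valid.
            to_space.emplace_back();
            o->forwardee = &to_space.back();
        }
        *p = o->forwardee;
    }
    return stats;
}

// Builds a tiny scenario: three objects, one of them (b) already forwarded
// before the scan starts, and five slots referring to them.
inline Stats run_demo(bool fast_path) {
    Obj a, b, c;
    std::deque<Obj> to_space;
    to_space.emplace_back();
    b.forwardee = &to_space.back();  // b was copied earlier

    std::vector<Obj*> slots = {&a, &b, &a, &c, &b};
    Stats stats = process_slots(slots, to_space, fast_path);

    // Every slot must now refer to the forwarded copy.
    assert(slots[0] == a.forwardee && slots[2] == a.forwardee);
    assert(slots[1] == b.forwardee && slots[4] == b.forwardee);
    assert(slots[3] == c.forwardee);
    return stats;
}
```

In this scenario the two slots referring to b skip the queue when the fast path is enabled, so three slots are pushed instead of five. The real saving depends on how many references point to objects that are already forwarded at scan time.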
relates to
- JDK-6672778 G1 should trim task queues more aggressively during evacuation pauses (Resolved)
- JDK-8244684 G1 abuses StarTask to also include partial objarray scan tasks (Resolved)
- JDK-8245022 ParallelGC abuses StarTask to also include partial objarray scan tasks (Resolved)
- JDK-8246718 ParallelGC should not check for forward objects for copy task queue (Resolved)