- Enhancement
- Resolution: Fixed
- P4
- 9
- b21
During recent work the following worthwhile micro-optimizations for scanning remembered sets (or in general, cards) have been found:
- HeapRegion::oops_on_card_seq_iterate_careful is faster than using HeapRegionDCTOC during scan rs.
- HeapRegion::oops_on_card_seq_iterate_careful can be sped up by specializing it for use during gc vs. during mutator time.
E.g. a lot of extra checks can go away in such a specialization, like the filter_young one, the g1h->is_gc_active() one, the card_ptr != NULL one, the various checks whether we are scanning into an unparsable point etc.
- HeapRegion::oops_on_card_seq_iterate_careful() always does at least one unnecessary call to HeapRegion::block_size().
I.e. the result of the call made while positioning the cursor at the object starting at or spanning into the card in question is not reused on entry to the iteration loop.
HeapRegion::block_size() is very expensive in G1.
- one can aggressively specialize HeapRegion::block_size() for the use case during gc:
- addr cannot be >= top(), so that check can be dropped
- the repeated calculation of g1h->concurrent_mark()->prevMarkBitMap() is very expensive. Its load should be hoisted out of the oops_on_card_seq_iterate_careful() main loop and passed in from a local variable.
- further, the information that the object is dead should be returned from block_size() (or a specialized one). Currently, after determining block_size(), oops_on_card_iterate() checks whether the object is dead, which means fetching the prev mark bitmap again and doing another expensive lookup in it.
- need to check whether the called methods can be made more amenable to inlining (some short called methods are defined in cpp files)
- HeapRegion::block_is_obj() could be aggressively specialized for RS scan too: the first check, whether the given address is in a continues humongous region, can be hoisted out of the entire oop iteration loop into oops_on_card_seq_iterate_careful().
- HeapRegion::is_obj_dead() could be specialized too: e.g. the is_archive check can be hoisted out to top-level (and is actually superfluous, since archive regions do not contain any references to non-archive regions).
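The block_size() items above (reusing the size computed while positioning the cursor, hoisting the bitmap load out of the loop, and returning liveness together with the size) can be illustrated with a toy model. This is only a sketch, not HotSpot code: ToyRegion, ToyBitmap, BlockInfo, block_size_during_gc() and scan_card() are hypothetical names, addresses are plain word indices, and the cursor starts at the region bottom instead of using a block offset table.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins; none of these are the real HotSpot types.
struct ToyBitmap {
  std::vector<bool> marked;  // marked[a] is true if the block starting at word a is live
  bool is_marked(std::size_t addr) const { return marked[addr]; }
};

struct ToyRegion {
  std::vector<std::size_t> size_at;  // size_at[a]: size in words of the block starting at a
  std::size_t top;                   // first unallocated word; blocks tile [0, top)
};

// Size and liveness are returned together, so the caller needs only one
// bitmap lookup per block.
struct BlockInfo {
  std::size_t size;
  bool dead;
};

// Hypothetical GC-time variant: the caller guarantees addr < top, so there is
// no top() check, and the bitmap is passed in instead of being re-fetched.
inline BlockInfo block_size_during_gc(const ToyRegion& r, const ToyBitmap* bitmap,
                                      std::size_t addr) {
  return BlockInfo{r.size_at[addr], !bitmap->is_marked(addr)};
}

// Visits every live block overlapping the card [card_start, card_end).
// Precondition (assumed): card_start < r.top.
// Returns how many times block_size_during_gc() was called.
template <typename Visitor>
std::size_t scan_card(const ToyRegion& r, const ToyBitmap* bitmap,
                      std::size_t card_start, std::size_t card_end, Visitor visit) {
  std::size_t calls = 0;
  std::size_t cur = 0;  // toy: walk from the region bottom
  BlockInfo info = block_size_during_gc(r, bitmap, cur);
  ++calls;
  // Position the cursor at the block starting at or spanning into the card.
  while (cur + info.size <= card_start) {
    cur += info.size;
    info = block_size_during_gc(r, bitmap, cur);
    ++calls;
  }
  // 'info' already describes the block at 'cur'; reuse it on loop entry.
  for (;;) {
    if (!info.dead) {
      visit(cur, info.size);
    }
    cur += info.size;
    if (cur >= card_end || cur >= r.top) {
      break;
    }
    info = block_size_during_gc(r, bitmap, cur);
    ++calls;
  }
  return calls;
}
```

In this sketch the number of size computations equals the number of blocks walked, because the last result from the positioning loop is carried into the iteration loop rather than recomputed.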
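The gc vs. mutator specialization mentioned above could be expressed with a compile-time flag, so that the defensive checks exist only in the mutator-time instantiation. The following is a minimal sketch under stated assumptions: ToyCard and card_needs_scan() are hypothetical, the fields are simplified stand-ins for the real conditions, and the GC-time caller is assumed to have established the invariants the skipped checks would otherwise verify.

```cpp
#include <cstddef>

// Hypothetical, simplified inputs to a card scan; not HotSpot's real data.
struct ToyCard {
  const unsigned char* card_ptr;  // card table entry; may be null for mutator-time callers
  bool region_is_young;           // stand-in for the filter_young condition
  std::size_t card_start;         // first word covered by the card
  std::size_t region_top;         // first unallocated word of the region
};

// The template parameter replaces a runtime is_gc_active()-style query.
// Because 'during_gc' is a compile-time constant, the compiler can drop the
// whole checked branch from the card_needs_scan<true> instantiation.
template <bool during_gc>
inline bool card_needs_scan(const ToyCard& c) {
  if (!during_gc) {
    // Mutator-time callers keep the defensive checks.
    if (c.card_ptr == nullptr) return false;           // no card to process
    if (c.region_is_young) return false;               // young regions are filtered
    if (c.card_start >= c.region_top) return false;    // nothing parsable at or above top
  }
  return true;
}
```

A GC-time caller would use card_needs_scan<true>(card) and a mutator-time caller card_needs_scan<false>(card); both share one source definition, which keeps the two paths from drifting apart.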
relates to:
- JDK-8017163 G1: Refactor remembered sets (Resolved)
- JDK-8166607 G1 needs klass_or_null_acquire (Resolved)
- JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses (Closed)
- JDK-8166995 Consider removing stale cards from HCC during cleanup (Closed)