- Enhancement
- Resolution: Fixed
- P4
- 9
- b21
During recent work the following worthwhile micro-optimizations for scanning remembered sets (or, more generally, cards) have been found:
- HeapRegion::oops_on_card_seq_iterate_careful() is faster than using HeapRegionDCTOC during scan RS.
- HeapRegion::oops_on_card_seq_iterate_careful() can be sped up by specializing it for use during GC vs. during mutator time.
E.g. a lot of extra checks can go away in the GC specialization: the filter_young check, the g1h->is_gc_active() check, the card_ptr != NULL check, the various checks for whether we are scanning into an unparseable point, etc.
- HeapRegion::oops_on_card_seq_iterate_careful() always does at least one unnecessary call to HeapRegion::block_size().
I.e. the size computed while positioning the cursor at the object starting at or spanning into the card in question is not reused on entry to the iteration loop.
HeapRegion::block_size() is very expensive in G1.
- HeapRegion::block_size() can be aggressively specialized for the use case during GC:
  - addr cannot be >= top() during GC, so that check can be dropped
  - the repeated calculation of g1h->concurrent_mark()->prevMarkBitMap() is very expensive. Its load should be hoisted out of the oops_on_card_seq_iterate_careful() main loop and passed in from a local variable.
  - further, the information that the object is dead should be returned from block_size() (or a specialized variant). Currently, after determining block_size(), oops_on_card_iterate() does a second expensive lookup of the prev mark bitmap just to check whether the object is dead.
- the called methods need to be reviewed to see whether they can be made more amenable to inlining (some short, frequently called methods are defined in cpp files)
- HeapRegion::block_is_obj() could be aggressively specialized for RS scan too: the first check, whether the given address is in a continues humongous region, can be hoisted out of the entire oop iteration loop into oops_on_card_seq_iterate_careful().
- HeapRegion::is_obj_dead() could be specialized too: e.g. the is_archive check can be hoisted out to the top level (and is actually superfluous, since archive regions do not contain any references to non-archive regions).
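
The block_size() related items above (reusing the size found while positioning the cursor, hoisting the bitmap load, and returning liveness together with the size) can be illustrated with a small self-contained toy model. This is only a sketch: ToyHeap, ToyRegion, ToyBitmap, BlockInfo and all functions below are hypothetical stand-ins, not HotSpot code, and the bitmap_loads counter merely stands in for the cost of g1h->concurrent_mark()->prevMarkBitMap().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types; addresses are plain word indices into one region.
struct ToyBitmap {
  std::vector<bool> marked;  // bit set at the start word of each live block
  bool is_marked(std::size_t addr) const { return marked[addr]; }
};

struct ToyHeap {
  ToyBitmap prev_bitmap;
  mutable std::size_t bitmap_loads = 0;  // counts accessor calls (the "expensive" part)
  const ToyBitmap* prev_mark_bitmap() const { ++bitmap_loads; return &prev_bitmap; }
};

struct ToyRegion {
  std::vector<std::size_t> size_at;  // size in words of the block starting here, else 0
  std::size_t top;                   // first unallocated word
};

struct BlockInfo { std::size_t size; bool dead; };

// General-purpose helpers: every call goes through the heap accessor again.
std::size_t block_size_generic(const ToyHeap& h, const ToyRegion& r, std::size_t addr) {
  const ToyBitmap* bm = h.prev_mark_bitmap();  // mimics the repeated bitmap load
  (void)bm;
  return r.size_at[addr];
}
bool is_dead_generic(const ToyHeap& h, std::size_t addr) {
  return !h.prev_mark_bitmap()->is_marked(addr);
}

// Unspecialized scan: the size from positioning is dropped and recomputed,
// and liveness needs a second lookup per block. Returns the live blocks visited.
std::size_t scan_card_generic(const ToyHeap& h, const ToyRegion& r,
                              std::size_t card_start, std::size_t card_end) {
  assert(card_start < card_end && card_end <= r.top);
  std::size_t cur = 0;
  std::size_t sz = block_size_generic(h, r, cur);
  while (cur + sz <= card_start) { cur += sz; sz = block_size_generic(h, r, cur); }
  std::size_t live = 0;
  while (cur < card_end) {
    std::size_t s = block_size_generic(h, r, cur);  // recomputed for the first block
    if (!is_dead_generic(h, cur)) ++live;
    cur += s;
  }
  return live;
}

// GC-specialized helper: bitmap passed in, no addr >= top() check, size and
// liveness returned together.
BlockInfo block_info_gc(const ToyBitmap* bm, const ToyRegion& r, std::size_t addr) {
  assert(addr < r.top);  // precondition instead of a runtime branch
  return { r.size_at[addr], !bm->is_marked(addr) };
}

std::size_t scan_card_gc(const ToyHeap& h, const ToyRegion& r,
                         std::size_t card_start, std::size_t card_end) {
  assert(card_start < card_end && card_end <= r.top);
  const ToyBitmap* bm = h.prev_mark_bitmap();  // hoisted: loaded once per card
  std::size_t cur = 0;
  BlockInfo info = block_info_gc(bm, r, cur);
  while (cur + info.size <= card_start) { cur += info.size; info = block_info_gc(bm, r, cur); }
  std::size_t live = 0;
  for (;;) {
    if (!info.dead) ++live;  // info from positioning is reused on loop entry
    cur += info.size;
    if (cur >= card_end) break;
    info = block_info_gc(bm, r, cur);
  }
  return live;
}

// Example layout: blocks at 0(4 words), 4(3), 7(5), 12(2), 14(6); live at 0, 7, 14.
void make_example(ToyHeap& h, ToyRegion& r) {
  r.top = 20;
  r.size_at.assign(20, 0);
  r.size_at[0] = 4; r.size_at[4] = 3; r.size_at[7] = 5; r.size_at[12] = 2; r.size_at[14] = 6;
  h.prev_bitmap.marked.assign(20, false);
  h.prev_bitmap.marked[0] = h.prev_bitmap.marked[7] = h.prev_bitmap.marked[14] = true;
}
```

In this toy, both scans visit the same blocks for a card covering words [8, 16), but the specialized one loads the bitmap once per card instead of once or twice per block. Whether the real savings are comparable depends on what the compiler already hoists and inlines, which the toy does not model.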
            
- relates to
  - JDK-8017163 G1: Refactor remembered sets - Resolved
  - JDK-8166607 G1 needs klass_or_null_acquire - Resolved
  - JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses - Closed
  - JDK-8166995 Consider removing stale cards from HCC during cleanup - Closed