-
Enhancement
-
Resolution: Unresolved
-
P4
-
None
-
17, 21, 23, 24
-
generic
-
generic
ShenandoahRegionIterator is a thread-safe iterator built on atomic operations. With it, Shenandoah GC workers balance the work dynamically whether or not every region costs the same to process: a worker thread that takes a more expensive region ends up processing fewer regions.
But if the task is very lightweight and costs the same, or nearly the same, for every region, the overhead of the atomic operations in ShenandoahRegionIterator may be too high. In such cases we could use parallel_heap_region_iterate instead.
Candidates to optimize:
ShenandoahResetBitmapTask/ShenandoahMCResetCompleteBitmapTask, and maybe ShenandoahPretouchBitmapTask
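To illustrate the trade-off described above, here is a minimal standalone sketch, not HotSpot code. It assumes the cheaper path works by claiming a stride of regions per atomic operation instead of one region per operation. The names `iterate`, `Result`, and `stride` are hypothetical and chosen only for this example; `stride = 1` models per-region claiming.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type: how often each region was visited, and how many
// atomic claim operations all workers issued in total.
struct Result {
    std::vector<int> visits;
    std::size_t atomic_ops;
};

// Workers claim `stride` consecutive regions per atomic fetch_add.
// stride == 1 models claiming one region at a time; a larger stride
// amortizes the atomic over several regions (an assumption of this sketch).
Result iterate(std::size_t num_regions, unsigned num_workers, std::size_t stride) {
    assert(stride > 0);  // a zero stride would never make progress
    std::vector<int> visits(num_regions, 0);
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> atomic_ops{0};

    auto worker = [&] {
        std::size_t local_ops = 0;
        for (;;) {
            std::size_t begin = next.fetch_add(stride, std::memory_order_relaxed);
            ++local_ops;
            if (begin >= num_regions) break;  // nothing left to claim
            std::size_t end = std::min(begin + stride, num_regions);
            for (std::size_t i = begin; i < end; ++i) {
                visits[i] += 1;  // stand-in for a lightweight per-region task
            }
        }
        atomic_ops.fetch_add(local_ops, std::memory_order_relaxed);
    };

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < num_workers; ++w) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return {std::move(visits), atomic_ops.load()};
}
```

In this sketch the number of claim operations is ceil(num_regions / stride) plus one failed claim per worker, so a larger stride cuts atomic traffic at the price of coarser load balancing. That price is acceptable only when the per-region cost is roughly uniform, which is the case this issue targets.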
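A crude patch like this https://github.com/pengxiaolong/jdk/compare/JDK-8336640-auto-derive-stride...pengxiaolong:jdk:optimize-region-bitmap-reset?expand=1 shows good improvement, cutting the average Concurrent Reset time by about 12% with 2048 regions and about 39% with 4096 regions: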
2048 regions:
Before: [15.807s][info][gc,stats ] Concurrent Reset = 0.022 s (a = 117 us) (n = 189) (lvls, us = 84, 102, 105, 111, 364)
After: [15.814s][info][gc,stats ] Concurrent Reset = 0.019 s (a = 103 us) (n = 187) (lvls, us = 74, 89, 92, 98, 331)
4096 regions:
Before: [15.810s][info][gc,stats ] Concurrent Reset = 0.035 s (a = 188 us) (n = 187) (lvls, us = 162, 178, 184, 193, 251)
After: [15.807s][info][gc,stats ] Concurrent Reset = 0.022 s (a = 115 us) (n = 187) (lvls, us = 83, 100, 104, 109, 248)