During card scanning in a young GC, an oop closure is required to process the oops on dirty cards. When the number of dirty cards is large, having an opaque closure type prevents inlining of this oop closure, resulting in a performance loss.
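To make the inlining argument concrete, here is a minimal C++ sketch. It is not HotSpot code. All names (`Slot`, `SlotClosure`, `CountYoungClosure`, `scan_dirty_cards_opaque`, `scan_dirty_cards_static`) are hypothetical, and the card table is simplified to a `std::vector<bool>` where each card covers a fixed number of reference slots. Both scan loops do the same work. The first only knows the abstract closure type, so every slot costs an indirect call. The second receives the concrete closure type as a template parameter, which lets the compiler resolve the call at compile time and inline it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a heap reference slot (an "oop*" in HotSpot terms).
using Slot = std::uintptr_t;

// Opaque closure: callers see only this abstract base, so a call to
// do_slot() through a SlotClosure* is a virtual call that generally
// cannot be inlined into the scanning loop.
struct SlotClosure {
  virtual ~SlotClosure() = default;
  virtual void do_slot(Slot* p) = 0;
};

// Hypothetical young-GC closure: counts slots whose value falls in [lo, hi).
struct CountYoungClosure final : SlotClosure {
  Slot young_lo, young_hi;
  std::size_t count = 0;
  CountYoungClosure(Slot lo, Slot hi) : young_lo(lo), young_hi(hi) {}
  void do_slot(Slot* p) override {
    if (*p >= young_lo && *p < young_hi) ++count;
  }
};

// Card scan with an opaque closure type: one indirect call per slot on a dirty card.
void scan_dirty_cards_opaque(std::vector<Slot>& slots,
                             const std::vector<bool>& dirty,
                             std::size_t slots_per_card,
                             SlotClosure* cl) {
  assert(slots_per_card > 0);
  for (std::size_t card = 0; card < dirty.size(); ++card) {
    if (!dirty[card]) continue;  // clean cards are skipped
    std::size_t begin = card * slots_per_card;
    for (std::size_t i = begin; i < begin + slots_per_card && i < slots.size(); ++i) {
      cl->do_slot(&slots[i]);  // virtual dispatch
    }
  }
}

// Same loop, but the concrete closure type is a template parameter.
// With Closure = CountYoungClosure (declared final), the compiler can bind
// do_slot() statically and inline it into the loop body.
template <typename Closure>
void scan_dirty_cards_static(std::vector<Slot>& slots,
                             const std::vector<bool>& dirty,
                             std::size_t slots_per_card,
                             Closure* cl) {
  assert(slots_per_card > 0);
  for (std::size_t card = 0; card < dirty.size(); ++card) {
    if (!dirty[card]) continue;
    std::size_t begin = card * slots_per_card;
    for (std::size_t i = begin; i < begin + slots_per_card && i < slots.size(); ++i) {
      cl->do_slot(&slots[i]);  // statically resolvable call
    }
  }
}
```

The two functions compute identical results. The difference only shows up in the generated code, and it matters most when many cards are dirty, because the per-slot call overhead is paid inside the hottest loop of the pause.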
Using the attached contrived benchmark, one can observe a >2x performance difference between Serial and Parallel, both using a single GC thread (young pauses of roughly 960-975 ms vs. 377-379 ms).
`java -Xms3g -Xmx3g <select-gc> -XX:NewSize=1g card_scan.java`:
```
[0.006s][info][gc] Using Serial
[3.284s][info][gc] GC(0) Pause Young (Allocation Failure) 1843M->1026M(2969M) 973.983ms
[5.154s][info][gc] GC(1) Pause Young (Allocation Failure) 1846M->1026M(2969M) 961.275ms
[7.012s][info][gc] GC(2) Pause Young (Allocation Failure) 1846M->1026M(2969M) 960.879ms
[0.002s][info][gc] Using Parallel
[2.681s][info][gc] GC(0) Pause Young (Allocation Failure) 1791M->1026M(2944M) 378.085ms
[3.875s][info][gc] GC(1) Pause Young (Allocation Failure) 1794M->1026M(2944M) 379.245ms
[5.159s][info][gc] GC(2) Pause Young (Allocation Failure) 1794M->1026M(2944M) 377.358ms
[6.337s][info][gc] GC(3) Pause Young (Allocation Failure) 1794M->1026M(2944M) 377.569ms
```