Bug
Resolution: Unresolved
P3
16
On the JRuby bug tracker there is a bug report about later JDKs (e.g. JDK 14) being about 20% slower than earlier ones (https://github.com/jruby/jruby/issues/5789, via https://twitter.com/headius/status/1297992914832769024).
The main reason is the change of the default GC to G1 in JDK 9; however, the difference is abnormally high, so I am reporting it here. The typical observed difference for known outliers is around 10%.
After some tuning, i.e. setting -Xms == -Xmx and using 32M regions, the difference can be reduced to ~13-15%.
One suspicion is the barriers, as reported by [~shade] (in that bug report):
"Tested with recent JDK 13 EA and multiple collectors. Judging from GC logs, it is heavily-allocating, but fairly young-gc workload. Both Parallel and G1 run very short Young GCs during the run, taking about 1% of total time, which means allocation pressure itself is not the issue here."
Local results:
#  gc        score  % of baseline  options
1  parallel  17,26  100,0%         -Xmx1500m (oob)
2  g1        13,64   79,0%         -Xmx1500m (oob)
3  parallel  17,16  100,0%         -Xmx1500m -Xms1500m -Xmn1000m
4  g1        13,99   81,5%         -Xmx1500m -Xms1500m -Xmn1000m
5  g1        14,36   83,7%         -Xmx1500m -Xms1500m -Xmn1000m (rerun)
6  g1        15,13   88,2%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
7  g1        14,90   86,8%         -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m (rerun)
8  parallel  13,81  100,0%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
9  g1        13,11   94,9%         graal -Xmx1500m -Xms1500m -Xmn1000m -XX:G1HeapRegionSize=32m
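For readers who want to re-derive the percentage column, here is a minimal sketch. It assumes each g1 score is divided by the parallel score measured with the same options, which matches the numbers in the table. The function name pct_of_baseline is made up for this illustration.

```shell
# Print a score as a percentage of a baseline score, with one decimal place.
# Decimal commas (as used in the table) are accepted and converted to points;
# the output itself uses a decimal point.
pct_of_baseline() {
  score=$(printf '%s' "$1" | tr ',' '.')
  baseline=$(printf '%s' "$2" | tr ',' '.')
  LC_ALL=C awk -v s="$score" -v b="$baseline" 'BEGIN { printf "%.1f%%\n", 100 * s / b }'
}

pct_of_baseline 13,64 17,26   # run 2 (g1) against run 1 (parallel): prints 79.0%
pct_of_baseline 13,11 13,81   # run 9 (g1) against run 8 (parallel): prints 94.9%
```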
The interesting runs are 8 and 9, with Graal. Graal seems slower overall, but it also shows only a small difference (5%) between the two collectors. So potentially there is an issue with C2 optimizations that only kick in with Parallel GC's (small) barriers.
Some initial playing with -XX:MaxInlineSize and -XX:FreqInlineSize did not yield interesting results.
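The observation quoted earlier, that young GCs take about 1% of total time, can be re-checked from a GC log. The following is a rough sketch, not the method used in the JRuby report. It assumes a JDK 9+ unified GC log in which every pause line ends with its duration in milliseconds, and the helper name sum_pause_ms is hypothetical.

```shell
# Sum the durations (in ms) of all "Pause" lines in a unified GC log.
# Assumption: each pause line ends with its duration, e.g. "... 3.500ms".
# Lines without "Pause" or without a trailing "ms" field are ignored.
sum_pause_ms() {
  LC_ALL=C awk '/Pause/ && $NF ~ /ms$/ { sub(/ms$/, "", $NF); total += $NF }
                END { printf "%.3f\n", total }' "$1"
}
```

A log could be produced by adding -J-Xlog:gc:file=gc.log to the jruby command line (the file name gc.log is arbitrary), after which sum_pause_ms gc.log prints the total pause time to compare against the wall-clock time of the run.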
Reproduction:
* Download JRuby from https://www.jruby.org/download
* Clone https://github.com/PragTob/rubykon
* Run jruby -Xcompile.invokedynamic=true -J-Xmx1500m benchmark/mcts_avg.rb
JRuby will pick up the VM pointed to by JAVA_HOME; you can check which with "jruby -v".
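To compare the two collectors on the same JDK, the collector can be selected explicitly instead of relying on the default. The wrapper below is only a sketch: the name run_bench is made up here, and it assumes jruby is on PATH and the current directory is the rubykon checkout.

```shell
# Run the rubykon benchmark once with the given collector ("Parallel" or "G1").
# Any further arguments are passed to jruby unchanged, e.g. -J-Xms1500m.
# Assumptions: jruby is on PATH, the current directory is the rubykon checkout.
run_bench() {
  gc="$1"; shift
  jruby -Xcompile.invokedynamic=true -J-Xmx1500m "-J-XX:+Use${gc}GC" "$@" benchmark/mcts_avg.rb
}
```

For example, run_bench Parallel corresponds to run 1 in the table on a JDK where G1 is the default, and run_bench G1 -J-Xms1500m -J-Xmn1000m -J-XX:G1HeapRegionSize=32m corresponds to runs 6 and 7.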
relates to:
* JDK-8132937 G1 compares badly to Parallel GC on throughput on javac benchmark (Open)
* JDK-8226197 Reduce G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement (Open)
* JDK-8226731 Remove StoreLoad in G1 post barrier (Open)
* JDK-8133055 Investigate G1 performance on SPL4 (Closed)
* JDK-8340827 Reduce Latency of G1 Post-Write Barrier (Draft)