Details
Enhancement
Resolution: Fixed
P4
None
b09
aarch64
Description
On behalf of lxw263044@alibaba-inc.com
The ZGC forwarding table records entries that track the destination of each relocated object. Currently, entry insertion (ZForwarding::insert()) uses memory_order_conservative to guarantee that (1) the object copy always happens before the forwardee is installed, and (2) any other thread that reads the same entry (ZForwarding::at(), which has load_acquire semantics) sees the correct contents of the forwarded object.
Let us consider memory_order_release for the entry insertion in ZForwarding::insert(). Paired with the acquiring load in ZForwarding::at(), this gives the forwarding table release-acquire ordering: every write made before the releasing insertion, including the object copy, is visible to a thread whose acquiring load observes the inserted entry. Both guarantees above therefore still hold under release-acquire ordering.
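The release-acquire pairing described above can be illustrated with a minimal, self-contained sketch. It uses standard C++ std::atomic rather than HotSpot's Atomic API, and the names ForwardingEntry, Object, insert and at are hypothetical stand-ins, not the actual ZForwarding code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical object layout, used only for illustration.
struct Object {
  int payload[4];
};

// Hypothetical stand-in for one forwarding table entry:
// 0 means "empty", any other value is the address of the relocated copy.
struct ForwardingEntry {
  std::atomic<std::uintptr_t> to_addr{0};
};

// Writer side: copy the object first, then publish the copy with a releasing CAS.
// Returns the address that ended up in the entry (ours, or an earlier winner's).
inline std::uintptr_t insert(ForwardingEntry& entry, const Object& from, Object& to_space) {
  std::memcpy(&to_space, &from, sizeof(Object));  // (1) object copy happens first

  std::uintptr_t expected = 0;
  const std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(&to_space);
  if (entry.to_addr.compare_exchange_strong(expected, desired,
                                            std::memory_order_release,    // success: publish the copy
                                            std::memory_order_relaxed)) { // failure: nothing published
    return desired;
  }
  // Another thread installed its copy first; reload with acquire so that
  // the winner's object contents are visible before they are used.
  return entry.to_addr.load(std::memory_order_acquire);
}

// Reader side: the acquiring load pairs with the releasing CAS above, so
// (2) a non-null result points at a fully copied object.
inline const Object* at(const ForwardingEntry& entry) {
  return reinterpret_cast<const Object*>(entry.to_addr.load(std::memory_order_acquire));
}
```

On AArch64 a releasing compare-and-swap generally needs weaker barriers than a fully fenced one, which is the likely source of any saving; the sketch only demonstrates the ordering guarantee, not the cost.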
We ran the SPECjvm2008 sunflow benchmark on AArch64. The concurrent relocation times are listed below. The optimized version shows a shorter average concurrent relocation time: in the final sample (about 3500 s), the cumulative average is 4.050 ms for the optimized build versus 4.493 ms for the baseline. This may also improve ZGC throughput.
$ grep "[50]00.*Phase: Concurrent Relocate" optimized.log
[500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.006 / 4.449 4.041 / 5.361 4.041 / 5.361 4.041 / 5.361 ms
[1000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.512 / 5.278 4.213 / 5.278 4.146 / 5.361 4.146 / 5.361 ms
[1500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.831 / 5.524 4.446 / 5.584 4.253 / 5.584 4.253 / 5.584 ms
[2000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.037 / 4.649 4.391 / 5.524 4.281 / 5.584 4.281 / 5.584 ms
[2500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.256 / 4.568 4.198 / 5.022 4.265 / 5.584 4.265 / 5.584 ms
[3000.506s][info][gc,stats ] Phase: Concurrent Relocate 3.032 / 4.424 3.810 / 24.709 4.173 / 24.709 4.173 / 24.709 ms
[3500.506s][info][gc,stats ] Phase: Concurrent Relocate 3.740 / 4.598 3.304 / 4.872 4.050 / 24.709 4.050 / 24.709 ms
$ grep "[50]00.*Phase: Concurrent Relocate" baseline.log
[500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.634 / 5.191 4.425 / 5.490 4.425 / 5.490 4.425 / 5.490 ms
[1000.545s][info][gc,stats ] Phase: Concurrent Relocate 4.177 / 4.731 4.414 / 5.543 4.400 / 5.543 4.400 / 5.543 ms
[1500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.560 / 4.894 4.441 / 5.543 4.427 / 5.543 4.427 / 5.543 ms
[2000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.509 / 5.100 4.591 / 5.739 4.468 / 5.739 4.468 / 5.739 ms
[2500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.543 / 5.533 4.685 / 5.762 4.511 / 5.762 4.511 / 5.762 ms
[3000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.423 / 4.834 4.635 / 5.895 4.530 / 5.895 4.530 / 5.895 ms
[3500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.152 / 5.243 4.313 / 24.341 4.493 / 24.341 4.493 / 24.341 ms
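As a rough worked example, the relative gain can be computed from the last value pair of the two 3500 s samples, assuming the usual ZGC statistics layout in which the four value pairs are "Last 10s", "Last 10m", "Last 10h" and "Total", each given as average / maximum. The helper percent_reduction and the constant names are hypothetical; the two numbers are copied from the logs above.

```cpp
#include <cassert>

// Relative reduction of `optimized` versus `baseline`, as a percentage.
inline double percent_reduction(double baseline, double optimized) {
  return (baseline - optimized) / baseline * 100.0;
}

// Cumulative ("Total") average concurrent relocation time at the ~3500 s sample.
// Assumption: the fourth value pair in each log line is the "Total" column.
constexpr double kBaselineAvgMs  = 4.493;  // from baseline.log
constexpr double kOptimizedAvgMs = 4.050;  // from optimized.log
```

Under that assumption, percent_reduction(kBaselineAvgMs, kOptimizedAvgMs) comes to a little under 10%. This is a single run on one benchmark, so the figure is indicative only.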