JDK-8218192

Remove copy constructor for MemRegion


    • gc
    • b09

        We observed a 5% performance regression in Clang-built HotSpot compared with GCC-built HotSpot when running the jython benchmark from DaCapo on a Google production machine. We identified the root cause: LLVM's SLP vectorizer (https://llvm.org/docs/Vectorizers.html#the-slp-vectorizer) compiles the G1BarrierSet::write_region() and G1BarrierSet::write_ref_array_work() methods using the SSE instructions movups and movaps to pass the parameter "MemRegion mr" to G1BarrierSet::invalidate(). However, the data moved by these SSE instructions is likely not aligned, which results in poor performance.
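
A minimal sketch of the by-value pattern described above, assuming a simplified MemRegion that holds only a start pointer and a word count. The names MemRegionSketch, invalidate_by_value and write_region_sketch, and the opaque HeapWord declaration, are hypothetical and are not HotSpot's code. The sketch can only check the type property that affects the calling convention; the instructions the compiler actually selects cannot be observed from portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class HeapWord;  // hypothetical opaque stand-in; only pointers to it are used

// Hypothetical, simplified MemRegion: two pointer-sized fields plus a
// user-provided copy constructor, like the one this issue removes.
class MemRegionSketch {
  HeapWord* _start;
  std::size_t _word_size;
 public:
  MemRegionSketch(HeapWord* start, std::size_t word_size)
      : _start(start), _word_size(word_size) {}
  // User-provided copy constructor: makes the class non-trivially copyable.
  MemRegionSketch(const MemRegionSketch& mr)
      : _start(mr._start), _word_size(mr._word_size) {}
  HeapWord* start() const { return _start; }
  std::size_t word_size() const { return _word_size; }
};

static_assert(!std::is_trivially_copyable<MemRegionSketch>::value,
              "a user-provided copy constructor makes the class non-trivial");

// Hypothetical callee that takes the region by value, as invalidate() did.
std::size_t invalidate_by_value(MemRegionSketch mr) {
  return mr.word_size();
}

// Hypothetical caller in the style of write_region(). Under the Itanium C++
// ABI, a parameter that is non-trivial for the purposes of calls is passed by
// building a caller-allocated temporary in memory and passing its address.
// Filling that temporary means two adjacent 8-byte stores (on LP64), which is
// the kind of pattern an SLP vectorizer can merge into one 16-byte SSE move.
std::size_t write_region_sketch(const MemRegionSketch& mr) {
  return invalidate_by_value(mr);
}
```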

        Although LLVM's SLP vectorizer can be turned off with -fno-slp-vectorize, we don't think that is desirable, as it may cause other performance regressions with Clang. We think it is reasonable to simply pass the MemRegion object by const reference, which avoids the unnecessary data movement and vectorization.
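
To make the "unnecessary data movement" concrete, here is a hypothetical copy-counting variant of the region type. CountingRegion, invalidate_by_value and invalidate_by_ref are illustrative names, not HotSpot functions; the counter is added only so the difference between the two calling styles is observable.

```cpp
#include <cassert>
#include <cstddef>

class HeapWord;  // hypothetical opaque stand-in; only pointers to it are used

// Hypothetical MemRegion-like class that counts how often it is copied.
class CountingRegion {
  HeapWord* _start;
  std::size_t _word_size;
 public:
  static int copies;
  CountingRegion(HeapWord* start, std::size_t word_size)
      : _start(start), _word_size(word_size) {}
  CountingRegion(const CountingRegion& other)
      : _start(other._start), _word_size(other._word_size) { ++copies; }
  std::size_t word_size() const { return _word_size; }
};
int CountingRegion::copies = 0;

// Before: the callee receives its own copy, built by the caller.
std::size_t invalidate_by_value(CountingRegion mr) { return mr.word_size(); }

// After (the approach proposed above): the callee reads the caller's object
// through a reference, so no copy is constructed for the call.
std::size_t invalidate_by_ref(const CountingRegion& mr) { return mr.word_size(); }
```

Passing an lvalue by value runs the copy constructor once per call, while the const-reference version runs it zero times.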

        Below are the performance numbers with this patch. Each configuration was run for 15 trials, and the variance for each configuration is within 0.5%.
        Clang version: trunk r351319
        GCC version: 4.9

                               GCC-default  GCC-passByRef  Clang-default  Clang-passByRef
        Execution Time (ms)        12151.4        12078.7        12532.8          11957.2
        Process CPU Time (ms)      12167.3        12086.7        12543.3          11975.3
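
The relative differences implied by the execution-time row can be computed directly. This small helper is only arithmetic on the numbers in the table above (the constant names are hypothetical labels for the table's columns), not additional measurements.

```cpp
#include <cassert>
#include <cmath>

// Execution times (ms) copied from the table above; not new measurements.
constexpr double kGccDefault = 12151.4;
constexpr double kGccPassByRef = 12078.7;
constexpr double kClangDefault = 12532.8;
constexpr double kClangPassByRef = 11957.2;

// Relative change of `value` with respect to `baseline`, in percent.
// A negative result means `value` is smaller (faster) than `baseline`.
double percent_change(double baseline, double value) {
  return (value - baseline) / baseline * 100.0;
}
```

Applied to the execution-time row, this gives about -4.6% for Clang (passByRef vs. default), about -0.6% for GCC (passByRef vs. default), and about +3.1% for Clang-default relative to GCC-default.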

        Update:
        Based on a suggestion from Kim Barrett in the comments, we think it is better to remove the copy constructor. The latest performance numbers are in the attached HTML file.
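
A minimal sketch of one plausible reason removing the copy constructor helps, assuming a simplified two-field MemRegion and the x86-64 System V ABI. MemRegionNoCopyCtor and invalidate_by_value are hypothetical names, and this is not a claim about the exact code Clang generated for the patch. Without a user-provided copy constructor the compiler's implicit, trivial one is used, so the class is trivially copyable and no longer needs a memory temporary at call sites.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class HeapWord;  // hypothetical opaque stand-in; only pointers to it are used

// Hypothetical simplified MemRegion after the change: no user-provided copy
// constructor, so the implicitly declared (trivial) one applies.
class MemRegionNoCopyCtor {
  HeapWord* _start;
  std::size_t _word_size;
 public:
  MemRegionNoCopyCtor(HeapWord* start, std::size_t word_size)
      : _start(start), _word_size(word_size) {}
  std::size_t word_size() const { return _word_size; }
};

static_assert(std::is_trivially_copyable<MemRegionNoCopyCtor>::value,
              "without a user-provided copy constructor the class is trivially copyable");

// Still by value, but under the System V AMD64 ABI a trivially copyable
// 16-byte object made of a pointer and an integer is classified INTEGER and
// passed in two general-purpose registers, so the caller does not have to
// build a copy in memory first.
std::size_t invalidate_by_value(MemRegionNoCopyCtor mr) {
  return mr.word_size();
}
```

Compared with switching each callee to const reference, this keeps existing by-value signatures unchanged while removing the copy that previously had to be built in memory.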

              manc Man Cao