JDK-7078565

prefer mfence on processors prior to Nehalem


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: P4
    • tbd
    • hs22
    • hotspot
    • None
    • x86
    • solaris_10

      I believe this was caused by the switch to using lock addl [esp], 0 instead of mfence for volatile membars (6822204). In my review request for that change, http://cr.openjdk.java.net/~never/6822204, I said that I didn't measure any performance change for Intel at the time. On your microbenchmark I can measure the difference, though, so I'm going to remeasure derby, which previously showed the big difference. We may want to make the lock addl AMD-specific.

      tom
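
      For context, the membar in question is the StoreLoad barrier that HotSpot emits after a volatile store on x86, and ReentrantLock updates its lock state through exactly this kind of access. The following is a minimal sketch of a loop that exercises that barrier. The class and method names are hypothetical, and the numbers it prints depend on the JVM and CPU, so treat it as an illustration rather than a calibrated benchmark.

```java
// Hypothetical sketch: names are illustrative and not taken from HotSpot or the bug report.
final class VolatileStoreBench {
    // Each store to a volatile field must be followed by a StoreLoad barrier on x86
    // (emitted by the JIT as either mfence or a locked instruction).
    static volatile int sink;

    // Performs n volatile stores and returns the elapsed time in nanoseconds.
    static long timeVolatileStores(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink = i;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 1_000_000;
        // Repeat a few times so later rounds run JIT-compiled code.
        for (int round = 0; round < 10; round++) {
            long nanos = timeVolatileStores(n);
            System.out.println("round " + round + ": " + (nanos / (double) n) + " ns per volatile store");
        }
    }
}
```

      Comparing the later rounds across JVM builds on the same machine gives a rough view of the per-store barrier cost.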

      On Aug 11, 2011, at 11:05 AM, Clemens Eisserer wrote:

      Hi Vitaly,

      > I tried this bench on 6u23, and if I first run that code in a 10k-iteration loop and then time the 1M-iteration loop, I get about a 10 ms speedup. The first loop should trigger JIT compilation (10k is the default threshold, I believe) and the second should then run without being interrupted by compilation.
      >
      > Can you try the same? It might also be interesting to time it under the interpreter (-Xint).

      I changed the testcase a bit so that it no longer relies on OSR: lockBench() is now a separate method, so it hits the compilation threshold after a few runs.

      I get the following timings for 1M runs:

      jdk7-server: 53ms
      jdk7-client: 62ms
      jdk7-xint : 955ms

      jdk6-xint : 1000ms
      jdk6-client: 68ms
      jdk6-server: 52ms

      jdk5-server: 40ms
      jdk5-client: 61ms
      jdk5-xint : 832ms

      So JDK7 is slower than JDK5 in every case; the regression seems to have landed in jdk6 (I was using openjdk6).

      Should I file a bug-report about this behaviour?

      Thanks, Clemens


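      import java.util.concurrent.locks.ReentrantLock;

      public class LockPerf {
          static final ReentrantLock lock = new ReentrantLock();

          public static void main(String[] args) {
              // Runs until killed; each pass times 1000 x 1000 = 1M lock/unlock pairs.
              while (true) {
                  long start2 = System.nanoTime();
                  for (int i = 0; i < 1000; i++) {
                      lockBench();
                  }
                  // Elapsed time in milliseconds.
                  System.out.println("Lock bench: " + (System.nanoTime() - start2) / 1000000);
              }
          }

          // Separate method so it gets compiled after a few calls, without relying on OSR.
          private static void lockBench() {
              for (int i = 0; i < 1000; i++) {
                  lock.lock();
                  lock.unlock();
              }
          }
      }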
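
      The warm-up-then-measure approach suggested earlier in the thread can be sketched as below. This is only an illustration under assumptions: the class name, method names, and the warm-up call count are hypothetical, and the number of calls needed to trigger compilation varies with the JVM version and flags.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: names and warm-up counts are illustrative assumptions.
final class LockPerfWarmup {
    static final ReentrantLock lock = new ReentrantLock();

    // Uncontended lock/unlock pairs, the operation being measured.
    static void lockBench(int iterations) {
        for (int i = 0; i < iterations; i++) {
            lock.lock();
            lock.unlock();
        }
    }

    // Calls lockBench repeatedly as a warm-up, so the method itself can be
    // JIT-compiled, then times one measured run and returns milliseconds.
    static long timedRunMillis(int warmupCalls, int measuredIterations) {
        for (int i = 0; i < warmupCalls; i++) {
            lockBench(1_000);
        }
        long start = System.nanoTime();
        lockBench(measuredIterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumed values: 1000 warm-up calls, then 1M timed lock/unlock pairs.
        System.out.println("Lock bench: " + timedRunMillis(1_000, 1_000_000) + " ms");
    }
}
```

      Running the same class with -Xint gives the interpreter-only timing for comparison.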


      On Aug 11, 2011 11:38 AM, "Clemens Eisserer" wrote:
      Hi Vitaly,

      > Which OS are you using?

      Linux-3.0 (Fedora 15)


      > Also, you should use System.nanoTime() for this type of timing as it gives
      > you a more precise timer.

      I tried, but the results remained the same: ~53ms for jdk6/7, ~41ms for JDK5.
      I was using the server compiler both times.

      Thanks, Clemens

            never Tom Rodriguez
            never Tom Rodriguez