JDK-8020272

On OVM/Solaris 2.2.x gethrtime returns a too large value - causing the JVM to hang


        The VM hangs on Solaris/OVM 2.2.x. This is caused by an underlying OVM issue and cannot be fixed in the VM.

        This issue should be documented.


        I've observed a hang in HotSpot when running bigapps/Kitchensink. The hangs occur on different paths that call os_sleep in os_solaris.cpp, such as safepointing, GC, and java.lang.Thread.sleep.

        I've narrowed this down to a call to gethrtime, in getTimeNanos, that suddenly returns a value twice as large as the previous values. Once this happens, all subsequent calls to getTimeNanos return that same value, because the "now <= prev" check rejects every later, smaller reading. This causes the JVM to spin in os_sleep.

        See:
        inline hrtime_t getTimeNanos() {
          if (VM_Version::supports_cx8()) {
            const hrtime_t now = gethrtime();
            // Use atomic long load since 32-bit x86 uses 2 registers to keep long.
            const hrtime_t prev = Atomic::load((volatile jlong*)&max_hrtime);
            if (now <= prev) return prev; // same or retrograde time;
            const hrtime_t obsv = Atomic::cmpxchg(now, (volatile jlong*)&max_hrtime, prev);
            assert(obsv >= prev, "invariant"); // Monotonicity
            // If the CAS succeeded then we're done and return "now".
            // If the CAS failed and the observed value "obs" is >= now then
            // we should return "obs". If the CAS failed and now > obs > prv then
            // some other thread raced this thread and installed a new value, in which case
            // we could either (a) retry the entire operation, (b) retry trying to install now
            // or (c) just return obs. We use (c). No loop is required although in some cases
            // we might discard a higher "now" value in deference to a slightly lower but freshly
            // installed obs value. That's entirely benign -- it admits no new orderings compared
            // to (a) or (b) -- and greatly reduces coherence traffic.
            // We might also condition (c) on the magnitude of the delta between obs and now.
            // Avoiding excessive CAS operations to hot RW locations is critical.
            // See http://blogs.sun.com/dave/entry/cas_and_cache_trivia_invalidate
            return (prev == obsv) ? now : obsv ;
          }
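
The stall can be reproduced outside the JVM with a simplified model of the clamp above. The following is only a sketch, not HotSpot code: max_seen, clamp_monotonic and replay are hypothetical names, std::atomic stands in for HotSpot's Atomic class, and the raw clock reading is passed in as a parameter so that a faulty gethrtime can be simulated.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for max_hrtime: the largest timestamp handed out so far.
static std::atomic<int64_t> max_seen{0};

// Simplified version of the clamp in getTimeNanos. The raw reading is a
// parameter (an assumption of this sketch) instead of a call to gethrtime().
int64_t clamp_monotonic(int64_t now) {
  int64_t prev = max_seen.load();
  if (now <= prev) return prev;  // same or retrograde time
  // On failure, compare_exchange_strong stores the observed value in prev.
  if (max_seen.compare_exchange_strong(prev, now)) return now;
  return prev;  // another thread installed a newer value; return that one
}

// Feeds a sequence of raw readings through the clamp and returns the
// values that callers would observe.
std::vector<int64_t> replay(const std::vector<int64_t>& raw) {
  std::vector<int64_t> seen;
  for (int64_t r : raw) {
    seen.push_back(clamp_monotonic(r));
  }
  // The clamp never lets the observed time go backwards.
  for (std::size_t i = 1; i < seen.size(); ++i) {
    assert(seen[i] >= seen[i - 1]);
  }
  return seen;
}
```

Replaying the readings 1000, 2000, 4001, 2100, 2200 (one spurious doubled value followed by correct ones) yields 1000, 2000, 4001, 4001, 4001. Time appears frozen until the real clock catches up with the spurious maximum, which matches the spinning in os_sleep described above.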

        I've added this guarantee after the cmpxchg:
            guarantee(now - prev < 100000000000L, err_msg("getTimeNanos to big delta prev: " JLONG_FORMAT " obsv: " JLONG_FORMAT " now: " JLONG_FORMAT " thread: " PTR_FORMAT, prev, obsv, now, Thread::current()));
         
        The guarantee then failed with the following message:
        guarantee(now - prev < 100000000000L) failed: getTimeNanos to big delta prev: 1312688670002335 obsv: 1312688670002335 now: 2625377340677208 thread: 0x0000000001a1d000
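
The numbers in this message can be checked with a little arithmetic. The sketch below uses only the prev and now values from the message and the 100000000000L limit from the guarantee. The constant and function names are hypothetical, and it assumes gethrtime values are nanoseconds, as documented for Solaris.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kPrev  = 1312688670002335LL;  // "prev" from the failure message
constexpr int64_t kNow   = 2625377340677208LL;  // "now" from the failure message
constexpr int64_t kLimit = 100000000000LL;      // limit used in the guarantee (100 s)

constexpr int64_t kNanosPerDay = 86400LL * 1000000000LL;

// Size of the forward jump between the two readings, in nanoseconds.
constexpr int64_t kDelta = kNow - kPrev;

// How far "now" lies above exactly twice "prev", in nanoseconds.
constexpr int64_t kExcessOverDouble = kNow - 2 * kPrev;

static_assert(kDelta == 1312688670674873LL, "jump size");
static_assert(kExcessOverDouble == 672538LL, "distance from 2 * prev");
static_assert(kDelta > kLimit, "jump exceeds the guarantee limit");

// Whole days covered by the jump.
int64_t jump_in_days() {
  int64_t days = kDelta / kNanosPerDay;
  assert(days > 0);
  return days;
}
```

The jump is about 15 days' worth of nanoseconds, and now exceeds exactly 2 * prev by only 672538 ns (roughly 0.67 ms). In other words, the reading is almost exactly double the previous one, as described above. With the clamp in getTimeNanos, callers would then see no time pass until the real clock advances by that same amount.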

        This is probably an OS/Virtual Machine bug, but it affects the JVM.

        $ uname -a
        SunOS slc05erh 5.11 11.1 i86pc i386 i86pc

              dgollapudi Devika Gollapudi (Inactive)
              stefank Stefan Karlsson