This bug has finally been chased to ground. As expected, the bug is a race condition that is only present in certain configurations. This note is an attempt to describe the race and the conditions under which it can occur.

The race is due to a bug in ObjectSynchronizer::quick_enter(), which is a C2 function added by the "fast enter" bucket of the Contended Locking project. See:

  JDK-8061553 Contended Locking fast enter bucket

So this bug is only present in configurations that use the Server VM (C2); configurations that use the Client VM (C1) will not observe it. Secondarily, Biased Locking must be enabled for the race condition to manifest. By default, Biased Locking is enabled 4 seconds after VM startup, so any hang where the VM uptime is less than 4 seconds is not likely to be due to this bug. Lastly, there must be contention on the Java Monitor in question, so there must be two or more threads using the Java Monitor that has been observed as "stranded".

Here are the conditions above in checklist form, with a few additional conditions:

- Server Compiler/C2 is in use
- Biased Locking is enabled (VM uptime >= 4 seconds)
- Java Monitor contention
- Without special options, this hang should only be observed in JDK9-B53 -> JDK9-B63; JDK-8061553 was promoted in JDK9-B53 and the fix to disable it (JDK-8079359) was promoted in JDK9-B64.
- So if your hang occurred before JDK9-B53 or in JDK9-B64 or later, then this bug is not likely the cause.

If you think you have a hang that is caused by this bug, then use the following diagnostic options:

  -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1

The 'VerifyMatch=1' portion of the above diagnostic options will cause output like the following when you've run into this bug:

  INFO: unexpected locked object:
   - locked <0xfffffd7be95defe0> (a java.util.stream.Nodes$CollectorTask$OfDouble)
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  #  Internal Error (synchronizer.cpp:2203), pid=19281, tid=95
  #  fatal error: exiting JavaThread=0x0000000004278800 unexpectedly owns ObjectMonitor=0x00000000016f2000
  #
  # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_18_18_43-b00)

The diagnostic output above shows:

- the unexpected locked object (0xfffffd7be95defe0)
- the object's type (java.util.stream.Nodes$CollectorTask$OfDouble)
- the thread that owns the lock (0x0000000004278800)
- the ObjectMonitor (0x00000000016f2000)

Please note that misbehaving programs that use JNI locking can also run into this diagnostic trap, so I recommend careful use of these diagnostic options.

Gory Code Details:

##
## JavaThread1 (JT1) - Part 1
##

The first JavaThread (JT1) in the race is executing this code (when -XX:-UseOptoBiasInlining is specified):

src/cpu/x86/vm/macroAssembler_x86.cpp:

int MacroAssembler::biased_locking_enter(Register lock_reg,
                                         Register obj_reg,
                                         Register swap_reg,
                                         Register tmp_reg,
                                         bool swap_reg_contains_mark,
                                         Label& done,
                                         Label* slow_case,
                                         BiasedLockingCounters* counters) {

  movptr(tmp_reg, swap_reg);
  andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place);
  cmpptr(tmp_reg, markOopDesc::biased_lock_pattern);
  jcc(Assembler::notEqual, cas_label);

  // The bias pattern is present in the object's header. Need to check
  // whether the bias owner and the epoch are both still current.
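To make the bit-twiddling concrete, here is a minimal self-contained C++ sketch of the test those first four instructions perform. The constants mirror markOop.hpp's biased_lock_mask_in_place (0x7) and biased_lock_pattern (0x5); everything else (the function name, main) is an illustrative stand-in of mine, not HotSpot code:

#include <cstdint>
#include <cstdio>

// Illustrative stand-ins for the markOop.hpp constants; the low 3 bits
// of the mark word are [biased_lock:1 | lock:2].
static const uintptr_t biased_lock_mask_in_place = 0x7;
static const uintptr_t biased_lock_pattern       = 0x5;  // biased/biasable

// Models the movptr/andptr/cmpptr/jcc(notEqual, cas_label) sequence:
// if the pattern is absent, control jumps to cas_label (the regular
// stack-lock fast path) instead of falling through.
static bool has_bias_pattern(uintptr_t mark) {
  return (mark & biased_lock_mask_in_place) == biased_lock_pattern;
}

int main() {
  printf("neutral mark 0x1 -> %d\n", (int) has_bias_pattern(0x1));  // 0: jump to cas_label
  printf("biased mark  0x5 -> %d\n", (int) has_bias_pattern(0x5));  // 1: fall through
  return 0;
}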
Note: When UseOptoBiasInlining is enabled (the default), biased_locking_enter() is not used; the C2 ideal graph version of the algorithm is used instead. For this note, -XX:-UseOptoBiasInlining is used because it is easier to explain biased_locking_enter()'s assembly code than the C2 ideal graph code. See PhaseMacroExpand::expand_lock_node() for the C2 ideal graph code.

##
## JavaThread2 (JT2) - Part 1
##

The second JavaThread (JT2) is inflating the Java Monitor associated with this object, so it is here (for example):

src/share/vm/runtime/synchronizer.cpp:

void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {

  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD,
                              obj(),
                              inflate_cause_monitor_enter)->enter(THREAD);

Note: Don't be confused by the call to "lock->set_displaced_header(markOopDesc::unused_mark())" above; that call is made on the BasicLock in JT2's context.

JT2 has finished the inflation part, using the "CASE: neutral" code in ObjectSynchronizer::inflate():

src/share/vm/runtime/synchronizer.cpp:

ObjectMonitor* ObjectSynchronizer::inflate(Thread * Self,
                                           oop object,
                                           const InflateCause cause) {

      // CASE: neutral
      // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
      // If we know we're inflating for entry it's better to inflate by swinging a
      // pre-locked objectMonitor pointer into the object header. A successful
      // CAS inflates the object *and* confers ownership to the inflating thread.
      // In the current implementation we use a 2-step mechanism where we CAS()
      // to inflate and then CAS() again to try to swing _owner from NULL to Self.
      // An inflateTry() method that we could call from fast_enter() and slow_enter()
      // would be useful.

      assert(mark->is_neutral(), "invariant");
      ObjectMonitor * m = omAlloc(Self);
      // prepare m for installation - set monitor to initial state
      m->Recycle();
      m->set_header(mark);
      m->set_owner(NULL);

and is now racing with JT1 for ownership of the Java Monitor in ObjectMonitor::enter(). For our failure mode, JT2 loses the race to JT1.
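The TODO-FIXME comment above describes the window exactly: inflation is one CAS and ownership is a second CAS, and any thread can win the second one. Here is a minimal sketch of that second step using std::atomic; ToyMonitor and try_own are simplified stand-ins of mine, not HotSpot's ObjectMonitor code:

#include <atomic>
#include <cstdio>
#include <thread>

// Toy stand-in for ObjectMonitor: only the _owner field matters here.
struct ToyMonitor {
  std::atomic<void*> owner{nullptr};
};

// Step 2 of the 2-step mechanism: swing _owner from NULL to Self.
// Exactly one thread's CAS succeeds; the loser falls into the
// contended-enter path.
static bool try_own(ToyMonitor& m, void* self) {
  void* expected = nullptr;
  return m.owner.compare_exchange_strong(expected, self);
}

int main() {
  ToyMonitor m;  // step 1 (inflation) already done: owner == NULL
  int jt1, jt2;  // addresses stand in for the two JavaThreads
  std::thread t1([&] { printf("JT1 won: %d\n", (int) try_own(m, &jt1)); });
  std::thread t2([&] { printf("JT2 won: %d\n", (int) try_own(m, &jt2)); });
  t1.join();
  t2.join();
  return 0;
}

Running this prints exactly one winner; the loser corresponds to the thread that falls into ObjectMonitor::enter()'s contended path. In our failure mode, JT1 is the winner.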
##
## JavaThread1 (JT1) - Part 2
##

src/cpu/x86/vm/macroAssembler_x86.cpp:

int MacroAssembler::biased_locking_enter(Register lock_reg,

  andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place);
  cmpptr(tmp_reg, markOopDesc::biased_lock_pattern);
  jcc(Assembler::notEqual, cas_label);

  // The bias pattern is present in the object's header. Need to check
  // whether the bias owner and the epoch are both still current.
  if (swap_reg_contains_mark) {
    null_check_offset = offset();
  }
  load_prototype_header(tmp_reg, obj_reg);
  orptr(tmp_reg, r15_thread);
  xorptr(tmp_reg, swap_reg);
  Register header_reg = tmp_reg;
  andptr(header_reg, ~((int) markOopDesc::age_mask_in_place));
  if (counters != NULL) {
    cond_inc32(Assembler::zero,
               ExternalAddress((address) counters->biased_lock_entry_count_addr()));
  }
  jcc(Assembler::equal, done);

  // At this point we know that the header has the bias pattern and
  // that we are not the bias owner in the current epoch. We need to
  // figure out more details about the state of the header in order to
  // know what operations can be legally performed on the object's
  // header.

  // If the low three bits in the xor result aren't clear, that means
  // the prototype header is no longer biased and we have to revoke
  // the bias on this object.
  testptr(header_reg, markOopDesc::biased_lock_mask_in_place);
  jccb(Assembler::notZero, try_revoke_bias);

  // Biasing is still enabled for this data type. See whether the
  // epoch of the current bias is still valid, meaning that the epoch
  // bits of the mark word are equal to the epoch bits of the
  // prototype header. (Note that the prototype header's epoch bits
  // only change at a safepoint.) If not, attempt to rebias the object
  // toward the current thread. Note that we must be absolutely sure
  // that the current epoch is invalid in order to do this because
  // otherwise the manipulations it performs on the mark word are
  // illegal.
  testptr(header_reg, markOopDesc::epoch_mask_in_place);
  jccb(Assembler::notZero, try_rebias);

  // The epoch of the current bias is still valid but we know nothing
  // about the owner; it might be set or it might be clear. Try to
  // acquire the bias of the object using an atomic operation. If this
  // fails we will go in to the runtime to revoke the object's bias.
  // Note that we first construct the presumed unbiased header so we
  // don't accidentally blow away another thread's valid bias.
  NOT_LP64( movptr(swap_reg, saved_mark_addr); )
  andptr(swap_reg,
         markOopDesc::biased_lock_mask_in_place |
         markOopDesc::age_mask_in_place |
         markOopDesc::epoch_mask_in_place);
#ifdef _LP64
  movptr(tmp_reg, swap_reg);
  orptr(tmp_reg, r15_thread);
#else
  get_thread(tmp_reg);
  orptr(tmp_reg, swap_reg);
#endif
  if (os::is_MP()) {
    lock();
  }
  cmpxchgptr(tmp_reg, mark_addr); // compare tmp_reg and swap_reg

  // If the biasing toward our thread failed, this means that
  // another thread succeeded in biasing it toward itself and we
  // need to revoke that bias. The revocation will occur in the
  // interpreter runtime in the slow case.
  if (counters != NULL) {
    cond_inc32(Assembler::zero,
               ExternalAddress((address) counters->anonymously_biased_lock_entry_count_addr()));
  }
  if (slow_case != NULL) {
    jcc(Assembler::notZero, *slow_case);
  }
  jmp(done);

src/share/vm/runtime/synchronizer.cpp:

bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self,
                                     BasicLock * Lock) {

  if (mark->has_monitor()) {

    if (owner == Self) {

    if (owner == NULL &&
        Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {
      // <= the bug: nothing on this code path sets the BasicLock's
      // _displaced_header field value to something other than NULL. This
      // leaves the entry looking like a recursive stack-lock.
      assert(m->_recursions == 0, "invariant");
      assert(m->_owner == Self, "invariant");
      return true;
    }

To recap:

- JT1 calls C2 fast_lock() which calls biased_locking_enter()
- JT2 inflates the Java Monitor
- JT1 bails out of biased_locking_enter() after making it past the first check, which results in an early bail from fast_lock()
- JT1 makes a slow path call to complete_monitor_enter_C()
- JT1 makes a last-ditch call to quick_enter() before doing the real slow path work

The early bail code path in biased_locking_enter() and fast_lock() leaves the BasicLock's _displaced_header field value NULL, which marks this entry as recursive. If JT2's inflation had happened a little earlier, then JT1 would have taken the first bail point in biased_locking_enter(), which would have resulted in a regular fast_lock() code path, which does initialize the BasicLock's _displaced_header field.
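Why does the NULL _displaced_header strand the monitor? On the exit side, a zero displaced header is taken to mean "recursive stack-lock, nothing to undo", so the unlock never reaches the inflated monitor and _owner is never cleared. The following self-contained toy model illustrates that convention under that assumption; the real logic lives in the exit paths (ObjectSynchronizer::fast_exit() and C2's fast_unlock()), and these simplified types are mine:

#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Toy stand-ins; the real types are BasicLock and ObjectMonitor.
struct ToyBasicLock { uintptr_t displaced_header = 0; };  // 0 == "recursive"
struct ToyMonitor   { std::atomic<void*> owner{nullptr}; };

static void toy_exit(ToyBasicLock& lock, ToyMonitor& m, void* self) {
  if (lock.displaced_header == 0) {
    // Interpreted as a recursive enter: pop the BasicLock and do
    // nothing else. This is the path JT1 takes after the buggy
    // quick_enter(), so the monitor is never released.
    return;
  }
  assert(m.owner.load() == self);
  m.owner.store(nullptr);  // the real exit: release the monitor
}

int main() {
  ToyMonitor m;
  ToyBasicLock lock;        // displaced_header left at 0 by the buggy path
  int jt1;
  m.owner.store(&jt1);      // quick_enter()'s CAS on _owner succeeded...
  toy_exit(lock, m, &jt1);  // ...but the exit is a no-op
  printf("owner after exit: %p (non-null => stranded)\n", m.owner.load());
  return 0;
}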
The fix for this problem is a 1-liner (plus a comment):

--- a/src/share/vm/runtime/synchronizer.cpp
+++ b/src/share/vm/runtime/synchronizer.cpp
@@ -229,6 +229,9 @@ bool ObjectSynchronizer::quick_enter(oop
     if (owner == NULL &&
         Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {
+      // Make the displaced header non-NULL so this BasicLock is
+      // not seen as recursive.
+      Lock->set_displaced_header(markOopDesc::unused_mark());
       assert(m->_recursions == 0, "invariant");
       assert(m->_owner == Self, "invariant");
       return true;

So when quick_enter() succeeds at its last-ditch optimization, it needs to mark the BasicLock's _displaced_header field with a non-zero value (like the other lock-grabbing code paths).
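To tie it together, here is the same style of toy model with the fix applied: when the quick-enter CAS wins, a non-zero sentinel (standing in for markOopDesc::unused_mark()) is stamped into the displaced header, so the exit path sketched earlier takes the real-release branch instead of the recursive no-op. Again, all names and values here are illustrative stand-ins, not HotSpot code:

#include <atomic>
#include <cstdint>
#include <cstdio>

struct ToyBasicLock { uintptr_t displaced_header = 0; };
struct ToyMonitor   { std::atomic<void*> owner{nullptr}; };

static const uintptr_t toy_unused_mark = 0x3;  // any non-zero value works here

static bool toy_quick_enter(ToyBasicLock& lock, ToyMonitor& m, void* self) {
  void* expected = nullptr;
  if (m.owner.compare_exchange_strong(expected, self)) {
    // The 1-line fix, modeled: mark the BasicLock as a non-recursive
    // entry so the matching exit really releases the monitor.
    lock.displaced_header = toy_unused_mark;
    return true;
  }
  return false;  // lost the race; caller falls back to the slow path
}

int main() {
  ToyMonitor m;
  ToyBasicLock lock;
  int jt1;
  if (toy_quick_enter(lock, m, &jt1)) {
    printf("displaced header: %#lx (non-zero => not recursive)\n",
           (unsigned long) lock.displaced_header);
  }
  return 0;
}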