Modified the DCUBED_JME_DEBUG code in this function:

src/share/vm/runtime/thread.cpp:

    // add Java Monitor Enter trace points from ObjectSynchronizer code
    void JavaThread::add_dcubed_jme_last_trace_points(oop obj, BasicLock *lock, ObjectMonitor *mon, uint64_t trace_points) {

to make this check:

    + // quick_enter() set this bit in this call:
    + // Record that we grabbed the ObjectMonitor with cmpxhg()
    + // jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L);
    + //
    + // MacroAssembler::fast_lock() set this bit before quick_enter() was called:
    + // Record that we returned success from fast_lock
    + // orptr(tracePoints, 0x00002000);
    + guarantee(trace_points != 0x000400000000L ||
    +           (_dcubed_jme_last_trace_points & 0x00002000L) == 0,
    +           "fast_lock() and quick_enter() cannot both succeed!");

add_dcubed_jme_last_trace_points() is called with trace_points == 0x000400000000L for this code:

src/share/vm/runtime/synchronizer.cpp:

    bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self, BasicLock * Lock) {

      if (owner == NULL &&
          Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {
    #ifdef DCUBED_JME_TRACE
        // Record that we grabbed the ObjectMonitor with cmpxhg()
        jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L);
    #endif

so the 0x000400000000L flag value records when quick_enter() successfully grabs the inflated ObjectMonitor's _owner field.

This code:

src/cpu/x86/vm/macroAssembler_x86.cpp

    void MacroAssembler::fast_lock(Register objReg, Register boxReg, Register tmpReg,
                                   Register scrReg, Register cx1Reg, Register cx2Reg,
                                   BiasedLockingCounters* counters,
                                   RTMLockingCounters* rtm_counters,
                                   RTMLockingCounters* stack_rtm_counters,
                                   Metadata* method_data,
                                   bool use_rtm, bool profile_rtm) {

sets the 0x00002000 flag here:

      // Record that we returned success from fast_lock
      orptr(tracePoints, 0x00002000);
      // save the current trace point info for objReg
      // Note: This trace_fast_lock() causes a crash with slowdebug bits
      // near the end of the test run in deoptimization code.
      trace_fast_lock(objReg, scrReg, tracePoints);
      xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success
      bind(MY_DONE1);
      pop(tracePoints);
    #endif
    }

so the 0x00002000 flag value records that MacroAssembler::fast_lock() has returned success.

It should never be possible for both MacroAssembler::fast_lock() and ObjectSynchronizer::quick_enter() to succeed. quick_enter() is only called from SharedRuntime::complete_monitor_locking_C() and complete_monitor_locking_C() is only supposed to be called when MacroAssembler::fast_lock() fails.
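
The condition inside the new guarantee() can be exercised outside the VM. The following is only a minimal sketch: the two flag values are taken from the code quoted above, while the constant names and the invariant_holds() helper are hypothetical and do not exist in HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Flag values quoted in this report; the constant names are hypothetical.
constexpr uint64_t kFastLockSuccess   = 0x00002000ULL;      // set by fast_lock() on success
constexpr uint64_t kQuickEnterCmpxchg = 0x000400000000ULL;  // passed by quick_enter() after its cmpxchg

// Hypothetical stand-in for the guarantee() condition.
// Returns true when the invariant holds, false when guarantee() would fire.
constexpr bool invariant_holds(uint64_t last_trace_points, uint64_t incoming) {
  return incoming != kQuickEnterCmpxchg ||
         (last_trace_points & kFastLockSuccess) == 0;
}
```

With the trace word seen in this failure, invariant_holds(0x2862, kQuickEnterCmpxchg) is false, which is the case the guarantee() reports. Any incoming value other than the quick_enter() flag passes regardless of the saved bits.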

Here's the hs_err_pid stack trace from a failure of the new guarantee():

XXX

Here's the dbx stack trace from a failure of the new guarantee():

(dbx) where
current thread: t@89
dbx: core file read error: address 0xdf48000000000008 not in data space
dbx: attempt to read frame failed -- cannot get return address
  [1] __lwp_kill(0x59, 0x6, 0xfffffeb4040b70c0, 0xfffffd7fff293e0e, 0xfffffd7fc07de2f0, 0x6), at 0xfffffd7fff29351a
  [2] _thr_kill(), at 0xfffffd7fff28be13
  [3] raise(), at 0xfffffd7fff2381b9
  [4] abort(), at 0xfffffd7fff216b80
=>[5] os::abort(dump_core = true, siginfo = , context = ) (optimized), at 0xfffffd7ffe92b676 (line ~1396) in "os_solaris.cpp"
  [6] VMError::report_and_die(id = , message = , detail_fmt = , detail_args = , thread = , pc = , siginfo = (nil), context = (nil), filename = 0xfffffd7ffef15aa0 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/thread.cpp", lineno = 4894, size = 0) (optimized), at 0xfffffd7ffebde3e1 (line ~1152) in "vmError.cpp"
  [7] VMError::report_and_die(thread = , filename = , lineno = , message = , detail_fmt = , detail_args = ) (optimized), at 0xfffffd7ffebdd4af (line ~931) in "vmError.cpp"
  [8] report_vm_error(file = 0xfffffd7ffef15aa0 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/thread.cpp", line = 4894, error_msg = 0xfffffd7ffef15a30 "guarantee(trace_points != 0x000400000000L || (_dcubed_jme_last_trace_points & 0x00002000L) == 0) failed", detail_fmt = 0xfffffd7ffef159f0 "fast_lock() and quick_enter() cannot both succeed!", ...) (optimized), at 0xfffffd7ffe2cd948 (line ~218) in "debug.cpp"
  [9] JavaThread::add_dcubed_jme_last_trace_points(this = , obj = , lock = , mon = , trace_points = ) (optimized), at 0xfffffd7ffeb06fb1 (line ~4894) in "thread.cpp"
  [10] ObjectSynchronizer::quick_enter(obj = 0xfffffd7be71fdf48, Self = 0x1e88800, Lock = 0xfffffd7fc07de860) (optimized), at 0xfffffd7ffeaaec71 (line ~268) in "synchronizer.cpp"
  [11] SharedRuntime::complete_monitor_locking_C(_obj = 0xfffffd7be71fdf48, lock = 0xfffffd7fc07de860, thread = 0x1e88800) (optimized), at 0xfffffd7ffea1c1c8 (line ~1888) in "sharedRuntime.cpp"
  [12] 0xfffffd7feab35408(), at 0xfffffd7feab35408
  [13] 0xfffffd7feab35408(), at 0xfffffd7feab35408
  [14] 0xfffffd7ff252337c(), at 0xfffffd7ff252337c
  [15] 0x8(), at 0x8

Not sure why frame 12 and 13 are the same address info.

So let's take a look at the code from frame 12/13 that got us to SharedRuntime::complete_monitor_locking_C():

(dbx) x 0xfffffd7ff2523377,0xfffffd7ff252337c/i
0xfffffd7ff2523377: call 0xfffffd7feab353e0 [ 0xfffffd7feab353e0, .-0x79edf97 ]
0xfffffd7ff252337c: jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ]

So frame 14 called 0xfffffd7feab353e0 which is really close to our frame 12/13 address:

(dbx) x 0xfffffd7feab353e0,0xfffffd7feab35408/i
0xfffffd7feab353e0: subq $0x0000000000000008,%rsp
0xfffffd7feab353e7: movq %rbp,(%rsp)
0xfffffd7feab353eb: movq %rsp,0x00000000000001d0(%r15)
0xfffffd7feab353f2: movq %rsi,%rdi
0xfffffd7feab353f5: movq %rdx,%rsi
0xfffffd7feab353f8: movq %r15,%rdx
0xfffffd7feab353fb: movq $complete_monitor_locking_C,%r10
0xfffffd7feab35405: call *%r10d
0xfffffd7feab35408: movq $0x0000000000000000,0x00000000000001d0(%r15)

so the code from frame 12/13 is pretty much marshalling code for calling complete_monitor_locking_C which has this signature:

    // Handles the uncommon case in locking, i.e., contention or an inflated lock.
    JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread))

subq $0x0000000000000008,%rsp  // make space on the stack
movq %rbp,(%rsp)  // save %rbp on the stack
movq %rsp,0x00000000000001d0(%r15)  // save %rsp in a field in %r15 (thread)
movq %rsi,%rdi  // guessing this is _obj param
movq %rdx,%rsi  // guessing this is lock param
movq %r15,%rdx  // this is thread param
movq $complete_monitor_locking_C,%r10
call *%r10d  // call complete_monitor_locking_C
// zero the field in %r15 (thread)
movq $0x0000000000000000,0x00000000000001d0(%r15)

So here's the regs from frame 12/13:

(dbx) regs
current thread: t@89
current frame: [13]
r15 0x0000000000000000
r14 0x0000000000000000
r13 0x0000000000000000
r12 0x0000000000000000
r11 0x0000000000000000
r10 0x0000000000000000
r9 0x0000000000000000
r8 0x0000000000000000
rdi 0x0000000000000000
rsi 0x0000000000000000
rbp 0xfffffd7be71fdf48
rbx 0x0000000000000000
rdx 0x0000000000000000
rcx 0x0000000000000000
rax 0x0000000000000000
trapno 0x0000000000000000
err 0x0000000000000000
rip 0xfffffd7feab35408:0xfffffd7feab35408 movq $0x0000000000000000,0x00000000000001d0(%r15)
cs 0x0000000000000000
eflags 0x0000000000000000
rsp 0x0000000000000000
ss 0x0000000000000000
fs 0x0000000000000000
gs 0x0000000000000000
es 0x0000000000000000
ds 0x0000000000000000
fsbase 0x0000000000000000
gsbase 0x0000000000000000
(dbx) x 0xfffffd7be71fdf48/X
0xfffffd7be71fdf48: 0x02c6ab82

and here's the regs from frame 14:

(dbx) regs
current thread: t@89
current frame: [14]
r15 0x0000000000000000
r14 0x0000000000000000
r13 0x0000000000000000
r12 0x0000000000000000
r11 0x0000000000000000
r10 0x0000000000000000
r9 0x0000000000000000
r8 0x0000000000000000
rdi 0x0000000000000000
rsi 0x0000000000000000
rbp 0x0000000002c6ab82
rbx 0x0000000000000000
rdx 0x0000000000000000
rcx 0x0000000000000000
rax 0x0000000000000000
trapno 0x0000000000000000
err 0x0000000000000000
rip 0xfffffd7ff252337c:0xfffffd7ff252337c jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ]
cs 0x0000000000000000
eflags 0x0000000000000000
rsp 0x0000000000000000
ss 0x0000000000000000
fs 0x0000000000000000
gs 0x0000000000000000
es 0x0000000000000000
ds 0x0000000000000000
fsbase 0x0000000000000000
gsbase 0x0000000000000000
(dbx) x 0x0000000002c6ab82/X
dbx: warning: unknown language, 'c' assumed
0x0000000002c6ab82: 0x00000000

The *rbp value of NULL explains why the dbx stack trace stops at frame 14.

So without a valid frame 15, it's hard to know where we go into the code in frame 14. For now, I'm dumping this big section:

(dbx) x 0xfffffd7ff2523200,0xfffffd7ff252337c/i
0xfffffd7ff2523200: popq %rbp
0xfffffd7ff2523201: .byte 0xff [unknown opcode]
0xfffffd7ff2523202: .byte 0xff [unknown opcode]
0xfffffd7ff2523203: decl 0x0000000000000054(%rbx,%rcx,4)
0xfffffd7ff2523207: andb $0x0000000000000018,%al
0xfffffd7ff2523209: movq %r10,(%rsp)
0xfffffd7ff252320d: movq 0x0000000000000030(%rsp),%r10
0xfffffd7ff2523212: movq %r10,0x0000000000000018(%rsp)
0xfffffd7ff2523217: movl %ebx,0x0000000000000010(%rsp)
0xfffffd7ff252321b: movl %r13d,0x0000000000000024(%rsp)
0xfffffd7ff2523220: movl %r8d,0x0000000000000040(%rsp)
0xfffffd7ff2523225: movl %r11d,0x0000000000000044(%rsp)
0xfffffd7ff252322a: nop
0xfffffd7ff252322b: call 0xfffffd7fea847b60 [ 0xfffffd7fea847b60, .-0x7cdb6cb ]
0xfffffd7ff2523230: pushq %rax
0xfffffd7ff2523231: pushq %rdx
0xfffffd7ff2523232: pushq %rcx
0xfffffd7ff2523233: call breakpoint [ 0xfffffd7ffe929af0, .+0xc4068bd ]
0xfffffd7ff2523238: popq %rcx
0xfffffd7ff2523239: popq %rdx
0xfffffd7ff252323a: popq %rax
0xfffffd7ff252323b: movq %r15,%rsi
0xfffffd7ff252323e: movq $g1_wb_pre,%r10
0xfffffd7ff2523248: call *%r10d
0xfffffd7ff252324b: jmp 0xfffffd7ff25225dc [ 0xfffffd7ff25225dc, .-0xc6f ]
0xfffffd7ff2523250: lock cmpxchgq %r10,0x0000000000000000(%rbp)
0xfffffd7ff2523256: leaq 0x0000000000000050(%rsp),%rbx
0xfffffd7ff252325b: pushq %rdx
0xfffffd7ff252325c: xorq %rdx,%rdx
0xfffffd7ff252325f: orq $0x0000000000000002,%rdx
0xfffffd7ff2523263: orq $0x0000000000000020,%rdx
0xfffffd7ff2523267: xorq %r10,%r10
0xfffffd7ff252326a: orq $0x0000000000000040,%rdx
0xfffffd7ff252326e: xorq %r10,%r10
0xfffffd7ff2523271: movq 0x0000000000000000(%rbp),%rax
0xfffffd7ff2523275: testq $0x0000000000000002,%rax
0xfffffd7ff252327b: jne 0xfffffd7ff25232c9 [ 0xfffffd7ff25232c9, .+0x4e ]
0xfffffd7ff252327d: orq $0x0000000000000080,%rdx
0xfffffd7ff2523284: orq $0x0000000000000001,%rax
0xfffffd7ff2523288: movq %rax,(%rbx)
0xfffffd7ff252328b: lock cmpxchgq %rbx,0x0000000000000000(%rbp)
0xfffffd7ff2523291: je 0xfffffd7ff25232e3 [ 0xfffffd7ff25232e3, .+0x52 ]
0xfffffd7ff2523297: orq $0x0000000000000100,%rdx
0xfffffd7ff252329e: subq %rsp,%rax
0xfffffd7ff25232a1: andq $0xfffffffffffff007,%rax
0xfffffd7ff25232a8: movq %rax,(%rbx)
0xfffffd7ff25232ab: je 0xfffffd7ff25232ba [ 0xfffffd7ff25232ba, .+0xf ]
0xfffffd7ff25232ad: orq $0x0000000000000200,%rdx
0xfffffd7ff25232b4: cmpq $0x0000000000000000,%rsp
0xfffffd7ff25232b8: jmp 0xfffffd7ff25232c4 [ 0xfffffd7ff25232c4, .+0xc ]
0xfffffd7ff25232ba: orq $0x0000000000000400,%rdx
0xfffffd7ff25232c1: xorq %rbx,%rbx
0xfffffd7ff25232c4: jmp 0xfffffd7ff25232e3 [ 0xfffffd7ff25232e3, .+0x1f ]
0xfffffd7ff25232c9: orq $0x0000000000000800,%rdx
0xfffffd7ff25232d0: movq %rax,%r10
0xfffffd7ff25232d3: xorq %rax,%rax
0xfffffd7ff25232d6: lock cmpxchgq %r15,0x000000000000007e(%r10)
0xfffffd7ff25232dc: movq $0x0000000000000003,(%rbx)
0xfffffd7ff25232e3: je 0xfffffd7ff2523329 [ 0xfffffd7ff2523329, .+0x46 ]
0xfffffd7ff25232e5: orq $0x0000000000001000,%rdx
0xfffffd7ff25232ec: pushq %rbp
0xfffffd7ff25232ed: pushq %r10
0xfffffd7ff25232ef: pushq %rdx
0xfffffd7ff25232f0: movq %rdx,%rcx
0xfffffd7ff25232f3: movq %r10,%rdx
0xfffffd7ff25232f6: movq %rbp,%rsi
0xfffffd7ff25232f9: movq %r15,%rdi
0xfffffd7ff25232fc: testl $0x000000000000000f,%esp
0xfffffd7ff2523302: je 0xfffffd7ff252331a [ 0xfffffd7ff252331a, .+0x18 ]
0xfffffd7ff2523308: subq $0x0000000000000008,%rsp
0xfffffd7ff252330c: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9384 ]
0xfffffd7ff2523311: addq $0x0000000000000008,%rsp
0xfffffd7ff2523315: jmp 0xfffffd7ff252331f [ 0xfffffd7ff252331f, .+0xa ]
0xfffffd7ff252331a: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9376 ]
0xfffffd7ff252331f: popq %rdx
0xfffffd7ff2523320: popq %r10
0xfffffd7ff2523322: popq %rbp
0xfffffd7ff2523323: cmpq $0x0000000000000000,%rsp
0xfffffd7ff2523327: jmp 0xfffffd7ff252336a [ 0xfffffd7ff252336a, .+0x43 ]
0xfffffd7ff2523329: orq $0x0000000000002000,%rdx
0xfffffd7ff2523330: pushq %rbp
0xfffffd7ff2523331: pushq %r10
0xfffffd7ff2523333: pushq %rdx
0xfffffd7ff2523334: movq %rdx,%rcx
0xfffffd7ff2523337: movq %r10,%rdx
0xfffffd7ff252333a: movq %rbp,%rsi
0xfffffd7ff252333d: movq %r15,%rdi
0xfffffd7ff2523340: testl $0x000000000000000f,%esp
0xfffffd7ff2523346: je 0xfffffd7ff252335e [ 0xfffffd7ff252335e, .+0x18 ]
0xfffffd7ff252334c: subq $0x0000000000000008,%rsp
0xfffffd7ff2523350: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9340 ]
0xfffffd7ff2523355: addq $0x0000000000000008,%rsp
0xfffffd7ff2523359: jmp 0xfffffd7ff2523363 [ 0xfffffd7ff2523363, .+0xa ]
0xfffffd7ff252335e: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9332 ]
0xfffffd7ff2523363: popq %rdx
0xfffffd7ff2523364: popq %r10
0xfffffd7ff2523366: popq %rbp
0xfffffd7ff2523367: xorq %rbx,%rbx
0xfffffd7ff252336a: popq %rdx
0xfffffd7ff252336b: je 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb28 ]
0xfffffd7ff2523371: leaq 0x0000000000000050(%rsp),%rdx
0xfffffd7ff2523376: nop
0xfffffd7ff2523377: call 0xfffffd7feab353e0 [ 0xfffffd7feab353e0, .-0x79edf97 ]
0xfffffd7ff252337c: jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ]

First thing I've noticed is the trace_fast_lock calls which happens to be some of my tracing code for this bug. I don't quite understand why there are four calls to it, but we'll get there...
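
Each pair of trace_fast_lock calls in the dump sits behind a `testl $0xf,%esp` check: one branch calls directly and the other first subtracts 8 from %rsp and adds it back afterwards. That looks like a 16-byte stack alignment test, so two call sites times two alignment branches would account for the four calls. A minimal sketch of that decision, where the helper names are hypothetical and only the 0xf mask and the 8-byte adjustment come from the dump:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `testl $0xf,%esp`: true when the low four bits of the
// stack pointer are zero, i.e. the stack is 16-byte aligned.
constexpr bool is_16_byte_aligned(uint64_t sp) {
  return (sp & 0xf) == 0;
}

// Bytes subtracted from the stack pointer before the call:
// 0 on the aligned branch, 8 on the other branch (as in the dump).
constexpr uint64_t pre_call_adjustment(uint64_t sp) {
  return is_16_byte_aligned(sp) ? 0 : 8;
}
```

For a stack pointer that is 8 mod 16, the adjustment makes it 16-byte aligned at the call; the sketch does not model any other misalignment.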

Let's consider this part of src/cpu/x86/vm/macroAssembler_x86.cpp: fast_lock():
(I've elided some of the code that's not included in the current config, e.g. no DCUBED_OME_DEBUG and no RTM.)

    #else // _LP64
    #ifdef DCUBED_JME_TRACE
        // Record that we're in the inflated block
        orptr(tracePoints, 0x00000800);
    #endif
        // It's inflated
        movq(scrReg, tmpReg);
        xorq(tmpReg, tmpReg);

        if (os::is_MP()) {
          lock();
        }
        cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
        // Unconditionally set box->_displaced_header = markOopDesc::unused_mark().
        // Without cast to int32_t movptr will destroy r10 which is typically obj.
        movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark()));
        // Intentional fall-through into DONE_LABEL ...
        // Propagate ICC.ZF from CAS above into DONE_LABEL.
    #endif // _LP64

        // DONE_LABEL is a hot target - we'd really like to place it at the
        // start of cache line by padding with NOPs.
        // See the AMD and Intel software optimization manuals for the
        // most efficient "long" NOP encodings.
        // Unfortunately none of our alignment mechanisms suffice.
        bind(DONE_LABEL);

        // At DONE_LABEL the icc ZFlag is set as follows ...
        // Fast_Unlock uses the same protocol.
        // ZFlag == 1 -> Success
        // ZFlag == 0 -> Failure - force control through the slow-path
      }
    #ifdef DCUBED_JME_TRACE
      Label MY_DONE0, MY_DONE1;
      // if current state is success, then preserve that
      jccb(Assembler::zero, MY_DONE0);
      // Record that we returned failure from fast_lock
      orptr(tracePoints, 0x00001000);
      // save the current trace point info for objReg
      trace_fast_lock(objReg, scrReg, tracePoints);
      cmpptr(rsp, 0); // set ICC.ZF=0 to indicate failure
      jmpb(MY_DONE1);
      bind(MY_DONE0);
      // Record that we returned success from fast_lock
      orptr(tracePoints, 0x00002000);
      // save the current trace point info for objReg
      // Note: This trace_fast_lock() causes a crash with slowdebug bits
      // near the end of the test run in deoptimization code.
      trace_fast_lock(objReg, scrReg, tracePoints);
      xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success
      bind(MY_DONE1);
      pop(tracePoints);
    #endif
    }

The above macroAssembler_x86.cpp: fast_lock() code maps to this code in memory:
(trimming off the addresses in brackets and going wide here for annotations after the instructions)

0xfffffd7ff25232c9: orq $0x0000000000000800,%rdx  // orptr(tracePoints, 0x00000800);
0xfffffd7ff25232d0: movq %rax,%r10  // movq(scrReg, tmpReg);
0xfffffd7ff25232d3: xorq %rax,%rax  // xorq(tmpReg, tmpReg);
// if (os::is_MP()) {
//   lock();
// }
0xfffffd7ff25232d6: lock cmpxchgq %r15,0x000000000000007e(%r10)  // cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
0xfffffd7ff25232dc: movq $0x0000000000000003,(%rbx)  // movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark()));
// if the cmpxchgptr worked we branch to MY_DONE0, otherwise...
0xfffffd7ff25232e3: je 0xfffffd7ff2523329  // jccb(Assembler::zero, MY_DONE0);
// Record that we returned failure from fast_lock
0xfffffd7ff25232e5: orq $0x0000000000001000,%rdx  // orptr(tracePoints, 0x00001000);
// begin MacroAssembler::trace_fast_lock():
0xfffffd7ff25232ec: pushq %rbp  // push(objReg); // save/restore across call_VM
0xfffffd7ff25232ed: pushq %r10  // push(omReg);
0xfffffd7ff25232ef: pushq %rdx  // push(tracePoints);
0xfffffd7ff25232f0: movq %rdx,%rcx  // pass_arg3(this, tracePoints);
0xfffffd7ff25232f3: movq %r10,%rdx  // pass_arg2(this, omReg);
0xfffffd7ff25232f6: movq %rbp,%rsi  // pass_arg1(this, objReg);
0xfffffd7ff25232f9: movq %r15,%rdi  // pass_arg0(this, r15_thread);
// begin MacroAssembler::call_VM_leaf_base()
0xfffffd7ff25232fc: testl $0x000000000000000f,%esp
0xfffffd7ff2523302: je 0xfffffd7ff252331a
0xfffffd7ff2523308: subq $0x0000000000000008,%rsp  // make stack space for the call
0xfffffd7ff252330c: call trace_fast_lock  // make the call
0xfffffd7ff2523311: addq $0x0000000000000008,%rsp  // take back the stack space
0xfffffd7ff2523315: jmp 0xfffffd7ff252331f
0xfffffd7ff252331a: call trace_fast_lock  // make the call without extra stack space
// end MacroAssembler::call_VM_leaf_base()
0xfffffd7ff252331f: popq %rdx  // pop(tracePoints);
0xfffffd7ff2523320: popq %r10  // pop(omReg);
0xfffffd7ff2523322: popq %rbp  // pop(objReg);
// end MacroAssembler::trace_fast_lock()
0xfffffd7ff2523323: cmpq $0x0000000000000000,%rsp  // cmpptr(rsp, 0); // set ICC.ZF=0 to indicate failure
0xfffffd7ff2523327: jmp 0xfffffd7ff252336a  // jmpb(MY_DONE1);
// Record that we returned success from fast_lock
0xfffffd7ff2523329: orq $0x0000000000002000,%rdx  // orptr(tracePoints, 0x00002000);
// begin MacroAssembler::trace_fast_lock():
0xfffffd7ff2523330: pushq %rbp  // push(objReg);
0xfffffd7ff2523331: pushq %r10  // push(omReg);
0xfffffd7ff2523333: pushq %rdx  // push(tracePoints);
0xfffffd7ff2523334: movq %rdx,%rcx  // pass_arg3(this, tracePoints);
0xfffffd7ff2523337: movq %r10,%rdx  // pass_arg2(this, omReg);
0xfffffd7ff252333a: movq %rbp,%rsi  // pass_arg1(this, objReg);
0xfffffd7ff252333d: movq %r15,%rdi  // pass_arg0(this, r15_thread);
// begin MacroAssembler::call_VM_leaf_base()
0xfffffd7ff2523340: testl $0x000000000000000f,%esp
0xfffffd7ff2523346: je 0xfffffd7ff252335e
0xfffffd7ff252334c: subq $0x0000000000000008,%rsp  // make stack space for the call
0xfffffd7ff2523350: call trace_fast_lock  // make the call
0xfffffd7ff2523355: addq $0x0000000000000008,%rsp  // take back the stack space
0xfffffd7ff2523359: jmp 0xfffffd7ff2523363
0xfffffd7ff252335e: call trace_fast_lock  // make the call without extra stack space
// end MacroAssembler::call_VM_leaf_base()
0xfffffd7ff2523363: popq %rdx  // pop(tracePoints);
0xfffffd7ff2523364: popq %r10  // pop(omReg);
0xfffffd7ff2523366: popq %rbp  // pop(objReg);
// end MacroAssembler::trace_fast_lock()
0xfffffd7ff2523367: xorq %rbx,%rbx  // xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success
0xfffffd7ff252336a: popq %rdx  // pop(tracePoints);
// end MacroAssembler::fast_lock()

// Now we're in C2 code that checks the results of the
// fast_lock() call and calls complete_monitor_locking_C()
// if ICC.ZF=0 (failure)
0xfffffd7ff252336b: je 0xfffffd7ff2522843  // if ICC.ZF=1 we are done
0xfffffd7ff2523371: leaq 0x0000000000000050(%rsp),%rdx
0xfffffd7ff2523376: nop
0xfffffd7ff2523377: call 0xfffffd7feab353e0  // calls the code in frame 12/13 that calls complete_monitor_locking_C()
0xfffffd7ff252337c: jmp 0xfffffd7ff2522843  // we are done

Rewinding back to the typical trace bits that we see for this failure:

    dcubed_jme_last_trace_points=0x0000000500002862

There are two bits of particular interest:

    0x000000002000
    0x000400000000

0x000000002000 marks that fast_lock()'s cmpxchgq worked. This line from memory:

0xfffffd7ff2523329: orq $0x0000000000002000,%rdx  // orptr(tracePoints, 0x00002000);

0x000400000000 marks that quick_enter()'s Atomic::cmpxchg_ptr() worked. That's the code called by this line from memory:

0xfffffd7ff2523377: call 0xfffffd7feab353e0  // calls the code in frame 12/13 that calls complete_monitor_locking_C()

The setting of 0x000000002000 and the call to complete_monitor_locking_C() are on different code paths and I don't see anything in the "success" code path that could accidentally lead to the "failure" code path. Here's the "success" code path:

0xfffffd7ff2523329: orq $0x0000000000002000,%rdx  // orptr(tracePoints, 0x00002000);
0xfffffd7ff2523367: xorq %rbx,%rbx  // xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success
0xfffffd7ff252336a: popq %rdx  // pop(tracePoints);
0xfffffd7ff252336b: je 0xfffffd7ff2522843  // if ICC.ZF=1 we are done

After the 'orq' sets 0x0000000000002000, we have code to call trace_fast_lock which could change ICC.ZF, but "xorq %rbx,%rbx" zeros the rbx register and resets the ICC.ZF=1 state. The "popq %rdx" is housekeeping that does not change the ICC.ZF value so that leads us to "je 0xfffffd7ff2522843" which is the branch around the call to complete_monitor_locking_C() because we are done.
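
The flag reasoning for the two exit paths can be checked with a toy model. This is a sketch only, not an x86 emulator: the Cpu struct, the helper names, and the placeholder register values are all hypothetical. The only facts it relies on are that xor of a register with itself sets ZF, that cmp against 0 sets ZF only when the operand is 0, and that pop leaves the flags untouched.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the state that matters on the two exit paths.
struct Cpu {
  uint64_t rbx = 0x1234;             // boxReg; arbitrary non-zero placeholder
  uint64_t rsp = 0x7fffffffe000ULL;  // placeholder for a live, non-zero stack pointer
  bool zf = false;
};

// xorq %rbx,%rbx: the result is always zero, so ZF becomes 1.
inline void xor_rbx_rbx(Cpu& c) { c.rbx = 0; c.zf = true; }

// cmpq $0,%rsp: ZF is 1 only if rsp == 0.
inline void cmp_rsp_zero(Cpu& c) { c.zf = (c.rsp == 0); }

// popq %rdx: moves the stack pointer but does not modify the flags.
inline void pop(Cpu& c) { c.rsp += 8; }

// True when the `je` after fast_lock() would branch around the slow-path call.
// zf_after_trace_call models whatever ZF state trace_fast_lock() leaves behind.
inline bool success_path_skips_slow_call(bool zf_after_trace_call) {
  Cpu c;
  c.zf = zf_after_trace_call;
  xor_rbx_rbx(c);
  pop(c);
  return c.zf;
}

inline bool failure_path_skips_slow_call(bool zf_after_trace_call) {
  Cpu c;
  c.zf = zf_after_trace_call;
  cmp_rsp_zero(c);
  pop(c);
  return c.zf;
}
```

In this model the success path always takes the `je`, and the failure path never does, whatever ZF state the trace_fast_lock call leaves behind.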
So it looks to me like this code path doesn't have any holes that can explain how both 0x000000002000 and 0x000400000000 are set in our tracing flags. Time to see if there's another way for an errant ObjectSynchronizer::quick_enter() call to be made.
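
For reference, the failing trace word can be split into its individual bits with a few lines of code. This is a sketch with a hypothetical set_bits() helper. Only some of the bit meanings are described in this comment (0x800 for the inflated block, 0x1000 and 0x2000 for fast_lock() failure and success, 0x000400000000 for the quick_enter() cmpxchg); the remaining bits are listed without interpretation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a trace-point word into its individual set bits, lowest first.
std::vector<uint64_t> set_bits(uint64_t word) {
  std::vector<uint64_t> bits;
  while (word != 0) {
    uint64_t lowest = word & (~word + 1);  // isolate the lowest set bit
    bits.push_back(lowest);
    word &= word - 1;                      // clear that bit
  }
  return bits;
}
```

Applied to 0x0000000500002862 this yields 0x2, 0x20, 0x40, 0x800, 0x2000, 0x100000000 and 0x400000000. The first five match the orq sequence along the inflated path of the fast_lock() dump ending in the success bit, and the last is the quick_enter() bit.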