Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8352066

JVM.commit() and JVM.flush() exhibit race conditions against JFR epochs

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: P2 P2
    • 25
    • 25
    • hotspot
    • None
    • jfr
    • master

      Old Event::commit before JDK-8303134


      // EventWriter::endEvent()
         ...
         unsafe.putAddress(startPositionAddress, currentPosition);


      The above unsafe call turned problematic when unsafe intrinsic were not JIT inlined.
      This is because it becomes a regular invocation of a native method, which includes the following thread transitions: _thread_in_Java -> _thread_in_native -> [safepoint check] -> _thread_in_vm
      Since there is a safepoint check before the thread enters _thread_in_vm, a JFR epoch rotation can interleave, invalidating the commit once the thread enters _thread_in_vm (because it is now stale - in the wrong epoch).

      JDK-8303134 tried to solve this problem by ensuring that a thread never entered a safepoint during commit, but it did so mainly by letting the thread stay _thread_in_native.

      There is a fundamental oversight in the solution provided as part of JDK-8303134: a thread in state _thread_in_native is excluded from the safepoint protocol, meaning JFR can issue a safepoint to evolve the JFR epoch concurrently while the thread is in _thread_in_native.

      There are two consequences of this; both involve race conditions against JFR epochs:

      1. JVM.commit() - the thread is not guaranteed to discover its notified property as set by the Jfr Recorder Thread during the safepoint. The property tells the thread to abort and retry its current commit (effectively roll-forwarding of the commit into the new JFR epoch). Without detecting the notified property, the thread can write an event into the wrong epoch.

      2. JVM.flush() - a thread can issue a flush of its thread-local buffer while in _thread_in_native. Since all buffers are written and cleared by the JFR Recorder Thread during a safepoint, a thread can insert events from a previous epoch into the newly cleared buffers. Consequently, events of an earlier epoch are now included in the set of events for the new JFR epoch, even though all buffers were presumably cleared (and the underlying problem with this is that they refer to constant pools only serialized in the previous epoch).

      From a user perspective, the results of these races are relatively benign, albeit a bit annoying. The visible effect is that relational keys of events that point to non-existent constant pools cannot be resolved, rendering artifacts as <null>.

      However, every chunk should be self-contained and fully resolved for a consistent model.

      If I remember correctly, the thinking with JDK-8303134 was a concern about the performance overhead in doing a thread transition to _thread_in_vm for every event commit (in addition to forgetting that _thread_in_native excludes a thread from the safepoint protocol).

      Those concerns are unfounded because JDK-8303134 includes a C2 intrinsic, LibraryCallKit::inline_native_jvm_commit, which inlines all commit logic, to be executed in state _thread_in_Java.

      We should, therefore, return the slow path entry points to perform the required _thread_in_vm transitions to keep threads inside the safepoint protocol. Then the thread will correctly see and check its notified property and abort, so we will still solve the original problem with non-inlined unsafe.putAddress(startPositionAddress, currentPosition);

      There is also a racy read of entries added to the stack trace table. We should only determine if new entries exist after taking the stack trace lock to ensure no entries are missed.

            mgronlun Markus Grönlund
            mgronlun Markus Grönlund
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: