JDK-8369506

Bytecode rewriting causes Java heap corruption on AArch64

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P2
    • None
    • 21, 22, 23, 24, 25
    • hotspot
    • aarch64

      `RewriteBytecodes` is on by default, and we have confirmed that on AArch64 it causes Java heap corruption. The root cause is missing barriers on both the path that updates the bytecode and the fast bytecode handlers. `ResolvedFieldEntry::fill_in` uses `STLR` to publish the resolved `ResolvedFieldEntry`, and the interpreter uses a plain `STR` to patch the bytecode shortly after. Unfortunately, AArch64 allows a `STR` that follows an `STLR` in program order to be reordered before it, so other threads can observe the patched bytecode before the `ResolvedFieldEntry` is resolved. Since the fast bytecode handlers expect the `ResolvedFieldEntry` to be resolved, this leads to corruption.
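
      The ordering problem described above can be sketched with C++ atomics. This is a minimal model, not HotSpot code and not the proposed patch: `FieldEntry`, `PatchableSite`, `resolve_and_patch`, `fast_handler_offset`, and the `kFastGetfield` value are hypothetical stand-ins. It assumes a release store roughly corresponds to `STLR` and a relaxed store to a plain `STR`. The broken pattern is shown as a comment; the active code shows one way to make the publication safe (release store on the bytecode, acquire load in the handler).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ResolvedFieldEntry; not HotSpot's real layout.
struct FieldEntry {
    int field_offset = 0;                  // 0 means "not resolved yet"
    std::atomic<bool> resolved{false};
};

constexpr std::uint8_t kSlowGetfield = 0xb4;  // JVM getfield opcode
constexpr std::uint8_t kFastGetfield = 0xcb;  // illustrative fast opcode value (assumption)

// Hypothetical model of one bytecode location plus its resolved entry.
struct PatchableSite {
    FieldEntry entry;
    std::atomic<std::uint8_t> bytecode{kSlowGetfield};
};

// Writer side: resolve the entry, then patch the bytecode.
void resolve_and_patch(PatchableSite& site, int offset) {
    assert(offset > 0 && "offset 0 is reserved for 'unresolved' in this model");
    site.entry.field_offset = offset;
    // Release store: orders the write above before it (roughly STLR).
    site.entry.resolved.store(true, std::memory_order_release);

    // Broken pattern (roughly a plain STR): a relaxed store placed after a
    // release store may become visible to other threads before it.
    //   site.bytecode.store(kFastGetfield, std::memory_order_relaxed);

    // Safe in this model: the patch itself is the release operation.
    site.bytecode.store(kFastGetfield, std::memory_order_release);
}

// Reader side: returns the field offset if the fast bytecode is visible,
// or -1 if the site still holds the slow bytecode.
int fast_handler_offset(const PatchableSite& site) {
    // The acquire load pairs with the release store in resolve_and_patch,
    // so a reader that sees kFastGetfield also sees field_offset.
    if (site.bytecode.load(std::memory_order_acquire) != kFastGetfield) {
        return -1;
    }
    return site.entry.field_offset;
}
```

      With the relaxed variant, the C++ memory model gives a reader that observes `kFastGetfield` no guarantee about `field_offset`, which mirrors the window the report describes. Whether the real fix needs this exact pairing is a question for the actual patch.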

      This was confirmed by inserting logic near the beginning of the fast bytecode handlers (`TemplateTable::fast_*`) that checks whether the field offset from `ResolvedFieldEntry` is 0 and calls `stop` if it is. In rare circumstances the `stop` was triggered. Without the check, a field offset of 0 makes the handler access the object header instead of the field, which leads to the markWord being clobbered.
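
      To illustrate why an unresolved offset of 0 is destructive, here is a small sketch under simplifying assumptions. `FakeObject` is a hypothetical layout with a header word at offset 0 (real HotSpot headers differ in detail), and `put_int_at` and `checked_put_int` are illustrative helpers, not interpreter code. The checked variant mimics the diagnostic described above by refusing offset 0 where the report's instrumentation called `stop`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified object layout: the header word sits at offset 0.
struct FakeObject {
    std::uint64_t mark_word;  // stands in for the markWord
    std::uint64_t klass;      // stands in for the class pointer
    std::int32_t  value;      // an ordinary instance field
};

// Models a fast putfield: store an int at a byte offset taken from the
// resolved entry, with no validation of that offset.
void put_int_at(FakeObject& obj, std::size_t offset, std::int32_t v) {
    std::memcpy(reinterpret_cast<unsigned char*>(&obj) + offset, &v, sizeof v);
}

// Models the diagnostic: reject offset 0 instead of storing through it.
// Returns false where the instrumented handler would have stopped.
bool checked_put_int(FakeObject& obj, std::size_t offset, std::int32_t v) {
    if (offset == 0) {
        return false;  // the entry was observed before it was resolved
    }
    put_int_at(obj, offset, v);
    return true;
}
```

      A store through `offsetof(FakeObject, value)` updates the field and leaves `mark_word` alone, while the same store through offset 0 overwrites part of the header.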

      This manifests itself as impossible branches being taken (`return foo == null || foo.isEmpty()` throwing NullPointerException for `foo`, or `if (foo != null)` not being taken when `foo` is guaranteed to be non-null), interpreter crashes related to corrupted receiver oops on the stack, or GC crashes related to corrupted oops. If the JVM is going to crash (it does not always), it typically does so within the first minute or two.

      We do not have a reproducing test case to share as it is extremely rare, but it is observable at scale. We have an internal test which reproduces it, but it is large, proprietary, and impossible to isolate.

      The only workaround is disabling bytecode rewriting by passing `-XX:-RewriteBytecodes`.

      NOTE: This affects OpenJDK going all the way back through at least 21.

      We ran benchmarks with bytecode rewriting on and off, and the difference was negligible. This raises the question: on modern hardware, with the existence of C1 and C2, is bytecode rewriting necessary?

      There are 3 options to resolve this:

      1. Disable bytecode rewriting on AArch64 and likely other weak memory model architectures.
      2. Ditch bytecode rewriting altogether in HotSpot.
      3. Add the necessary barriers. I have a patch which adds the minimum barriers necessary to make it safe according to AArch64 specifications.

            jcking Justin King