JDK-8261649

AArch64: Optimize LSE atomics in C++ code

    • b11
    • aarch64

        Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.

        Barrier-ordered-before, Arm Architecture Reference Manual B2.3:

           | Barrier instructions order prior Memory effects before subsequent
           | Memory effects generated by the same Observer. A read or a write RW1
           | is Barrier-ordered-before a read or a write RW2 from the same Observer
           | if and only if RW1 appears in program order before RW2 and any of the
           | following cases apply:
           |
           | [...]
           |
           | * RW1 appears in program order before an atomic instruction with both
           | Acquire and Release semantics that appears in program order before RW2.

        So a prior load or store cannot be reordered with the load of an atomic swap that has Acquire and Release semantics. This barrier-ordered-before relation, in combination with sequential consistency, gives us everything we need on the leading side of a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store of the atomic instruction.
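        The shape described above (one acquire+release atomic followed by a single trailing barrier, instead of a barrier on each side) can be sketched in portable C++. This is only an illustration, not the HotSpot implementation: the names cmpxchg_full and cmpxchg_full_demo are hypothetical, and whether the compare-exchange becomes a single LSE instruction such as CASAL and the fence becomes DMB ISH depends on the compiler and target flags (for example -march=armv8.1-a), which is an assumption here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: compare-and-exchange with
// two-way (full) barrier semantics. Returns the value observed at dest.
inline uint32_t cmpxchg_full(std::atomic<uint32_t>& dest,
                             uint32_t compare_value,
                             uint32_t exchange_value) {
  uint32_t observed = compare_value;
  // Acquire+release atomic: accesses earlier in program order are
  // ordered before it, so no leading barrier is written here.
  // Assumption: with LSE enabled a compiler may emit CASAL for this.
  dest.compare_exchange_strong(observed, exchange_value,
                               std::memory_order_seq_cst,
                               std::memory_order_seq_cst);
  // Trailing fence: keeps later loads and stores from being reordered
  // with the store half of the atomic. Assumption: DMB ISH on AArch64.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // On success 'observed' still equals compare_value (the old value);
  // on failure compare_exchange_strong wrote the current value into it.
  return observed;
}

// Small self-check showing the return-value convention.
inline void cmpxchg_full_demo() {
  std::atomic<uint32_t> word{5};
  assert(cmpxchg_full(word, 5, 7) == 5);  // success: old value returned
  assert(word.load() == 7);
  assert(cmpxchg_full(word, 5, 9) == 7);  // failure: current value returned
  assert(word.load() == 7);               // and the word is unchanged
}
```

        Returning the observed value rather than a bool lets the caller detect success by comparing the result with compare_value.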

              aph Andrew Haley