Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: 20
Component/s: hotspot
Subcomponent: runtime
Resolved In Build: b22

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8323050	17.0.11	Aleksey Shipilev	P4	Resolved	Fixed	b01

Found this while microbenchmarking Generational ZGC.

See how the following macOS/AArch64 code uses __ATOMIC_RELEASE and FULL_MEM_BARRIER even when order == memory_order_relaxed:

template<size_t byte_size>
struct Atomic::PlatformAdd {
  template<typename D, typename I>
  D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const {
    // 'order' is never consulted: every call pays for a release RMW plus a full fence.
    D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
    FULL_MEM_BARRIER;
    return res;
  }
  // ... (rest of the struct not shown)
};

This causes a noticeable slowdown in parts of the Generational ZGC code that walk over large C++ arrays and claim indices/chunks with fetch_and_add.
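The access pattern described above can be sketched in portable C++ as follows. This is a hypothetical illustration, not ZGC code: `parallel_sum`, the chunk size, and the summing work are assumptions, and `std::atomic` is used in place of HotSpot's `Atomic` API. Each worker claims the next chunk with a relaxed `fetch_add`. Relaxed ordering is sufficient because the counter only hands out unique ranges and publishes no other data, so a release operation plus a full fence on every claim is pure overhead.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example (not ZGC code): sum `data` in parallel, with each worker
// claiming the next chunk of indices from a shared counter.
long long parallel_sum(const std::vector<int>& data, unsigned num_workers,
                       std::size_t chunk_size) {
  std::atomic<std::size_t> next{0};                // next unclaimed index
  std::vector<long long> partial(num_workers, 0);  // one slot per worker, never shared
  std::vector<std::thread> workers;

  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      for (;;) {
        // Relaxed is enough: fetch_add is still atomic, so every chunk is
        // claimed exactly once, and the counter publishes no other data.
        std::size_t start = next.fetch_add(chunk_size, std::memory_order_relaxed);
        if (start >= data.size()) break;
        std::size_t end = std::min(start + chunk_size, data.size());
        for (std::size_t i = start; i < end; ++i) partial[w] += data[i];
      }
    });
  }
  // join() makes each worker's writes to `partial` visible to this thread.
  for (auto& t : workers) t.join();

  long long total = 0;
  for (long long p : partial) total += p;
  return total;
}
```

The happens-before edges this code needs come from thread creation and `join()`, not from the counter, which is why the claiming `fetch_add` itself can be relaxed.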

If I change the code to be:
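if (order == memory_order_relaxed) {
  // Relaxed callers only need atomicity: skip the release ordering and the fence.
  return __atomic_add_fetch(dest, add_value, __ATOMIC_RELAXED);
} else {
  // ... old code (__ATOMIC_RELEASE add followed by FULL_MEM_BARRIER) ...
}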

then I see a significant speedup.
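A self-contained sketch of the proposed change follows, assuming GCC or Clang `__atomic` builtins. The enum `sketch_order` and the function `add_then_fetch_sketch` are hypothetical stand-ins for HotSpot's `atomic_memory_order` and `Atomic::PlatformAdd::add_and_fetch`. `__atomic_thread_fence(__ATOMIC_SEQ_CST)` is used here as a stand-in for `FULL_MEM_BARRIER`. This is not the committed HotSpot code.

```cpp
// Hypothetical stand-in for HotSpot's atomic_memory_order (only two values modeled).
enum class sketch_order { relaxed, conservative };

// Sketch of the proposed change using GCC/Clang __atomic builtins.
template <typename D, typename I>
D add_then_fetch_sketch(D volatile* dest, I add_value, sketch_order order) {
  if (order == sketch_order::relaxed) {
    // Caller only needs atomicity: no release ordering, no trailing fence.
    return __atomic_add_fetch(dest, add_value, __ATOMIC_RELAXED);
  }
  // Conservative path, mirroring the original code: a release RMW followed by
  // a full fence (__atomic_thread_fence stands in for FULL_MEM_BARRIER).
  D res = __atomic_add_fetch(dest, add_value, __ATOMIC_RELEASE);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);
  return res;
}
```

The conservative path is left untouched. Only callers that explicitly request relaxed ordering, such as chunk-claiming counters, get the cheaper code.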

backported by

JDK-8323050 Add relaxed add_and_fetch for macos aarch64 atomics

Resolved

duplicates

JDK-8307511 Add relaxed add_and_fetch for macos aarch64 atomics

Closed

relates to

JDK-8293716 Atomic bindings for fetch_and_add and xchg sometimes have too weak memory ordering on Mac AArch64

Closed

links to

Commit openjdk/jdk17u-dev/6340b666

Commit openjdk/jdk/7a1cb64b

Review openjdk/jdk17u-dev/2097

Review openjdk/jdk/13823

Assignee: Stefan Karlsson
Reporter: Stefan Karlsson
Votes: 0
Watchers: 4
Created: 2022-09-08 09:09
Updated: 2024-01-05 02:00
Resolved: 2023-05-08 00:57