Found this when studying Leyden performance.
JDK-8049304 added a 1 ms sleep on the destruction path to let threads that are still updating the counters catch up:
https://github.com/openjdk/jdk/blob/c00557f8f53ff729c8a1857a4ffcc585d3f8c6c4/src/hotspot/share/runtime/perfData.cpp#L268
This delay eats into the execution time on very short runs. In the measurements below, removing it saves about 1.1 ms of a 19.9 ms mean run:
$ hyperfine -w 10 -r 100 ...
# Baseline
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
Time (mean ± σ): 19.9 ms ± 0.3 ms [User: 11.6 ms, System: 15.7 ms]
Range (min … max): 19.4 ms … 20.7 ms 100 runs
# Disable UsePerfData altogether
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UsePerfData -Xmx128m Hello
Time (mean ± σ): 18.3 ms ± 0.3 ms [User: 11.4 ms, System: 15.7 ms]
Range (min … max): 17.8 ms … 19.2 ms 100 runs
# Remove sleep(1ms)
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
Time (mean ± σ): 18.8 ms ± 0.3 ms [User: 11.9 ms, System: 15.4 ms]
Range (min … max): 18.4 ms … 19.6 ms 100 runs
The sleep in question looks opportunistic and not load-bearing for correctness (it cannot be, right?). If we still believe we need to coordinate the counter updates and deletions, we can use GlobalCounter for synchronization; see the attached JDK-8348402-poc.patch. It performs reasonably well in the tests:
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m Hello
Time (mean ± σ): 18.9 ms ± 0.2 ms [User: 11.7 ms, System: 15.6 ms]
Range (min … max): 18.4 ms … 19.3 ms 100 runs
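To illustrate the idea of replacing the sleep with explicit synchronization, here is a minimal standalone sketch. It is not the attached patch and does not use HotSpot's GlobalCounter; `Counter`, `ReaderSync`, `try_inc`, `destroy_counter` and `run_demo` are hypothetical names, and the single shared reader count is a much simplified stand-in for a real epoch-based scheme. Updaters touch the counter only inside a critical section, and the destruction path unpublishes the counter, waits for in-flight updaters, and only then frees it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a perf counter; not HotSpot's PerfData.
struct Counter {
  std::atomic<long> value{0};
};

// Hypothetical, simplified reader tracking: one shared count of threads
// currently inside a critical section.
class ReaderSync {
  std::atomic<int> _active{0};
public:
  void enter() { _active.fetch_add(1); }
  void exit()  { _active.fetch_sub(1); }
  // Spin until no updater is inside a critical section.
  void synchronize() const {
    while (_active.load() != 0) std::this_thread::yield();
  }
};

static std::atomic<Counter*> g_counter{nullptr};
static ReaderSync g_sync;

// Updater path: only touch the counter inside a critical section.
// Returns false once the counter has been unpublished.
bool try_inc() {
  g_sync.enter();
  Counter* c = g_counter.load();
  bool ok = (c != nullptr);
  if (ok) c->value.fetch_add(1);
  g_sync.exit();
  return ok;
}

// Destruction path: unpublish first, wait for in-flight updaters, then free.
// Returns the final counter value, or -1 if there was nothing to destroy.
long destroy_counter() {
  Counter* c = g_counter.exchange(nullptr);
  if (c == nullptr) return -1;
  g_sync.synchronize();   // replaces the fixed "sleep and hope" delay
  long final_value = c->value.load();
  delete c;
  return final_value;
}

// Races several updater threads against destruction. Returns true if every
// increment reported as successful is reflected in the final value.
bool run_demo(int threads) {
  assert(threads > 0);
  g_counter.store(new Counter());
  std::atomic<long> successes{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < threads; i++) {
    workers.emplace_back([&successes] {
      // Keep updating until the counter disappears.
      while (try_inc()) successes.fetch_add(1);
    });
  }
  // Let the updaters make some progress before tearing down.
  while (successes.load() < 1000) std::this_thread::yield();
  long final_value = destroy_counter();
  for (auto& w : workers) w.join();
  return final_value == successes.load();
}
```

Calling `run_demo(4)` from a test harness exercises the race. The point of the design is that the destruction path waits exactly as long as an updater is actually in flight, which on a short-lived VM with idle counters is close to zero, instead of paying a fixed 1 ms.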
relates to
- JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc() (Closed)
- JDK-8348829 Remove ObjectMonitor perf counters (Open)
- JDK-8246020 -XX:+UsePerfData is enabled by default and slows down VM bootstrap by 6% (Closed)
links to
- Commit(master) openjdk/jdk/305bbdae
- Review(master) openjdk/jdk/23293