-
Bug
-
Resolution: Fixed
-
P4
-
15
-
b25
-
generic
-
generic
src/hotspot/share/runtime/synchronizer.cpp contains the
following comment today:
// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing
// processor -- is a better choice as it scales on SMP systems.
Kim B. happened to notice this comment recently and thinks it
might be stale or wrong.
I did spelunking about the history behind this comment:
Before the fix forJDK-8049717 was pushed, this comment was:
394 // Performance concern:
395 // OrderAccess::storestore() calls release() which STs 0 into the global volatile
396 // OrderAccess::Dummy variable. This store is unnecessary for correctness.
397 // Many threads STing into a common location causes considerable cache migration
398 // or "sloshing" on large SMP system. As such, I avoid using OrderAccess::storestore()
399 // until it's repaired. In some cases OrderAccess::fence() -- which incurs local
400 // latency on the executing processor -- is a better choice as it scales on SMP
401 // systems. See http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a
402 // discussion of coherency costs. Note that all our current reference platforms
403 // provide strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.
During the code review cycle forJDK-8049717, we considered
deleting part of the comment and we also considered various
rewrites of the comment.
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-July/012138.html
We finally settled on the following rewrite from David H:
I think the performance comment has lost a bit of context. May I suggest just a slight change to the original that puts it into historical perspective:
// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::Dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing processor
// -- is a better choice as it scales on SMP systems. See
// http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a discussion
// of coherency costs. Note that all our current reference platforms provide
// strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.
Though I also note that there is no a single OrderAccess call in synchronizer.cpp
I didn't find any further tweaks to the wording as part of the
JDK-8049717 code review cycle, but it looks like other minor
edits have occurred since then (done by other bug fixes).
following comment today:
// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing
// processor -- is a better choice as it scales on SMP systems.
Kim B. happened to notice this comment recently and thinks it
might be stale or wrong.
I did spelunking about the history behind this comment:
Before the fix for
394 // Performance concern:
395 // OrderAccess::storestore() calls release() which STs 0 into the global volatile
396 // OrderAccess::Dummy variable. This store is unnecessary for correctness.
397 // Many threads STing into a common location causes considerable cache migration
398 // or "sloshing" on large SMP system. As such, I avoid using OrderAccess::storestore()
399 // until it's repaired. In some cases OrderAccess::fence() -- which incurs local
400 // latency on the executing processor -- is a better choice as it scales on SMP
401 // systems. See http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a
402 // discussion of coherency costs. Note that all our current reference platforms
403 // provide strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.
During the code review cycle for
deleting part of the comment and we also considered various
rewrites of the comment.
https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-July/012138.html
We finally settled on the following rewrite from David H:
I think the performance comment has lost a bit of context. May I suggest just a slight change to the original that puts it into historical perspective:
// Performance concern:
// OrderAccess::storestore() calls release() which at one time stored 0
// into the global volatile OrderAccess::Dummy variable. This store was
// unnecessary for correctness. Many threads storing into a common location
// causes considerable cache migration or "sloshing" on large SMP systems.
// As such, I avoided using OrderAccess::storestore(). In some cases
// OrderAccess::fence() -- which incurs local latency on the executing processor
// -- is a better choice as it scales on SMP systems. See
// http://blogs.sun.com/dave/entry/biased_locking_in_hotspot for a discussion
// of coherency costs. Note that all our current reference platforms provide
// strong ST-ST order, so the issue is moot on IA32, x64, and SPARC.
Though I also note that there is no a single OrderAccess call in synchronizer.cpp
I didn't find any further tweaks to the wording as part of the
edits have occurred since then (done by other bug fixes).
- relates to
-
JDK-8049717 expose L1_data_cache_line_size for diagnostic/sanity checks
- Closed