In JDK 9 we looked at replacing library-based thread local storage (TLS) with C++ thread_local (JDK-8132510), but there were some issues and concerns around its use, so we opted to use the compiler-specific TLS mechanisms provided by gcc/clang/VS.
A significant limitation of the gcc TLS extension is that if an initializer is present for a thread-local variable, it must be a constant-expression [1]. That means that we can't declare a thread-local variable that is a class instance with non-trivial construction and destruction.
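To make the restriction concrete, here is a minimal sketch, assuming gcc or clang (the only compilers for which the `__thread` spelling below applies). The names `counter`, `Tracker`, `tracker`, and `check_tls_demo` are hypothetical and are not taken from HotSpot.

```cpp
#include <cassert>

// Constant-initialized TLS: accepted by the gcc/clang __thread extension.
static __thread int counter = 42;

// A class with non-trivial construction and destruction (hypothetical).
struct Tracker {
  int* buffer;
  Tracker() : buffer(new int(7)) {}   // runs code, so not a constant initializer
  ~Tracker() { delete buffer; }       // non-trivial destructor
  Tracker(const Tracker&) = delete;
  Tracker& operator=(const Tracker&) = delete;
};

// static __thread Tracker rejected;     // not allowed: needs dynamic init and destruction
static thread_local Tracker tracker;     // allowed: C++11 thread_local handles both

// Small self-check that touches both variables on the calling thread.
void check_tls_demo() {
  assert(counter == 42);
  assert(*tracker.buffer == 7);
}
```

The commented-out declaration is the case the gcc extension cannot express; C++ thread_local accepts it at the cost discussed in the extracts below.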
Project Panama has a use case for TLS that requires a non-trivial destructor for a C++ class, so that threads that attach to the JVM to process Java "upcalls" will be automatically detached when they terminate (if they didn't detach explicitly).
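The pattern behind that use case can be sketched as follows. This is only an illustration under stated assumptions: `attach_current_thread`, `detach_current_thread`, `ThreadAttachment`, and `run_upcall_thread` are hypothetical stand-ins, not the JVM's actual attach/detach logic, and a simple counter takes the place of real JVM state.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for attaching to and detaching from the JVM.
static std::atomic<int> attached_threads{0};
static void attach_current_thread() { attached_threads.fetch_add(1); }
static void detach_current_thread() { attached_threads.fetch_sub(1); }

// Per-thread holder whose destructor runs when the owning thread exits.
struct ThreadAttachment {
  bool attached = false;
  void attach() {
    if (!attached) { attach_current_thread(); attached = true; }
  }
  void detach() {
    if (attached) { detach_current_thread(); attached = false; }
  }
  ~ThreadAttachment() { detach(); }   // automatic detach at thread termination
};

static thread_local ThreadAttachment attachment;

// Simulates a native thread that attaches for an upcall and never detaches
// explicitly. Returns the attach count observed while the thread was running.
int run_upcall_thread() {
  int during = 0;
  std::thread t([&during] {
    attachment.attach();
    during = attached_threads.load();
  });
  t.join();  // the thread's thread_local destructors have run by now
  assert(attached_threads.load() == 0);
  return during;
}
```

The destructor is what makes this impossible with the constant-initializer-only TLS extension: the cleanup has to be registered per thread and run at thread exit.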
A discussion of the pros and cons of using C++ thread_local as the mechanism for TLS in the JVM shows there are still a number of concerns that argue against its wholesale adoption. Some relevant extracts from that discussion:
"[A] reminder that the difference between C++11 thread_local and the gcc's __thread came up in the discussion of JDK-8230877.
https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-September/039487.html
That's what led us to the current restriction against using thread_local**. We could revisit that. thread_local usually requires an extra prologue before an access to ensure the variable has been initialized, while __thread requires the initializer be a constant expression. Also JDK-8230877 was before C++11/14 support and use was in place."
---
** The issue here is a potential performance hit. As the gcc documentation describes it [2]:
"Unfortunately, this [C++ thread_local] support requires a run-time penalty for references to non-function-local thread_local variables defined in a different translation unit even if they don't need dynamic initialization, so users may want to continue to use __thread for TLS variables with static initialization semantics."
Some preliminary benchmarking with gcc __thread converted to C++ thread_local showed significant regressions on a couple of benchmarks on AArch64.
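The "extra prologue" can be pictured with a hand-written equivalent. This is a conceptual sketch only, assuming gcc or clang; it is not the code a compiler actually emits, and `compute_initial_value`, `guarded_value`, `direct_value`, and `check_guard_demo` are hypothetical names.

```cpp
#include <cassert>

static int init_calls = 0;
// Hypothetical dynamic initializer: not a constant expression.
static int compute_initial_value() { ++init_calls; return 7; }

// Hand-written model of a dynamically initialized thread-local:
// storage plus a per-thread "has it been initialized yet?" flag.
static __thread bool value_initialized = false;
static __thread int value_storage = 0;

// Every access goes through a check before the variable can be used.
int& guarded_value() {
  if (!value_initialized) {
    value_storage = compute_initial_value();
    value_initialized = true;
  }
  return value_storage;
}

// Constant-initialized TLS needs no such check: the access is direct.
static __thread int plain_value = 7;
int& direct_value() { return plain_value; }

void check_guard_demo() {
  assert(guarded_value() == 7);
  assert(guarded_value() == 7);
  assert(init_calls == 1);      // the initializer ran once on this thread
  assert(direct_value() == 7);
}
```

Per the gcc note quoted above, a reference from another translation unit may pay for a check like this even when the variable turns out not to need dynamic initialization, which is the cost the restriction is meant to avoid.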
---
"thread_local has all the same initialization order issues as globals. There's a nicely worked out analysis here:
https://stackoverflow.com/questions/60813372/initialization-order-of-thread-local-vs-global-variables
So I think I'd like us to stick with the limited version that requires a constexpr initialization expression, at least for the most part."
---
"We could relax the prohibition to allow thread_local where really required. I might want a noisy looking macro for that use-case, with bare thread_local remaining forbidden. That makes it clear that someone thought about the question at least a little bit.
I looked for a way to warn about uses of thread_local that could be locally disabled where we intentionally use it, but didn't find such a thing. Clang (some version) has -Wglobal-constructors, and a patch exists for adding it to gcc, but it's not in gcc 11.2 (the latest release).
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01860.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71482
But I did stumble over this. Might this be a problem for you?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61991"
---
For the record, gcc bug 61991 is not an issue for the proposed use case because the TLS variable does get used.
---
"There are two possible maintenance issues: 1. If we don't document our decisions about our choices we won't be able to re-evaluate them later on without full re-analysis, so let's put the info into the JBS entry. 2. Even if the choices are fully documented, there's some cost and risk in applying the documented reasoning correctly in each case, compared to a "one size fits all" design. But it seems like we have a plan to deal with those possible maintenance issues."
---
So the proposal here is to allow "well considered" uses of C++ thread_local by providing a suitably "noisy" macro, and to adjust the HotSpot Style Guide [3] section on allowed C++ features to accommodate this.
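A minimal sketch of what such a macro and its use might look like follows. The macro name `REVIEWED_CPP_THREAD_LOCAL` and the `ScopedResource` class are hypothetical and chosen only for illustration; this issue does not fix the actual spelling.

```cpp
#include <cassert>

// Hypothetical "noisy" spelling: easy to grep for and hard to use by accident.
// Bare thread_local would remain forbidden by the style guide.
#define REVIEWED_CPP_THREAD_LOCAL thread_local

// Hypothetical class needing non-trivial construction and destruction.
struct ScopedResource {
  int* slot;
  ScopedResource() : slot(new int(0)) {}
  ~ScopedResource() { delete slot; }
  ScopedResource(const ScopedResource&) = delete;
  ScopedResource& operator=(const ScopedResource&) = delete;
};

// A deliberate, reviewed use of C++ thread_local.
static REVIEWED_CPP_THREAD_LOCAL ScopedResource reviewed_resource;

// Increments and returns the calling thread's private counter.
int bump_reviewed_resource() { return ++*reviewed_resource.slot; }

void check_macro_demo() {
  int first = bump_reviewed_resource();
  assert(bump_reviewed_resource() == first + 1);
}
```

The macro adds no behavior; its only purpose is to make each use visible in review, so that someone has to decide that the dynamic initialization and destruction are really needed.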
[1] https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
[2] https://gcc.gnu.org/gcc-4.8/changes.html
[3] https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
- blocks
  - JDK-8270851 Logic for attaching/detaching native threads could be improved (Resolved)
- relates to
  - JDK-8286891 thread_local causes undefined symbol error with XL C (Resolved)
  - JDK-8132510 Replace ThreadLocalStorage with compiler/language-based thread-local variables (Resolved)
  - JDK-8230877 Rename THREAD_LOCAL_DECL to THREAD_LOCAL (Resolved)