Details
- Type: Bug
- Resolution: Fixed
- Priority: P3
- Affects Version/s: 9, 10, 11, 12, 13, 17, 20, 21, 22
- Resolved In Build: b19
Description
The secondary super cache (SSC), a one-element cache held on each Klass, can cause excessive cache line invalidation traffic, with noticeable slowdowns.
Specifically, the cache itself may become unstable (which is a normal corner case for one-element caches) and at that point a multi-threaded application may begin "hammering" on the cache line from multiple threads, causing an explosion of coherence traffic.
One customer reported this as happening when multiple threads were traversing heterogeneous sequences of objects, testing the same classes against more than one interface, with rapid variation between the interfaces.
In such a case, two interfaces could compete to use the single SSC slot on each class that occurs in the object sequence. The competition would turn into frequent updating of the SSC slots by multiple threads, causing cache lines to ping-pong between processors.
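The competition for the single slot can be illustrated with a small single-threaded model. This is only a sketch: `ToyKlass`, `ssc_writes`, and `count_ssc_writes` are hypothetical names introduced here, not HotSpot code. It counts how often the slot is rewritten; in a real multi-threaded run, each such write would also invalidate the cache line in other processors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class descriptor; this is not HotSpot's Klass.
struct ToyKlass {
    std::vector<const void*> secondary_supers; // stable list of interfaces
    const void* ssc = nullptr;                 // the single cache slot
    std::size_t ssc_writes = 0;                // how often the slot was rewritten

    bool is_subtype_of(const void* super) {
        if (ssc == super) return true;         // hit: read-only access
        for (const void* s : secondary_supers) {
            if (s == super) {
                ssc = super;                   // miss: the slot is overwritten
                ++ssc_writes;
                return true;
            }
        }
        return false;                          // not a secondary super
    }
};

// Performs n checks against one class and returns the number of slot writes.
// With alternate == true the queried interface flips on every check.
inline std::size_t count_ssc_writes(std::size_t n, bool alternate) {
    static const char iface_a = 0, iface_b = 0; // two distinct identity tokens
    ToyKlass k;
    k.secondary_supers.push_back(&iface_a);
    k.secondary_supers.push_back(&iface_b);
    for (std::size_t i = 0; i < n; ++i) {
        const void* query = (alternate && i % 2 == 1) ? &iface_b : &iface_a;
        k.is_subtype_of(query);
    }
    return k.ssc_writes;
}
```

In this model, `count_ssc_writes(1000, true)` rewrites the slot on every check, while `count_ssc_writes(1000, false)` rewrites it only once.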
To fix this, the SSC has to have some sort of limit on its update rate, or be replaced by a mechanism that scales better.
The simplest fix is probably to put an "update count" profile counter somewhere, and consult that counter just before updating the SSC. If the counter is too high (evidence of a high contention rate), don't update the SSC. The trade-off is between linear searches of the Klass::secondary_supers array (which is stable and therefore replicated across caches) versus time spent waiting to acquire write access to the SSC (which may be hundreds of cycles). Linear search will easily win in those cases, except of course for very dense dynamic query mixes over very complex interface graphs, which is a corner case we can leave for the future.
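A minimal sketch of the "consult the counter before updating" idea follows. It is a model under stated assumptions, not the fix that was committed: `ThrottledKlass`, `update_count`, and the limit of 8 are all hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical model, not HotSpot code: names and the limit are assumptions.
struct ThrottledKlass {
    static constexpr std::uint32_t kUpdateLimit = 8; // arbitrary threshold

    std::vector<const void*> secondary_supers;       // stable, read-mostly
    std::atomic<const void*> ssc{nullptr};           // one-element cache
    std::atomic<std::uint32_t> update_count{0};      // profile counter

    bool is_subtype_of(const void* super) {
        // Fast path: a cache hit needs only read access to the line.
        if (ssc.load(std::memory_order_relaxed) == super) return true;
        // Slow path: linear search of the stable array.
        for (const void* s : secondary_supers) {
            if (s != super) continue;
            // Consult the counter first; skip the write once it is too high.
            if (update_count.load(std::memory_order_relaxed) < kUpdateLimit) {
                update_count.fetch_add(1, std::memory_order_relaxed);
                ssc.store(super, std::memory_order_relaxed);
            }
            return true;
        }
        return false;
    }
};
```

Once the counter reaches the limit, every query that misses the slot falls back to the linear search and performs no writes, so the cache line can stay shared across processors.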
The obvious place to put the update count is next to the SSC, on the same cache line. When the update count exceeds some selected threshold, the SSC is left unchanged. On balance, the extra footprint of a 32-bit field per Klass seems acceptable.
Such a counter should be allowed to decay, so that temporary bursts in type test complexity do not shut down the SSC forever.
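One way to picture a decaying counter that shares a cache line with the slot is sketched below. The 64-byte line size, the threshold of 16, the halving policy, and the names `SscSlot`, `try_update`, and `decay` are all assumptions made for illustration; the description does not say who would trigger the decay or how fast it should be.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: the slot and its counter share one 64-byte line.
struct alignas(64) SscSlot {
    static constexpr std::uint32_t kThreshold = 16; // arbitrary limit

    std::atomic<const void*> cached{nullptr};
    std::atomic<std::uint32_t> update_count{0};     // 32-bit field next to the slot

    // Attempts to install `super`; returns false when throttled.
    bool try_update(const void* super) {
        if (update_count.load(std::memory_order_relaxed) >= kThreshold) {
            return false;                           // leave the slot unchanged
        }
        update_count.fetch_add(1, std::memory_order_relaxed);
        cached.store(super, std::memory_order_relaxed);
        return true;
    }

    // Halves the counter so a past burst does not disable the slot forever.
    // Assumed to be called occasionally by some periodic task. The load and
    // store are not one atomic step, which is tolerable for a profile counter.
    void decay() {
        std::uint32_t c = update_count.load(std::memory_order_relaxed);
        update_count.store(c / 2, std::memory_order_relaxed);
    }
};
static_assert(sizeof(SscSlot) == 64, "slot and counter occupy one assumed line");
```

After a burst saturates the counter, a single `decay()` call in this sketch halves it and lets updates resume.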
Another possible fix would be a thread-local update counter for the SSC, under JavaThread::current. In that case, only Java code could use the extra fix to avoid cache contention, but that is probably acceptable also. This fix would be significantly more complex, but would have the benefit that only "offending" threads would throttle themselves.
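The thread-local variant might look roughly like the following. Here a plain C++ `thread_local` stands in for a field reachable from the current thread object. The names `tl_ssc_updates`, `SharedSlot`, and `maybe_update`, and the per-thread limit of 4, are hypothetical, and the sketch omits any decay of the per-thread count.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-thread update count; each thread sees its own copy.
thread_local std::uint32_t tl_ssc_updates = 0;
constexpr std::uint32_t kPerThreadLimit = 4;   // arbitrary threshold

// Hypothetical shared one-element cache slot.
struct SharedSlot {
    std::atomic<const void*> cached{nullptr};
};

// Updates the shared slot unless the calling thread has already performed
// too many updates; only the "offending" thread throttles itself.
inline bool maybe_update(SharedSlot& slot, const void* super) {
    if (tl_ssc_updates >= kPerThreadLimit) return false;
    ++tl_ssc_updates;
    slot.cached.store(super, std::memory_order_relaxed);
    return true;
}
```

Because the count is per thread, a thread that rarely misses keeps its ability to update the slot even while another thread has backed off.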
Similarly, the counter could be placed in the MethodData object which carries the profile of the instruction which is causing the SSC contention. (This instruction could be instanceof, checkcast, aastore, or a call to an intrinsic method that emulates one of those.) This fix would be even more complex than the thread-based fix, and would probably be overkill given the relatively small importance of the problem.
If the secondary_supers lists ever grow to more than a few tens of elements, additional mechanisms may be needed for quickly testing the subtype relation. A tree walk would probably be sufficient. Unified caches (global or thread-local) or unified numbering schemes are sometimes proposed, but those also seem like overkill for this problem.
Issue Links
- duplicates
  - JDK-8251318 search of secondary_supers does not scale well (Closed)
- relates to
  - JDK-8334220 Optimize Klass layout after JDK-8180450 (Resolved)
  - JDK-8339916 AIOOBE due to Math.abs(Integer.MIN_VALUE) in tests (Open)
  - JDK-8331117 [PPC64] secondary_super_cache does not scale well (Resolved)
  - JDK-8332228 TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache' (Resolved)
  - JDK-8331341 secondary_super_cache does not scale well: C1 and interpreter (Open)
  - JDK-8151481 j.u.regex.Pattern cleanup (Resolved)
  - JDK-8332498 [aarch64, x86] improving OpToAssembly output for partialSubtypeCheckConstSuper Instruct (Resolved)
  - JDK-8331126 [s390x] secondary_super_cache does not scale well (Resolved)
  - JDK-8331159 VM build without C2 fails after JDK-8180450 (Resolved)
  - JDK-8332604 InlineSecondarySupersTest only available in C2 (Closed)
  - JDK-8332587 RISC-V: secondary_super_cache does not scale well (Resolved)
  - JDK-8316180 Thread-local backoff for secondary_super_cache updates (Closed)
- links to
  - Commit openjdk/jdk/f11a496d
  - Review openjdk/jdk/18309
  - Review(master) openjdk/jdk22u/166