Type: Enhancement
Resolution: Unresolved
Priority: P4
Affects Version/s: 25, 26, 27
Component/s: hotspot
(Synopsis is provisional, change as you see fit)
I have been studying one of our internal benchmarks to understand the cost of Java casts to interfaces, and I think I found an interesting oddity in JDK-8180450 across the JDK releases. See the attached MultiTypeBench.java -- the perfasm dumps are there, plus some analysis.
Here is the awkward result: JDK 21 is far ahead of both JDK 17 and JDK 25. It is plausible that JDK 21 is far ahead of JDK 17 on flip-flopping tests because of JDK-8180450. But JDK 21 is also faster than JDK 25 and mainline in the single interface query case! I believe this is an awkward interaction in the C2 subtype checking code, which *still* checks the single-slot secondary super cache; look at Phase::gen_subtype_check. In JDK 25, that single-slot check is always (?) failing, while in JDK 21 -- due to the absence of the followup JDK-8331341 -- there is a chance that interpreter/C1 can still fill it in and thus enable a useful shortcut.
Arguably, this makes the JDK 21 backport of JDK-8180450 both better and worse. It is better, because for the single interface query case it matches the performance of JDK 17 pretty well. It is worse, because if there is still flip-flopping on that single-slot cache in interpreter/C1, C2-generated code would still hit the scalability bottleneck.
Arguably, now that JDK 21 is far ahead, one can say there is a palpable regression JDK 21 -> JDK 25 for the single interface query case.
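For readers without the attachment, the two query shapes discussed above can be sketched roughly as follows. This is not MultiTypeBench.java: the class, interface, and method names are hypothetical, and a real measurement would wrap these loops in a JMH benchmark rather than call them directly.

```java
// Hypothetical illustration of the two query shapes; not the attached benchmark.
final class CastShapes {
    interface I1 {}
    interface I2 {}

    // A class with two secondary supers (both interfaces).
    static final class Impl implements I1, I2 {}

    // Single interface query: every check asks about the same interface,
    // so a one-element cache, if it is ever filled, keeps hitting.
    static int singleQuery(Object[] objs) {
        int hits = 0;
        for (Object o : objs) {
            if (o instanceof I1) hits++;
        }
        return hits;
    }

    // Flip-flopping query: checks alternate between I1 and I2 on the same
    // class, so a one-element cache would be overwritten on every check.
    static int flipFlopQuery(Object[] objs) {
        int hits = 0;
        for (Object o : objs) {
            if (o instanceof I1) hits++;
            if (o instanceof I2) hits++;
        }
        return hits;
    }
}
```

The first loop is the case where JDK 21 appears to benefit from the single-slot shortcut; the second is the pattern that JDK-8180450 was meant to make scalable.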
So, this shows a few opportunities.
1. Fix C2 to avoid touching the single-slot secondary supers cache at all. There seems to be no reason to even check it in current mainline with UseSecondarySupersTable: we know all queries are handled by the table. This would save some cycles on the way to the real bitmap check. It does not solve the apparent JDK 21 -> JDK 25 regression, but likely makes it less palpable.
2. Lean in and co-opt that single slot for the ultra-fast case of a single interface query, getting JDK 21-like performance. "Just" fill it in on first hit in secondary supers cache? The awkwardness with this solution would be uneven performance if that single-slot cache is poisoned by accident. We can _probably_ attempt to flush that cache every so often without re-introducing the scalability bottleneck.
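To make the two options easier to compare, here is a toy Java model of the lookup order. It is only a sketch under loud assumptions: KlassModel, its fields, and all three methods are hypothetical stand-ins, the set stands in for the hashed secondary supers table, and none of this is HotSpot code. In particular, filling the slot only while it is empty is one possible reading of "fill it in on first hit", chosen here so the slot is written at most once.

```java
import java.util.Set;

// Toy model only; names and structure are hypothetical, not HotSpot internals.
final class SubtypeCheckModel {
    static final class KlassModel {
        final Set<String> secondarySupers; // stands in for the hashed table / bitmap
        volatile String singleSlotCache;   // stands in for the one-element cache

        KlassModel(Set<String> secondarySupers) {
            this.secondarySupers = secondarySupers;
        }
    }

    // Order described in the report: probe the single slot, then the table.
    // If nothing ever fills the slot, the first probe is pure overhead.
    static boolean checkSlotThenTable(KlassModel k, String target) {
        if (target.equals(k.singleSlotCache)) return true;
        return k.secondarySupers.contains(target);
    }

    // Option 1: skip the single slot and go straight to the table.
    static boolean checkTableOnly(KlassModel k, String target) {
        return k.secondarySupers.contains(target);
    }

    // Option 2 (one assumed variant): fill the slot on the first table hit,
    // and only while it is empty, so repeated writes cannot flip-flop.
    static boolean checkAndFillOnce(KlassModel k, String target) {
        if (target.equals(k.singleSlotCache)) return true;
        boolean hit = k.secondarySupers.contains(target);
        if (hit && k.singleSlotCache == null) {
            k.singleSlotCache = target;
        }
        return hit;
    }
}
```

In this model the "poisoning" concern shows up directly: once the slot holds a rarely queried interface, every query for the hot interface falls through to the table until the slot is flushed.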
- caused by: JDK-8180450 secondary_super_cache does not scale well (Resolved)