Secondary supers cache slot is still used in C2


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: tbd
    • Affects Version/s: 25, 26, 27
    • Component/s: hotspot

      (Synopsis is provisional, change as you see fit)

      I have been studying one of our internal benchmarks to understand the cost of Java casts to interfaces, and I think I found an interesting oddity related to JDK-8180450 across JDK releases. See the attached MultiTypeBench.java; it contains the perfasm dumps plus some analysis.

      Here is the awkward result: JDK 21 is far ahead of both JDK 17 and JDK 25. On the flip-flopping tests, JDK 21 is plausibly far ahead of JDK 17 because of JDK-8180450. But JDK 21 is also faster than JDK 25 and mainline in the single-interface query case! I believe this is an awkward interaction with the C2 subtype checking code, which *still* checks the single-slot secondary super cache (look at Phase::gen_subtype_check). In JDK 25, that single-slot check seems to always (?) fail. In JDK 21, which lacks the follow-up JDK-8331341, the interpreter/C1 can still fill the slot and thus enable a useful shortcut.
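
      The interaction can be illustrated with a toy Java model of the two lookup paths. This is a sketch under stated assumptions and not HotSpot code. `ModelKlass`, `SubtypeCheckModel`, `legacyLookup`, `tableLookup` and `c2Check` are hypothetical names. The "table" is a simplified bitmap-plus-scan stand-in for the real hashed secondary supers table, and `hashSlot` is an assumed 0..63 value. Only the cache-slot behavior described above is modeled: the slot is probed first, and only a legacy scan ever fills it.

```java
import java.util.List;

// Hypothetical model of the check path; not HotSpot code or its real Klass layout.
final class ModelKlass {
    final int hashSlot;                       // assumed 0..63, stand-in for the table hash
    final List<ModelKlass> secondarySupers;   // interfaces this type implements
    final long bitmap;                        // bit i set if some secondary super has hashSlot i
    ModelKlass secondarySuperCache;           // the single-slot cache

    ModelKlass(int hashSlot, List<ModelKlass> secondarySupers) {
        this.hashSlot = hashSlot;
        this.secondarySupers = secondarySupers;
        long bits = 0;
        for (ModelKlass s : secondarySupers) {
            bits |= 1L << s.hashSlot;
        }
        this.bitmap = bits;
    }
}

final class SubtypeCheckModel {
    static int slotHits; // how often the single-slot probe short-circuited the check

    // JDK 21-style interpreter/C1 path (no JDK-8331341): linear scan, fill the slot on a hit.
    static boolean legacyLookup(ModelKlass sub, ModelKlass sup) {
        for (ModelKlass s : sub.secondarySupers) {
            if (s == sup) {
                sub.secondarySuperCache = sup;
                return true;
            }
        }
        return false;
    }

    // Table-style lookup: the bitmap gives a fast negative, and the slot is never written.
    static boolean tableLookup(ModelKlass sub, ModelKlass sup) {
        if ((sub.bitmap & (1L << sup.hashSlot)) == 0) {
            return false;
        }
        return sub.secondarySupers.contains(sup);
    }

    // Shape of the C2 check described above: probe the slot first, then fall back to the table.
    static boolean c2Check(ModelKlass sub, ModelKlass sup) {
        if (sub.secondarySuperCache == sup) {
            slotHits++;
            return true;
        }
        return tableLookup(sub, sup);
    }
}
```

      When only `tableLookup` runs (JDK 25-like), `c2Check` never records a slot hit, although the slot is still probed on every call. After one `legacyLookup` call (JDK 21-like interpreter/C1), the same query short-circuits at the slot.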

      Arguably, this makes the JDK 21 backport of JDK-8180450 both better and worse. It is better because it matches JDK 17 performance pretty well in the single-interface query case. It is worse because, if the interpreter/C1 still flip-flop that single-slot cache, C2-generated code would still hit the scalability bottleneck.
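
      The flip-flop pattern can be seen in a toy model of the legacy (interpreter/C1-style) lookup. This is a hypothetical sketch, not HotSpot code. `FlipFlopModel`, `K` and `cacheWrites` are illustrative names. It shows that alternating queries for two interfaces overwrite the single slot on every query, while repeating one interface writes it once.

```java
import java.util.List;

// Hypothetical toy model of flip-flopping on the single-slot cache; not HotSpot code.
final class FlipFlopModel {
    static final class K {
        final List<K> secondarySupers;
        K cache;          // single-slot secondary super cache
        int cacheWrites;  // stores into the slot; in the VM these would land on a shared field

        K(List<K> secondarySupers) {
            this.secondarySupers = secondarySupers;
        }
    }

    // Legacy-style lookup: probe the slot, scan on a miss, overwrite the slot on a hit.
    static boolean isSubtype(K sub, K sup) {
        if (sub.cache == sup) {
            return true;
        }
        for (K s : sub.secondarySupers) {
            if (s == sup) {
                sub.cache = sup;
                sub.cacheWrites++;
                return true;
            }
        }
        return false;
    }
}
```

      With many threads, those repeated stores to one shared field are the scalability bottleneck mentioned above. The model only counts the stores and does not simulate cache-line contention.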

      Arguably, since JDK 21 is so far ahead, one can say there is a palpable JDK 21 -> JDK 25 regression in the single-interface query case.

      So, this shows a few opportunities.

      1. Fix C2 to avoid touching the single-slot secondary supers cache at all. With UseSecondarySupersTable in current mainline, there seems to be no reason to even check it: we know all queries are handled by the table. Skipping the check would save some cycles on the way to the real bitmap check. This does not solve the apparent JDK 21 -> JDK 25 regression, but likely makes it less palpable.
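
      Option 1 can be sketched in the same toy style. This is hypothetical and not HotSpot code: `Option1Model`, `withSlotProbe`, `tableOnly` and the bitmap-plus-scan `table` stand-in are assumptions. Because nothing fills the slot when the table handles every query, both shapes give identical answers. The only difference is the extra slot probe that option 1 removes.

```java
import java.util.List;

// Hypothetical sketch of option 1; the table is a simplified bitmap-plus-scan stand-in.
final class Option1Model {
    static final class K {
        final int hashSlot;             // assumed 0..63
        final List<K> secondarySupers;
        final long bitmap;              // bit i set if some secondary super has hashSlot i
        K cache;                        // single slot; never written when the table handles all queries

        K(int hashSlot, List<K> secondarySupers) {
            this.hashSlot = hashSlot;
            this.secondarySupers = secondarySupers;
            long bits = 0;
            for (K s : secondarySupers) {
                bits |= 1L << s.hashSlot;
            }
            this.bitmap = bits;
        }
    }

    static int slotProbes; // loads/compares spent on the slot

    static boolean table(K sub, K sup) {
        if ((sub.bitmap & (1L << sup.hashSlot)) == 0) {
            return false; // fast negative from the bitmap
        }
        return sub.secondarySupers.contains(sup);
    }

    // Current shape: probe the slot (which never hits here), then go to the table.
    static boolean withSlotProbe(K sub, K sup) {
        slotProbes++;
        if (sub.cache == sup) {
            return true;
        }
        return table(sub, sup);
    }

    // Option 1 shape: go straight to the bitmap check.
    static boolean tableOnly(K sub, K sup) {
        return table(sub, sup);
    }
}
```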

      2. Lean in and co-opt that single slot for the ultra-fast single-interface query case, getting JDK 21-like performance. "Just" fill it in on the first successful secondary supers lookup? The awkward part of this solution is uneven performance if the single-slot cache is accidentally poisoned. We can _probably_ flush that cache every so often without reintroducing the scalability bottleneck.
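
      One possible reading of option 2, sketched as a hypothetical Java model (not HotSpot code): fill the slot only when it is empty, so flip-flopping queries cannot keep overwriting it, and clear it after a fixed number of slow-path lookups so a poisoned slot eventually recovers. `FillOnceModel`, `flushInterval` and the lookup-count flush trigger are all assumptions. `List.contains` stands in for the table lookup.

```java
import java.util.List;

// Hypothetical sketch of option 2; names and the flush policy are assumptions, not HotSpot code.
final class FillOnceModel {
    static final class K {
        final List<K> secondarySupers;
        K cache;            // single-slot cache, filled only when empty
        int cacheWrites;    // stores that fill the slot (flushes are not counted)
        int tableLookups;   // slow-path lookups since the last flush

        K(List<K> secondarySupers) {
            this.secondarySupers = secondarySupers;
        }
    }

    final int flushInterval; // hypothetical: clear the slot after this many slow-path lookups

    FillOnceModel(int flushInterval) {
        this.flushInterval = flushInterval;
    }

    boolean isSubtype(K sub, K sup) {
        if (sub.cache == sup) {
            return true;                                 // ultra-fast single-interface case
        }
        boolean hit = sub.secondarySupers.contains(sup); // stand-in for the table lookup
        if (hit && sub.cache == null) {                  // fill only an empty slot: no flip-flop stores
            sub.cache = sup;
            sub.cacheWrites++;
        }
        if (++sub.tableLookups >= flushInterval) {       // occasional flush un-poisons the slot
            sub.cache = null;
            sub.tableLookups = 0;
        }
        return hit;
    }
}
```

      Repeating one interface costs a single store. Alternating two interfaces leaves the slot holding whichever came first, and the other interface takes the table path without extra stores. A slot poisoned by an unlucky first query is reset at the next flush. In a real VM, the per-class `tableLookups` counter would itself be a shared store on every slow-path lookup. A usable design would therefore need a flush trigger that does not write shared state that often; the counter here is only for illustration.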

            Assignee:
            Unassigned
            Reporter:
            Aleksey Shipilev
            Votes:
            0
            Watchers:
            2
