JDK-8322630

Remove ICStubs and related safepoints

    • b10

      The inline caches can have different types of destinations, and they all require different subsets of data:
      1) Clean (destination: resolve stub, data: none)
      2) Monomorphic to_interpreter (destination: c2i adapter, data: CompiledICHolder wrapping the method and speculative receiver class)
      3) Monomorphic to_compiled (destination: nmethod UEP, data: speculative receiver class)
      4) Megamorphic vtable call (destination: vtable stub, data: none)
      5) Megamorphic itable call (destination: itable stub, data: CompiledICHolder wrapping REFC and DEFC classes for spec compliant itable lookup)
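
The five shapes above can be written down as a small lookup, which makes it easy to see which destinations consume data at all. This is only an illustrative sketch: `ICShape`, `ICShapeInfo`, `describe` and `consumes_data` are hypothetical names, not HotSpot identifiers.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; not the identifiers HotSpot uses.
enum class ICShape {
  Clean,
  MonomorphicToInterpreter,
  MonomorphicToCompiled,
  MegamorphicVtable,
  MegamorphicItable
};

struct ICShapeInfo {
  std::string destination;  // where the call site jumps
  std::string data;         // what that destination expects as inline cache data
};

// Maps each shape to the (destination, data) pair listed in the description.
inline ICShapeInfo describe(ICShape shape) {
  switch (shape) {
    case ICShape::Clean:
      return {"resolve stub", "none"};
    case ICShape::MonomorphicToInterpreter:
      return {"c2i adapter", "CompiledICHolder(method, speculative receiver class)"};
    case ICShape::MonomorphicToCompiled:
      return {"nmethod UEP", "speculative receiver class"};
    case ICShape::MegamorphicVtable:
      return {"vtable stub", "none"};
    case ICShape::MegamorphicItable:
      return {"itable stub", "CompiledICHolder(REFC, DEFC)"};
  }
  assert(false && "unknown ICShape");
  return {"", ""};
}

// True when the destination reads the data part of the inline cache.
inline bool consumes_data(ICShape shape) {
  return describe(shape).data != "none";
}
```

Three of the five shapes consume data, and each of the three expects a different payload, which is why a change of shape currently implies a change of data.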

      Now when adapting the destination between these various shapes, we also need to adapt the data of the inline cache to match the expectations of the destination. In order to change both the destination and data parts of the inline caches atomically, we currently JIT-compile an ICStub that reflects the new data and destination. This allows us to change only the destination of the callsite to the new stub, without touching the data, yet having the effect of atomically updating both the data and destination at the same time.
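
The stub trick described above can be modelled roughly as follows. This is a simplified sketch, not HotSpot code: all type and function names are hypothetical, and a single atomic pointer stands in for the patched destination word of the call instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a call site; not HotSpot code.
struct Target { int id; };  // stands in for a code entry point

// A stub carries both the new data and the final destination.
struct ICStubModel {
  const void* data;
  const Target* destination;
};

struct CallSiteModel {
  const void* inline_data;           // data embedded at the call site, left untouched
  const Target* direct_destination;  // destination used while no stub is installed
  std::atomic<const ICStubModel*> stub;  // the only word that gets patched

  CallSiteModel(const void* d, const Target* t)
      : inline_data(d), direct_destination(t), stub(nullptr) {}
};

struct Dispatch {
  const void* data;
  const Target* destination;
};

// A caller sees either the old (data, destination) pair or the new one,
// never a mix, because only one word is ever read to choose between them.
inline Dispatch dispatch(const CallSiteModel& site) {
  const ICStubModel* s = site.stub.load(std::memory_order_acquire);
  if (s != nullptr) {
    return {s->data, s->destination};
  }
  return {site.inline_data, site.direct_destination};
}

// Transition the call site by publishing a fully built stub with one store.
inline void transition(CallSiteModel& site, const ICStubModel* new_stub) {
  assert(new_stub != nullptr);
  site.stub.store(new_stub, std::memory_order_release);
}
```

The model leaves out the part that causes the trouble: the stubs are a finite resource that has to be reclaimed, and in the scheme described here that reclamation is tied to safepoints.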

      These ICStubs lead to what I refer to as "safepoint spam". The GuaranteedSafepointInterval, which by default triggers a safepoint once per second, is nowadays pretty much 1:1 coupled with freeing up any ICStubs we might have compiled. Moreover, when a Java thread runs out of ICStubs, it forces a synchronous safepoint. What we have occasionally seen is that many threads find out at roughly the same time that they need to refill ICStubs with an ICStubRefill safepoint. This leads to a sort of quadratic latency problem over the number of running threads, especially if someone holds the Threads_lock for a bit, allowing these safepoint requests to pile up.

      Looking at the possible combinations of destinations and data above, note two key things:
      1) All pieces of data needed by all possible states of a callsite are known the first time the callsite is resolved. We know the speculative class (the receiver klass), the method we are invoking, and any REFC and DEFC classes for itable calls. We are only mutating the data because we are switching it around between states, instead of just sticking it all in there from the start.
      2) Among the destinations that consume data, there is only one that doesn't wrap it up in a struct: the monomorphic to_compiled state. Which takes us to the idea...

      The idea with this enhancement is to wrap up all possible data we will ever need for a callsite in a struct called CompiledICData, the first time the callsite is resolved. This contains the speculative class, the method it dispatches to, as well as any REFC and DEFC class for itable callsites. This way, we can throw away the ICStubs from the system, along with all the safepoints they come with, as we never have to patch the data part of the inline cache once the nmethod has been compiled.
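
A minimal sketch of that idea, assuming a much simplified model: the field names, the `initialize` helper and the `CallSiteModel` type below are hypothetical and do not reflect the actual CompiledICData layout. The point is only that the data pointer is fixed for the lifetime of the call site while the destination alone changes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for VM metadata; illustrative only.
struct Klass  { int id; };
struct Method { int id; };
struct Target { int id; };

// Everything any state of the call site may need, filled in once
// the first time the call site is resolved.
struct CompiledICDataModel {
  const Klass*  speculated_klass = nullptr;
  const Method* method = nullptr;
  const Klass*  itable_refc = nullptr;  // only meaningful for itable calls
  const Klass*  itable_defc = nullptr;  // only meaningful for itable calls
  bool initialized = false;

  void initialize(const Klass* receiver, const Method* m,
                  const Klass* refc, const Klass* defc) {
    assert(!initialized);  // filled in exactly once
    speculated_klass = receiver;
    method = m;
    itable_refc = refc;
    itable_defc = defc;
    initialized = true;
  }
};

struct CallSiteModel {
  CompiledICDataModel* const data;         // never repatched after creation
  std::atomic<const Target*> destination;  // the only mutable part

  CallSiteModel(CompiledICDataModel* d, const Target* resolve_stub)
      : data(d), destination(resolve_stub) {}

  // A state transition is a single store to the destination.
  void set_destination(const Target* t) {
    destination.store(t, std::memory_order_release);
  }
};
```

Because each destination only reads the fields it cares about, moving between monomorphic and megamorphic states needs no stub and no data patching in this model.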

      The cost is that in the UEP of nmethods, we need to compare the receiver klass against a memory operand comprising an offset into the inline cache register, instead of comparing against the inline cache register itself. However, I could remove some instructions related to decoding the compressed receiver klass, since the CompiledICData speculative class is also compressed.
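
The trade-off in the UEP check can be sketched in plain C++ rather than assembly. Everything here is an assumption made for illustration: the type names, the 32-bit `narrow_klass` stand-in, and the base-plus-shift `decode` helper are hypothetical and only approximate how a compressed class pointer might be expanded.

```cpp
#include <cassert>
#include <cstdint>

using narrow_klass = std::uint32_t;  // stand-in for a compressed class pointer

struct ObjectHeaderModel   { narrow_klass compressed_klass; };
struct CompiledICDataModel { narrow_klass speculated_klass; };

// Assumed decoding scheme (base + shifted value); illustrative only.
inline std::uintptr_t decode(narrow_klass nk, std::uintptr_t base, unsigned shift) {
  return base + (static_cast<std::uintptr_t>(nk) << shift);
}

// Before: the inline cache register holds the speculated klass itself,
// so the receiver's compressed klass is decoded and compared register to register.
inline bool uep_check_before(const ObjectHeaderModel& receiver,
                             std::uintptr_t ic_register_klass,
                             std::uintptr_t base, unsigned shift) {
  return decode(receiver.compressed_klass, base, shift) == ic_register_klass;
}

// After: the inline cache register points at the data struct, so the check
// loads a field at an offset from it, but both sides are already compressed
// and no decode is needed.
inline bool uep_check_after(const ObjectHeaderModel& receiver,
                            const CompiledICDataModel* ic_register) {
  assert(ic_register != nullptr);
  return receiver.compressed_klass == ic_register->speculated_klass;
}
```

In this model the new check trades a register compare for a memory operand, and drops the decode step in exchange.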

            eosterlund Erik Österlund