Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8251318

search of secondary_supers does not scale well

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Duplicate
    • Icon: P4 P4
    • tbd
    • 16
    • hotspot
    • None

      Searching Klass::secondary_supers is slower than it should be,
      and this shows up as a performance problem with some specialized
      workloads.

      There are two sources of slowness:

      1. We use the SCASQ instruction on x86 to scan the supers array.
      We should put in a platform-specific option to use a plain loop
      (MOVQ/CMPQ/JCCNE) instead, because SCASQ may not
      be well-supported on newer platforms.

      2. Rarely, a class might have a very long secondary_supers array,
      *and* tests against it (either positive/unstable or negative) might
      be in the application's set of hot operations. In such applications,
      the O(N) linear search of secondary_supers appears in an ugly way.

      Suggested improvements (in order):

      A. Supply an option to replace SCASQ, maybe turn it on by default
      on newer systems.

      B. Sort the items in the array by metadata address (they don't
      move anymore, unless maybe when classes are redefined).
      Re-sort when they might change (after CDS, for example).
      Use an out-of-line binary search routine to traverse the array.
      Use the inline code (from A) for small arrays, certainly of
      size 0 or 1, to avoid useless branching.

      C. Add a negative test cache, managed similarly to the
      secondary_super_cache (see JDK-8180450) to avoid
      searching the whole array on misses which are predictable.

      This would help with long decision chains (reported with Scala)
      which look like this:

         //Class<?> T0 = x.getClass();
         if (x instanceof T1) doT1(x); // usually not taken after O(N) search!
         if (x instanceof T2) doT2(x); // usually not taken after O(N) search!
         if (x instanceof T3) doT3(x); // usually not taken after O(N) search!
         if (x instanceof T4) doT4(x); // OK, maybe it's a T4
         ... // or maybe not, so more O(N) searches...

      Here, it might be useful to store on T1 and T2 and T3
      that there's a frequent query of them for T0, and note
      that T0 is a "guaranteed no". Such a cache should have
      multiple entries (and thus spin up only if hot) to cover
      cases where T0 varies frequently.

      Or, it might be better to store on T0 that T1 and T2 and T3
      are all "guaranteed no". This requires enough cache entries
      on T0 to cover the depth of the decision chain given above.
      Since the length of the chain could be very long (like the
      number of frequent T0 values in the previous paragraph),
      the cache should size itself to the heat of the testing.

      Or (third try, and maybe the best), there could be a global
      negative cache indexed by pairs of T0/Ti. It could be fixed
      in size and leaky, in the sense that interfering queries would
      evict each other (after some hysteresis to avoid memory
      thrashing, as in JDK-8180450).

            aph Andrew Haley
            jrose John Rose
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: