Loading...

XML

Word

Printable

Type: Enhancement
Resolution: Duplicate
Priority: P4
Fix Version/s: tbd
Affects Version/s: 16
Component/s: hotspot
Labels:
None

Subcomponent:
compiler

Searching Klass::secondary_supers is slower than it should be,
and this shows up as a performance problem with some specialized
workloads.

There are two sources of slowness:

1. We use the SCASQ instruction on x86 to scan the supers array.
We should put in a platform-specific option to use a plain loop
(MOVQ/CMPQ/JCCNE) instead, because SCASQ may not
be well-supported on newer platforms.

2. Rarely, a class might have a very long secondary_supers array,
*and* tests against it (either positive/unstable or negative) might
be in the application's set of hot operations. In such applications,
the O(N) linear search of secondary_supers appears in an ugly way.

Suggested improvements (in order):

A. Supply an option to replace SCASQ, maybe turn it on by default
on newer systems.

B. Sort the items in the array by metadata address (they don't
move anymore, unless maybe when classes are redefined).
Re-sort when they might change (after CDS, for example).
Use an out-of-line binary search routine to traverse the array.
Use the inline code (from A) for small arrays, certainly of
size 0 or 1, to avoid useless branching.

C. Add a negative test cache, managed similarly to the
secondary_super_cache (see ~~JDK-8180450~~) to avoid
searching the whole array on misses which are predictable.

This would help with long decision chains (reported with Scala)
which look like this:

   //Class<?> T0 = x.getClass();
   if (x instanceof T1) doT1(x); // usually not taken after O(N) search!
   if (x instanceof T2) doT2(x); // usually not taken after O(N) search!
   if (x instanceof T3) doT3(x); // usually not taken after O(N) search!
   if (x instanceof T4) doT4(x); // OK, maybe it's a T4
   ... // or maybe not, so more O(N) searches...

Here, it might be useful to store on T1 and T2 and T3
that there's a frequent query of them for T0, and note
that T0 is a "guaranteed no". Such a cache should have
multiple entries (and thus spin up only if hot) to cover
cases where T0 varies frequently.

Or, it might be better to store on T0 that T1 and T2 and T3
are all "guaranteed no". This requires enough cache entries
on T0 to cover the depth of the decision chain given above.
Since the length of the chain could be very long (like the
number of frequent T0 values in the previous paragraph),
the cache should size itself to the heat of the testing.

Or (third try, and maybe the best), there could be a global
negative cache indexed by pairs of T0/Ti. It could be fixed
in size and leaky, in the sense that interfering queries would
evict each other (after some hysteresis to avoid memory
thrashing, as in ~~JDK-8180450~~).

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

Fast_Subtype_Checks_20240206.pdf
262 kB
2024-02-06 11:21

duplicates

JDK-8180450 secondary_super_cache does not scale well

Resolved

Assignee:: Andrew Haley

Reporter:: John Rose

Votes:: 0 Vote for this issue

Watchers:: 11 Start watching this issue

Created:: 2020-08-07 16:38

Updated:: 2024-04-21 23:38

Resolved:: 2024-04-21 23:38

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates