There should be a JVM command-line setting (perhaps an -Xlog option) that reports when a virtual call has failed to devirtualize and that call is (somehow) known to be significant to performance.
There might be sub-options which filter the potential output so that only the hottest code paths are considered, and/or only paths in certain classes and/or methods. (Some -Xlog messages can be optionally restricted to particular named locations; perhaps this filtering capability is relevant to this feature.)
Perhaps failure to inline is the more important event to report, but note that inlining requires devirtualization, in which a polymorphic call site is reduced by profiling and/or type inference to a monomorphic one (or sometimes an N-morphic one, with a dynamic guard).
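To make the terminology concrete, here is a minimal sketch of a single interface call site that sees either one receiver class or several. All class and method names are hypothetical, and whether the JIT actually devirtualizes or inlines the call depends on the VM version, flags, and collected profile; the code only illustrates the shape of the problem.

```java
import java.util.List;

// Hypothetical demo class; every name in this sketch is illustrative.
final class CallSiteDemo {
    // One bytecode call site: s.area(). Its type profile records the
    // receiver classes actually seen here at run time.
    static double totalArea(List<Shape> shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area(); // interface call; a devirtualization candidate
        }
        return sum;
    }

    public static void main(String[] args) {
        // If only lists like this reached totalArea, the call site's
        // profile would show a single receiver class (monomorphic).
        List<Shape> oneType = List.of(new Square(1), new Square(2), new Square(3));
        // Three receiver classes flow through the same call site.
        List<Shape> threeTypes = List.of(new Square(1), new Rect(2, 3), new Triangle(4, 5));
        System.out.println(totalArea(oneType));
        System.out.println(totalArea(threeTypes));
    }
}

interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Rect implements Shape {
    private final double width;
    private final double height;
    Rect(double width, double height) { this.width = width; this.height = height; }
    public double area() { return width * height; }
}

final class Triangle implements Shape {
    private final double base;
    private final double height;
    Triangle(double base, double height) { this.base = base; this.height = height; }
    public double area() { return 0.5 * base * height; }
}
```

Note that both lists pass through the same totalArea method, so in this run the one call site accumulates all three receiver classes in its profile, even though one of the callers only ever supplies Square.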
The point is that virtual calls, especially interface calls, are usually undesirable in a hot path of an application. Inlining is usually required to get desirable performance, and if it fails, sometimes an API, library, or algorithm must be redesigned. Low-level performance "potholes" (or "cliffs") often happen because devirtualization fails. (This is true even after the normal algorithmic improvements have been made, such as adding caches or indexes, manually hoisting checks, or upgrading brute-force searches.) But usually the only evidence we have of such failures is a benchmark slowdown. It can take weeks of tinkering with code and examining assembly output and hardware profiles to determine the root cause of the slowdown. Again, the slowdown often ends up being caused by an inlining failure, but finding the offending method can be very, very difficult.
(Fixing an inlining failure can also be very difficult; generally speaking, the required dynamic type information must be more forcibly injected into the program structure in such a way that the JIT can "see" it and treat it as a statically reliable quantity. Then the method dispatch folds up, then the method body folds up, and everybody can go work on something else.)
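One common way to inject type information by hand is an explicit type guard hoisted out of the hot loop. The sketch below is only an illustration of that idea, not necessarily the kind of fix intended here; the types Transform, Increment, and Doubler are hypothetical, and nothing guarantees that a given JIT will inline the guarded call.

```java
// Hypothetical types for illustration only.
final class GuardedDispatch {
    // The instanceof test is done once, outside the loop. Inside the
    // guarded branch the receiver's exact class is statically known,
    // because Increment is final.
    static long applyAll(Transform t, long[] values) {
        long sum = 0;
        if (t instanceof Increment) {
            Increment inc = (Increment) t;
            for (long v : values) {
                sum += inc.apply(v); // only one possible target method
            }
        } else {
            for (long v : values) {
                sum += t.apply(v); // still a true interface call
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3};
        System.out.println(applyAll(new Increment(), data));
        System.out.println(applyAll(new Doubler(), data));
    }
}

interface Transform {
    long apply(long x);
}

final class Increment implements Transform {
    public long apply(long x) { return x + 1; }
}

final class Doubler implements Transform {
    public long apply(long x) { return x * 2; }
}
```

The cost of this style is duplicated loop bodies and a hard-coded list of favored types, which is part of why finding the right call site first matters so much.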
I think that if we had a way to pinpoint hot code paths that failed to inline, and could then report contextual information about those paths, programmers could save some of those weeks of machine code analysis that are now required to find the bottleneck.
The simplest prototype for this would be some conditional logic in C2 which would print a log message on inlining failure. It should have a threshold parameter, which says "if this failed inlining has apparently executed N times in the past, let's report it".
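The threshold check can be modeled in a few lines. The real logic would live inside C2 (which is written in C++); this Java sketch only models the decision, and the class name, method name, parameters, and message format are all hypothetical rather than existing HotSpot APIs or log output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed check; not HotSpot code.
final class InlineFailureReporter {
    private final long threshold;
    private final List<String> messages = new ArrayList<>();

    InlineFailureReporter(long threshold) {
        this.threshold = threshold;
    }

    // Called when the compiler gives up on inlining a call site.
    // Reports only if the profile says the site executed at least
    // `threshold` times. Returns true if a message was logged.
    boolean onInlineFailure(String callSite, String reason, long profiledCount) {
        if (profiledCount < threshold) {
            return false; // too cold to be worth reporting
        }
        String message = String.format(
                "inlining failed at %s (%s), profiled count %d",
                callSite, reason, profiledCount);
        messages.add(message);
        System.out.println(message);
        return true;
    }

    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        InlineFailureReporter reporter = new InlineFailureReporter(10_000);
        reporter.onInlineFailure("Foo::bar @ bci 12", "megamorphic receiver", 50);
        reporter.onInlineFailure("Foo::baz @ bci 7", "megamorphic receiver", 250_000);
    }
}
```

In this run only the second failure meets the threshold, so only that one would be logged.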
A much more elaborate feature, and a more useful one, would somehow associate profile points with the compiled machine code of each virtual call emitted by C2. (Perhaps it would somehow distinguish virtual calls which are "obviously always slow" from those that "we really should optimize". We could try various heuristics for that; maybe AI can help here at some point to separate the wheat from the chaff.)
At the end of the application run, the profiled calls would be sorted from hottest (the worst offenders) to coldest (don't care), and an appropriate report would be logged to the VM output.
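The end-of-run report amounts to filtering and sorting per-call-site counters. Here is a minimal sketch, assuming the counters are available as a map from a call-site description to an execution count; the class name, the map representation, and the minCount cutoff are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the end-of-run report; not HotSpot code.
final class VirtualCallReport {
    // Drops call sites colder than minCount, then orders the rest
    // from hottest to coldest.
    static List<Map.Entry<String, Long>> hottestFirst(Map<String, Long> counts, long minCount) {
        List<Map.Entry<String, Long>> sites = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                sites.add(entry);
            }
        }
        sites.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return sites;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("Parser::next @ bci 4", 5L);
        counts.put("Codec::encode @ bci 31", 500_000L);
        counts.put("Index::lookup @ bci 18", 42_000L);
        for (Map.Entry<String, Long> site : hottestFirst(counts, 1_000)) {
            System.out.println(site.getValue() + "  " + site.getKey());
        }
    }
}
```

A filter on class or method name could be applied in the same loop as the minCount test.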