Type: Enhancement
Resolution: Unresolved
Priority: P3
Fix Version/s: tbd
Affects Version/s: 21
Component/s: hotspot
Labels: None
Subcomponent: compiler

In previous work, ~~JDK-8300002~~ added a mechanism to denote small bits of code as irrelevant to the overall complexity of a compiled method (“nmethod”). This allowed Loom to add a few extra instructions around call sites without perturbing the InlineSmallCode heuristic.

(The InlineSmallCode heuristic is notoriously unstable and non-linear. It provides a useful service by avoiding certain inlining pathologies, but it has become entrenched as a part of HotSpot’s performance model. Removing it or replacing it with a more rational heuristic would cause unpredictable effects on application performance almost everywhere. We may have to "bite the bullet" and do this some day, but there is also benefit to living with it by reducing its downsides. The technique of ~~JDK-8300002~~ is a good example of this. This RFE proposes another.)

The code assembly process is capable of managing several code streams at once, which are merged as “sections” in the nmethod eventually produced by a compilation task. These sections are loosely coupled; each one does a different job.

This RFE proposes a new section for “slow path” code. The section would be a good container for generated code which is intended to be used infrequently to handle “corner cases”. Many kinds of slow paths exist, including unreached statements, TLB overflow/refill, type speculation failure, loop predication failure, and other things. Code which reconfigures a computation on a slow path may throw an exception, trap out to a lower tier (the interpreter), or perform some computation and branch to detuned code (a backup loop body). In most cases (not all) there is a discernible section of code which may be very complicated, and yet does not and should not impact the overall estimated complexity of the nmethod, since it does not execute very often (say, less than 1 time in 10,000 invocations).

That code should go into the new section, and should not contribute to the InlineSmallCode heuristic. As a separate section, seldom executed, it will also not contribute to instruction cache loading. (This is a flaw with “hop over” slow paths as supported by ~~JDK-8300002~~; they *do* tend to fragment and overload the instruction cache.)

As an added value, the assembler can inject counter increments systematically on entries into the slow-path section, and these may well be useful as inputs to recompilation heuristics.

The implementation of this RFE probably requires something like an InlineSkippedInstructionsCounter gadget, but one which refocuses the assembler on the slow-path section, instead of just collecting the size of an exclusion zone. (Again, if the existing exclusion-zone feature is overused, instruction cache fragmentation will result. The present RFE is intended to prevent that.)
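A minimal sketch of such a gadget, written against a toy assembler rather than HotSpot's real one. All names here (ToyAssembler, SlowPathScope, heuristic_size) are hypothetical and exist only for illustration. The scope object redirects emission to the slow-path stream for its lifetime and restores the previous section on exit, so the size seen by a size-based inlining heuristic covers the main section only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an assembler with two code streams.
// Hypothetical names throughout; this is not HotSpot's API.
enum class Section { Main, SlowPath };

class ToyAssembler {
 public:
  // Append one byte to whichever section is currently selected.
  void emit(std::uint8_t byte) { current().push_back(byte); }

  Section section() const { return _section; }
  void set_section(Section s) { _section = s; }

  std::size_t main_size() const { return _main.size(); }
  std::size_t slow_size() const { return _slow.size(); }

  // Size a size-based inlining heuristic would see: main section only.
  std::size_t heuristic_size() const { return _main.size(); }

 private:
  std::vector<std::uint8_t>& current() {
    return _section == Section::Main ? _main : _slow;
  }

  Section _section = Section::Main;
  std::vector<std::uint8_t> _main;
  std::vector<std::uint8_t> _slow;
};

// RAII gadget: while alive, emitted code goes to the slow-path section.
// The previously selected section is restored on destruction.
class SlowPathScope {
 public:
  explicit SlowPathScope(ToyAssembler& masm)
      : _masm(masm), _saved(masm.section()) {
    _masm.set_section(Section::SlowPath);
  }
  ~SlowPathScope() { _masm.set_section(_saved); }

  SlowPathScope(const SlowPathScope&) = delete;
  SlowPathScope& operator=(const SlowPathScope&) = delete;

 private:
  ToyAssembler& _masm;
  Section _saved;
};
```

Saving and restoring the previous section, rather than always returning to the main section, keeps nested scopes well behaved in this sketch.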

Also required will be some sort of assembler relocation which is competent to manage jump offsets between the main (“fast path”) section and the slow path section.
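One way to picture the relocation requirement is as a list of fixups that are resolved once the final layout is known. The sketch below is an assumption-laden toy, not HotSpot's relocation machinery: the types TwoSectionBuffer and CrossJump are hypothetical, the slow-path section is assumed to be laid out directly after the main section, and jumps use an x86-style encoding (opcode 0xE9 followed by a 32-bit little-endian displacement relative to the end of the instruction).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Sec { Main, Slow };

// A pending jump whose displacement cannot be known until layout.
struct CrossJump {
  Sec from_sec;               // section containing the jump
  std::size_t disp_offset;    // offset of the 4 displacement bytes
  Sec to_sec;                 // section containing the target
  std::size_t target_offset;  // offset of the target in that section
};

struct TwoSectionBuffer {
  std::vector<std::uint8_t> main_code;
  std::vector<std::uint8_t> slow_code;
  std::vector<CrossJump> jumps;

  std::vector<std::uint8_t>& code(Sec s) {
    return s == Sec::Main ? main_code : slow_code;
  }

  // Emit a jump with a zero placeholder displacement and record a fixup.
  void emit_jump(Sec from, Sec to, std::size_t target_offset) {
    std::vector<std::uint8_t>& c = code(from);
    c.push_back(0xE9);
    jumps.push_back({from, c.size(), to, target_offset});
    c.insert(c.end(), 4, 0);
  }

  // Lay out main code followed by slow code, then patch every fixup.
  std::vector<std::uint8_t> finalize() const {
    std::vector<std::uint8_t> out(main_code);
    out.insert(out.end(), slow_code.begin(), slow_code.end());
    const std::size_t slow_base = main_code.size();
    auto base = [slow_base](Sec s) {
      return s == Sec::Main ? std::size_t{0} : slow_base;
    };
    for (const CrossJump& j : jumps) {
      const std::size_t disp_at = base(j.from_sec) + j.disp_offset;
      const std::int64_t next_insn = static_cast<std::int64_t>(disp_at) + 4;
      const std::int64_t target =
          static_cast<std::int64_t>(base(j.to_sec) + j.target_offset);
      const std::uint32_t bits =
          static_cast<std::uint32_t>(static_cast<std::int32_t>(target - next_insn));
      for (int i = 0; i < 4; ++i) {
        out[disp_at + i] = static_cast<std::uint8_t>(bits >> (8 * i));
      }
    }
    return out;
  }
};
```

Deferring the patch to finalize() means neither section needs to know its final address while code is still being emitted into the other.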

Such slow-path sections would be useful in some cases of instruction (AD file) templates where the instruction contains a slow path, but is required by packaging constraints to use the “hop over” pattern, which fragments the instruction cache.

Compilers, especially C2, will wish to annotate IR with frequency information that can then be used to make decisions about scheduling code to hot or slow paths. (They already do this, in fact, but put everything in one section, scheduling slow stuff far away from fast stuff.) There are many places in C2 that collect slow paths, notably the BuildCutout gadget.

We should route new kinds of slow paths to the new section, to avoid “poking the bear” of InlineSmallCode.

We should route existing slow paths currently hidden under InlineSkippedInstructionsCounter to the new feature, if they are truly slow. (An embedded nop after a call is not truly slow. A rare call to a cleanup routine is slow, if rare enough. There may be a platform-specific knob to help adjust these decisions.)

We should also, under a flag, route selected pre-existing slow paths through the new section. This includes the C2 paths mentioned above, when they are truly slow. The flag is necessary to control potentially unstable interactions with InlineSmallCode. We should try to turn it on.

Potentially, slow paths could be given human-readable labels (at the assembler level). This would allow us to turn on profiling/debugging code to count those slow paths in a systematic way. Such a feature might well be useful in the field for diagnosing performance puzzles, so it should be a diagnostic flag. Note that, as designed in this RFE, such a feature would not perturb the inlining heuristics, and hence should have low performance overhead, as befits a performance measurement tool.
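As a rough illustration of labeled slow-path counting, the following sketch keeps one counter per registered label and only counts when a diagnostic switch is on. The class SlowPathCounters, its methods, and the example labels are all hypothetical. In a real implementation the increment would be an instruction injected at the slow-path entry, not a C++ call.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of labeled slow-path entry counters.
class SlowPathCounters {
 public:
  // 'enabled' stands in for a diagnostic flag.
  explicit SlowPathCounters(bool enabled) : _enabled(enabled) {}

  // Called when a slow path is generated; returns its counter id.
  std::size_t register_entry(std::string label) {
    _labels.push_back(std::move(label));
    _counts.push_back(0);
    return _labels.size() - 1;
  }

  // Stand-in for the increment injected at the slow-path entry.
  void on_entry(std::size_t id) {
    if (_enabled) {
      ++_counts.at(id);
    }
  }

  std::uint64_t count(std::size_t id) const { return _counts.at(id); }
  const std::string& label(std::size_t id) const { return _labels.at(id); }
  std::size_t size() const { return _labels.size(); }

 private:
  bool _enabled;
  std::vector<std::string> _labels;
  std::vector<std::uint64_t> _counts;
};
```

Because the counters live with the slow-path entries, nothing is added to the fast path, which is the property the paragraph above relies on.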

relates to

JDK-8306706 Support out-of-line code generation for MachNodes

Resolved

JDK-8300002 Performance regression caused by non-inlined hot methods due to post call noop instructions

Resolved
