Loading...

XML

Word

Printable

Type: Enhancement
Resolution: Unresolved
Priority: P4
Fix Version/s: tbd
Affects Version/s: 17
Component/s: hotspot
Labels:
- c2

Subcomponent:
compiler

CFGLoop::compute_freq() assumes that the underlying CFG is reducible. Under this assumption, it computes the frequencies of each block within a loop L in a single forward pass (in reverse DFS postorder) over the members (blocks and loops) of L [1], assuming that the members form an acyclic graph (except for L's back-edges). This assumption does not hold for irreducible graphs, where there might be additional cycles corresponding to non-natural loops.

This leads to inaccurate frequency estimations for irreducible CFGs. For example, in the attached irreducible CFG count-actual.pdf (the red-blue color scale reflects estimated frequency), B8 has a lower estimated execution frequency than B7, but since B8 is the single successor of B7, it should have at least the same estimated execution frequency. This inaccuracy is due to the single forward pass: the frequencies of {B8, B9, B10, B11, B12, B13, B14, B16, B18, B20} are updated "too early" (before the frequency of B7 is updated with that of B22) and never revisited.

A possible solution is to transform the single forward pass into a proper system of equations to be solved iteratively (see e.g. [2]), as in the following Python/SymPy snippet:

from sympy import *
b3, l3, b5, b6, b7, l2, b14, b16, b18, b20, b21, b22, b23, b25 = symbols(['b3', 'l3', 'b5', 'b6', 'b7', 'l2', 'b14', 'b16', 'b18', 'b20', 'b21', 'b22', 'b23', 'b25'])
system = [
    Eq(b3, 1),
    Eq(l3, b3 * 1),
    Eq(b5, l3 * 1),
    Eq(b6, b5 * 0),
    Eq(b7, b6 * 1 + b22 * 1),
    Eq(l2, b7 * 1),
    Eq(b14, l2 * 0.666),
    Eq(b16, b14 * 1),
    Eq(b18, b16 * 1),
    Eq(b20, b18 * 1),
    Eq(b21, b25 * 1 + b20 * 1),
    Eq(b22, b21 * 0.02),
    Eq(b23, b21 * 0.98),
    Eq(b25, b5 * 1),
    ]
soln = solve(system, [b3, l3, b5, b6, b7, l2, b14, b16, b18, b20, b21, b22, b23, b25])
print(soln)
{b3: 1.00000000000000, l3: 1.00000000000000, b5: 1.00000000000000, b6: 0.0, b7: 0.0202699963514007, l2: 0.0202699963514007, b14: 0.0134998175700328, b16: 0.0134998175700328, b18: 0.0134998175700328, b20: 0.0134998175700328, b21: 1.01349981757003, b22: 0.0202699963514007, b23: 0.993229821218632, b25: 1.00000000000000}

count-expected.pdf shows the same CFG (and the corresponding result of GCM) using the accurate frequencies.

The above system of equations can be computed by reiterating the forward pass until a fix point is reached. This algorithm would require two passes for the vast majority of cases (reducible CFGs), and only suffer a slow down for irreducible CFGs, which could be mitigated with a limit on the number of iterations and a specified tolerance.

An alternative, wider solution could be to simply enforce reducibility through C2 by either bailing out on irreducible CFGs or transforming them into reducible ones by node-splitting or dispatching.

[1] https://github.com/openjdk/jdk/blob/4a267f1bc2b025aae2cb9df7283156aeb0282406/src/hotspot/share/opto/gcm.cpp#L1790
[2] https://dl.acm.org/doi/10.1145/178243.178251 (Section 5).

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

count-actual.pdf
47 kB
2020-12-23 05:31
count-expected.pdf
47 kB
2020-12-23 05:31

relates to

JDK-8255763 C2: OSR miscompilation caused by invalid memory instruction placement

Resolved

JDK-8257146 C2: extend the scope of StressGCM

Closed

Assignee:: Unassigned
Reporter:: Roberto Castaneda Lozano
Votes:: 0 Vote for this issue
Watchers:: 2 Start watching this issue

Created:: 2020-12-23 05:46
Updated:: 2020-12-23 07:39

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates