Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8255763

C2: OSR miscompilation caused by invalid memory instruction placement

    XMLWordPrintable

Details

    • b30

    Backports

      Description

        FAILURE:

        The attached program Count.java (the most recent version, for some reason I
        cannot remove the old one) is miscompiled by C2:

        # Interpreted.
        $ java -Xint Count.java
        count: 0
        count: 1
        count: 2
        count: 3

        # Compiled with C1.
        java -Xcomp -XX:CompileOnly=Count Count.java
        count: 0
        count: 1
        count: 2
        count: 3

        # Compiled with C2: count jumps from 2 to 4.
        java -Xcomp -XX:-TieredCompilation -XX:CompileOnly=Count Count.java
        count: 0
        count: 1
        count: 2
        count: 4

        ANALYSIS:

        The problem is introduced by an OSR compilation that triggers after "count: 2"
        (entering right before the innermost loop). In the miscompiled program, the
        load, increment, and store instructions corresponding to the "count++" statement
        are wrongly moved into a loop header and end up being executed on two of the
        three switch cases. The wrong placement can be seen in the attached CFG
        (count-actual.pdf) generated after global code motion (some optimizations are
        disabled for simplicity): nodes 113, 112, and 110 implement the "count++"
        statement and are wrongly placed into block B8 (the header of loop L) instead of
        the preheader block B7. This bug is due to the combination of two underlying
        problems in C2.

        Problem 1

        The placement is wrongly allowed by the Ideal representation since 1) B7 and B8
        are control equivalent (B7 dominates B8 and B8 postdominates B7) and 2) there is
        no memory effect within L (and thus no phi instruction acting as a barrier in
        L's header B8). Hence, global code motion wrongly considers B8 as a candidate
        for placement of {113, 112, 110}. In general, there does not seem to exist any
        mechanism in C2's intermediate representation to avoid moving "effectful" memory
        instructions into the header block of "effectless" loops.

        Problem 2

        Typically, global code motion favors placing nodes outside of loops, which
        explains why Problem 1 does not often cause miscompilations in practice.
        However, in the OSR compilation of Count.java, B8 has a lower estimated
        execution frequency than B7 (as reflected in the red-blue color scale in
        count-actual.pdf), which leads to the placement of {113, 112, 110} into B8.
        Since B8 is the single successor of B7, it should have at least the same
        estimated execution frequency.

        The misestimation is due to the assumption in CFGLoop::compute_freq() that the
        underlying CFG is reducible, which is not the case for the OSR compilation of
        Count.java. More specifically, compute_freq() computes the frequencies of each
        block within a loop L in a single forward pass (in reverse DFS postorder) over
        the members (blocks and loops) of L [1], assuming that the members form an
        acyclic graph (except for L's back-edges). This assumption does not hold for
        irreducible graphs, where there might be additional cycles corresponding to
        non-natural loops. The result in this case is that the frequencies of {B8, B9,
        B10, B11, B12, B13, B14, B16, B18, B20} are updated "too early" (before the
        frequency of B7 is updated with that of B22) and never revisited.

        A possible solution to Problem 2 is to transform the single forward pass into a
        proper system of equations to be solved iteratively (see e.g. [2]), as in the
        following Python/SymPy snippet:

        from sympy import *
        b3, l3, b5, b6, b7, l2, b14, b16, b18, b20, b21, b22, b23, b25 = symbols(['b3', 'l3', 'b5', 'b6', 'b7', 'l2', 'b14', 'b16', 'b18', 'b20', 'b21', 'b22', 'b23', 'b25'])
        system = [
            Eq(b3, 1),
            Eq(l3, b3 * 1),
            Eq(b5, l3 * 1),
            Eq(b6, b5 * 0),
            Eq(b7, b6 * 1 + b22 * 1),
            Eq(l2, b7 * 1),
            Eq(b14, l2 * 0.666),
            Eq(b16, b14 * 1),
            Eq(b18, b16 * 1),
            Eq(b20, b18 * 1),
            Eq(b21, b25 * 1 + b20 * 1),
            Eq(b22, b21 * 0.02),
            Eq(b23, b21 * 0.98),
            Eq(b25, b5 * 1),
            ]
        soln = solve(system, [b3, l3, b5, b6, b7, l2, b14, b16, b18, b20, b21, b22, b23, b25])
        print(soln)
        {b3: 1.00000000000000, l3: 1.00000000000000, b5: 1.00000000000000, b6: 0.0, b7: 0.0202699963514007, l2: 0.0202699963514007, b14: 0.0134998175700328, b16: 0.0134998175700328, b18: 0.0134998175700328, b20: 0.0134998175700328, b21: 1.01349981757003, b22: 0.0202699963514007, b23: 0.993229821218632, b25: 1.00000000000000}

        Using these frequencies instead of those computed by C2 results in global code
        motion placing 'count++' into B7 (see count-expected.pdf), and the expected
        output being produced by Count.java.

        An alternative solution could be to simply ensure that all CFGs are reducible by
        applying node-splitting before frequency computation (perhaps other
        optimizations could also benefit from that?).

        [1] https://github.com/openjdk/jdk/blob/4a267f1bc2b025aae2cb9df7283156aeb0282406/src/hotspot/share/opto/gcm.cpp#L1790

        [2] https://dl.acm.org/doi/10.1145/178243.178251 (Section 5).

        ORIGINAL REPORT FROM FUZZER TEST:

        The attached fuzzer test produces a different result for C2 compared to C1/interpreter.

        To reproduce:
        $ java -Xint Test.java > Xint.log
        $ java -Xmx1G -Xcomp -Xbatch -XX:-TieredCompilation -XX:CompileOnly=Test Test.java > c2.log

        $ diff Xint.log c2.log
        13c13
        < i21 i22 i23 = -4427127,2,66
        ---
        > i21 i22 i23 = -50828706,2,66
        17,18c17,18
        < Test.iFld byFld Test.fFld = -4427127,-77,1252881271
        < Test.iFld1 Test.bArrFld iArrFld = -22136,13534,-43010
        ---
        > Test.iFld byFld Test.fFld = -76243092,-77,1252384471
        > Test.iFld1 Test.bArrFld iArrFld = -63536,13534,-43010
        28c28
        < Test.iFld byFld Test.fFld = -1140077,66,1258518506
        ---
        > Test.iFld byFld Test.fFld = -1140077,66,1258249012
        35c35
        < i21 i22 i23 = -4428333,2,66
        ---
        > i21 i22 i23 = -50833530,2,66
        39,40c39,40
        < Test.iFld byFld Test.fFld = -4428333,-77,1261269881
        < Test.iFld1 Test.bArrFld iArrFld = -22142,13534,-86020
        ---
        > Test.iFld byFld Test.fFld = -76250328,-77,1260607481
        > Test.iFld1 Test.bArrFld iArrFld = -63542,13534,-86020
        50c50
        < Test.iFld byFld Test.fFld = -1138674,66,1264202151
        ---
        > Test.iFld byFld Test.fFld = -1138674,66,1263539751
        57c57
        < i21 i22 i23 = -4440730,2,66
        ---
        > i21 i22 i23 = -50827918,2,66
        61,62c61,62
        < Test.iFld byFld Test.fFld = -4440730,-77,1266817300
        < Test.iFld1 Test.bArrFld iArrFld = -22204,13534,-129030
        ---
        > Test.iFld byFld Test.fFld = -76241910,-77,1265877126
        > Test.iFld1 Test.bArrFld iArrFld = -63535,13534,-129030
        64,65c64,65
        < iMeth_check_sum: -5029668300147928
        < vMeth_check_sum: -607986478366232936
        ---
        > iMeth_check_sum: -5029668300160304
        > vMeth_check_sum: -607986478366244874
        68c68
        < i21 i22 i23 = -1149871,2,-77
        ---
        > i21 i22 i23 = -1151077,2,-77
        71,73c71,73
        < Test.instanceCount Test.dFld Test.bFld = 6,-4591997026765212379,0
        < Test.iFld byFld Test.fFld = -1149871,66,1268283435
        < Test.iFld1 Test.bArrFld iArrFld = -5749,13534,-158758
        ---
        > Test.instanceCount Test.dFld Test.bFld = 0,-4591997026765212379,0
        > Test.iFld byFld Test.fFld = -1151077,66,1267744602
        > Test.iFld1 Test.bArrFld iArrFld = -5755,13534,-158758
        75,76c75,76
        < iMeth_check_sum: -5867946350171353
        < vMeth_check_sum: 8514054478760838568
        ---
        > iMeth_check_sum: -5867946350186585
        > vMeth_check_sum: 8514054478760823836
        83c83
        < Test.iFld byFld Test.fFld = -4438334,-77,1269659754
        ---
        > Test.iFld byFld Test.fFld = -4438334,-77,1269120922
        86,87c86,87
        < iMeth_check_sum: -6706224400191446
        < vMeth_check_sum: -810648637821638496
        ---
        > iMeth_check_sum: -6706224400206202
        > vMeth_check_sum: -810648637821652847
        94c94
        < Test.iFld byFld Test.fFld = -1150675,66,1271125889
        ---
        > Test.iFld byFld Test.fFld = -1150675,66,1270587057
        97,98c97,98
        < iMeth_check_sum: -7544502450215823
        < vMeth_check_sum: 8311392319305431865
        ---
        > iMeth_check_sum: -7544502450230579
        > vMeth_check_sum: 8311392319305417514
        105c105
        < Test.iFld byFld Test.fFld = -4440730,-77,1272502208
        ---
        > Test.iFld byFld Test.fFld = -4440730,-77,1271963376
        108,109c108,109
        < iMeth_check_sum: -8382780500236868
        < vMeth_check_sum: -1013310797277046215
        ---
        > iMeth_check_sum: -8382780500251624
        > vMeth_check_sum: -1013310797277060566

        Attachments

          1. Test.java
            9 kB
          2. FuzzerUtils.java
            13 kB
          3. Count.java
            0.7 kB
          4. Count.java
            0.7 kB
          5. count-actual.pdf
            47 kB
          6. count-expected.pdf
            47 kB

          Issue Links

            Activity

              People

                rcastanedalo Roberto Castaneda Lozano
                chagedorn Christian Hagedorn
                Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved: