Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8279622

C2: miscompilation of map pattern as a vector reduction

    XMLWordPrintable

Details

    • b21

    Backports

      Description

        SLP wrongly vectorizes a loop as a reduction instead of a simple map pattern. SLP believes the loop forms a reduction pattern because its operations are earlier marked as reduction nodes (by PhaseIdealLoop::mark_reductions), however they are marked as such within a *different* loop that is removed by in-between loop transformations.

        HOW TO REPRODUCE

        $ java -ea Fail.java (using JDK 17, 18, or 19 up to b11)

        FAILURE ANALYSIS

        Using Fail.java as an example (run with -XX:-PartialPeelLoop for simplicity), the sequence of events is (roughly) as follows:

        Original loop before loop optimizations (N, M, and Fail.mask are constants):

           for (int i = 0; i < N; i++) {
             for (j = 0; j < M; j++) {
               r[i] ^= Fail.mask;
             }
           }

        1. The inner loop is marked as a reduction together with its XOR operation:

           for (int i = 0; i < N; i++) {
             for (j = 0; j < M; j++) { // loop marked as a reduction
               r[i] ^= Fail.mask; // XOR marked as a reduction
             }
           }

        2. The inner loop is split into a peeled iteration, main, and post loop and unrolled twice:

           for (int i = 0; i < N; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction (inconsistent, outer loop is not a reduction!)
             int j = 0;
             for (...; j+=2) { // loop marked as a reduction
               r[i] ^= Fail.mask; // XOR marked as a reduction
               r[i] ^= Fail.mask; // XOR marked as a reduction
             }
             for (...; j++) { // loop marked as a reduction
               r[i] ^= Fail.mask; // XOR marked as a reduction
             }
           }

        3. the inner main and post loops are found to be redundant (due to the "self-inversion" property of XOR with a constant operand) and get removed:

           for (int i = 0; i < N; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }

        4. the outer loop is further optimized into its final version, where the main loop is unrolled four times for SLP vectorization:

           int i = 0;
           for (...; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }
           for (...; i+=4) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
             r[i] ^= Fail.mask; // XOR marked as a reduction
             r[i] ^= Fail.mask; // XOR marked as a reduction
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }
           for (...; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }

        5. the main loop is wrongly vectorized as a reduction due to its XOR operations being marked as reductions:

           int i = 0;
           for (...; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }
           for (...; i+=4) {
             tmp = reduce(XOR, Fail.mask, r[i...i+3])
             r[i...i+3] = [tmp, tmp, tmp, tmp]
           }
           for (...; i++) {
             r[i] ^= Fail.mask; // XOR marked as a reduction
           }

        The expected main loop vectorization is:
           ...
           for (...; i+=4) {
             r[i...i+3] = map(XOR, r[i...i+3], [Fail.mask, Fail.mask, Fail.mask, Fail.mask])
           }
           ...

        Note that this failure is only reproducible in up to JDK 19 b11. In JDK 19 b12, JDK-8154302 introduces a safepoint poll in the (counted) outer-main loop (see step 4 above), which inhibits SLP vectorization ("SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head"). The root cause of the failure (a reduction node within a non-reduction loop) remains present though.

        ORIGINAL REPORT:

        The attached fuzzer test produces a different result for C2 compared to C1/interpreter.

        To reproduce (on JDK 17, JDK18, and JDK19):
        $ java -Xint Test.java > int.log
        $ java Test.java > c2.log
        $ diff int.log c2.log
        55c55
        < iArr3 = -4168
        ---
        > iArr3 = -204359
        67c67
        < iArr3 = -4168
        ---
        > iArr3 = -195060

        # To reproduce on JDK 17, JDK 18 (but not on JDK19 commit cc7cf81):
        $ java -ea Reduced.java
        (results in an exception because of an unexpected result.)

        # To reproduce on JDK19 commit cc7cf81:
        $ java -ea Reduced2.java
        (as above, results in an exception because of an unexpected result.)

        Attachments

          1. Fail.java
            1 kB
          2. FuzzerUtils.java
            13 kB
          3. Reduced.java
            1 kB
          4. Reduced2.java
            1 kB
          5. Test.java
            8 kB

          Issue Links

            Activity

              People

                rcastanedalo Roberto Castaneda Lozano
                chagedorn Christian Hagedorn
                Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved: