Loading...

Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 21
Affects Version/s: 11, 17, 21
Component/s: hotspot
Labels:

Subcomponent:
compiler
Resolved In Build:
b22

During work of ~~JDK-8304042~~ (and ultimately of ~~JDK-8298935~~), I found an example where we have no cyclic dependency, but the C2 graph after SuperWord is not correct. It leads to WRONG RESULTS.

The issue is that we do not re-order the graph for non-vectorized memory operations. But the vectorization of some memory operations may lead to a reordering of memory ops, not just for the vectorized ones but also the unvectorized ones.

In SuperWord::co_locate_pack we have some complicated logic for Stores, where we try to find out what needs to be scheduled before (and presumably after). But we have nothing equivalent for Load.

We could try to implement something equivalent to the logic for Stores. But I am not convinced that this works, because we might have to also re-order stores that are not packed.

My suggestion:
We schedule all packs and left over scalars into a linear order (I already do that in ~~JDK-8304042~~). From that order, we can re-construct all memory-state edges in the C2 memory slices.

Reproduce like this:

./java -XX:-TieredCompilation -Xbatch -XX:CompileCommand=compileonly,Test4::test -XX:CompileCommand=printcompilation,Test4::* -XX:LoopUnrollLimit=250 -XX:+TraceSuperWord Test4.java

./java -XX:-TieredCompilation -Xbatch --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:CompileCommand=compileonly,Test5::test -XX:CompileCommand=printcompilation,Test5::* -XX:LoopUnrollLimit=250 -XX:+TraceSuperWord -XX:LoopMaxUnroll=5 Test5.java

I will describe Test4.java. Test5.java is very similar, just uses Unsafe so that it reproduces before we implemented conversions in SuperWord.

We have:

        for (int i = 0; i < RANGE; i+=2) {
            dataFa[i + 0] = dataIa[i + 0] * 1.3f;
            dataIb[i + 0] = (int)dataFb[i + 0] * 11;
            dataIb[i + 1] = (int)dataFb[i + 1] * 11;
            dataFa[i + 1] = dataIa[i + 1] * 1.2f;
        }

This gets unrolled, for simplicity let's say only 2x.

        for (int i = 0; i < RANGE; i+=4) {
            dataFa[i + 0] = dataIa[i + 0] * 1.3f; -> scalar, before
            dataIb[i + 0] = (int)dataFb[i + 0] * 11; -> vectorize
            dataIb[i + 1] = (int)dataFb[i + 1] * 11; -> vectorize
            dataFa[i + 1] = dataIa[i + 1] * 1.2f; -> scalar, after
            dataFa[i + 2] = dataIa[i + 2] * 1.3f; -> scalar, before
            dataIb[i + 2] = (int)dataFb[i + 2] * 11; -> vectorize
            dataIb[i + 3] = (int)dataFb[i + 3] * 11; -> vectorize
            dataFa[i + 3] = dataIa[i + 3] * 1.2f; -> scalar, after
        }

We can only vectorize half of the statements, the other half does not work because the constants are not matching.

To enable this vectorization, the statements have to be re-ordered.
We have to pay attention to the Store-Load dependencies.
For (i+0) and (i+2), we have a StoreF-LoadF dependency ("before"), for (i+1) and (i+3) we have a StoreI-LoadI dependency ("after").

        for (int i = 0; i < RANGE; i+=4) {
            dataFa[i + 0] = dataIa[i + 0] * 1.3f; -> scalar, before
            dataFa[i + 2] = dataIa[i + 2] * 1.3f; -> scalar, before
            dataIb[i + 0] = (int)dataFb[i + 0] * 11; -> vectorize
            dataIb[i + 1] = (int)dataFb[i + 1] * 11; -> vectorize
            dataIb[i + 2] = (int)dataFb[i + 2] * 11; -> vectorize
            dataIb[i + 3] = (int)dataFb[i + 3] * 11; -> vectorize
            dataFa[i + 1] = dataIa[i + 1] * 1.2f; -> scalar, after
            dataFa[i + 3] = dataIa[i + 3] * 1.2f; -> scalar, after
        }

We see that we had to re-order the memory ops. For example we had to swap the order of the StoreF for (i+1) and (i+2). This must be reflected in the C2 graph. Some of the scalar memory ops must be moved before the vector load/stores, some must be pushed to later. This should be determined by the dependency-graph of SuperWord.

I think my Test5.java fails for JDK17.

Task:
Un-comment "test5" from ~~JDK-8304042~~.
And lift the platform restrictions at the test definitions.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

Test4.java
2 kB
2023-03-22 07:36
Test5.java
2 kB
2023-03-22 07:36

relates to

JDK-8306302 C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D

Resolved

JDK-8306088 C2 Superword: verify consistency of packset and my_pack

Closed

JDK-8304042 C2 SuperWord: schedule must remove packs with cyclic dependencies

Resolved

links to

Commit openjdk/jdk/ad0e5a99

Review openjdk/jdk/13354

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates