- 
    Enhancement 
- 
    Resolution: Fixed
- 
     P4 P4
- 
    24
- 
        b07
Goal
------
Replace old VPointer with a new version that relies on MemPointer - which then is a shared utility for both MergeStores and SuperWord / AutoVectorization. MemPointer generally parses pointers, and VPointer specializes this facility for the use in loops (VLoop).
The old VPointer implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in MemPointer is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
Future: with the MemPointer implementation of VPointer, it should be easier to implement speculative runtime-checks for Aliasing-Analysis
Details
---------
This looks like a rather big patch, so let me explain the parts.
Refactor of MemPointer in mepointer.hpp/cpp:
- Added concept of Base to MemPointer. This is required for the aliasing computation in VPointer.
- sub_expression_has_native_base_candidate: add special case to parse through CastX2P if we find a native memory base MemorySegment.address(), i.e. jdk.internal.foreign.NativeMemorySegmentImpl.min. This helps some native memory segment cases to vectorize that did not before.
- So far MemPointer could only answer adjacency queries. But VPointer also needs overlap queries, see the old VPointer::not_equal (i.e. can we prove that the two VPointer never overlap?). So I had to add a new case to aliasing computation: NotOrAtDistance. It is useful to answer the new and better named MemPointer::never_overlaps_with.
- Collapsed together MemPointerDecomposedForm and MemPointer. It was an unnecessary and unhelpful split.
Re-write of VPointer based on MemPointer:
- Old pattern:
- VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]
- VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv] -> adr = CastX2P, the addition of 11 Param, 1819 Phi and 629 LoadL.
- New pattern:
- VPointer[size: 4, object, base(110 CastPP) + con( 16) + iv_scale( 4) * iv + invar(0)]
- VPointer[size: 1, native, base(629 LoadL) + con( 0) + iv_scale( 1) * iv + invar(1 * [11 Parm] + 1 * [1819 Phi])] -> parse through the CastX2P, and detect the more helpful base 629 LoadL (MemorySegment.address()), take 11 Param and 1819 Phi as invariants.
- Move invariant check to VLoop::is_pre_loop_invariant.
- Re-work the aliasing computation, and give it better names:
- Before, it was done via VPointer::cmp, which returned quite cryptic return codes. This allowed us to do adjacency and overlap queries. Now I have explicit queries is_adjacent_to_and_before and never_overlaps_with.
- VPointer::make_with_size and VPointer::make_with_iv_offset: one can now create a copy of a VPointer with a changed size (e.g. when going from smaller scalar to larger vector), or with a shifted offset (when simulating a VPointer in the next iteration). Now we can do aliasing computations with such modified VPointers, which is quite powerful and I will probably use for other things in the future.
Refactor many uses of VPointer:
- since we now do not have a single invar, but rather a set of invar_summands, I had to change some code in AlignmentSolution / AlignmentSolver, VMemoryRegion, create_adjacent_memop_pairs and VTransform::adjust_pre_loop_limit_to_align_main_loop_vectors. This unfortunately makes the change quite large, but a lot of it is just renaming of variables (e.g. scale -> iv_scale to avoid confusion with scales of other summands).
- VTransformMemVectorNode: it now carries the VPointer corresponding to the vector load/store, adjusted for its size. We can now do aliasing checks with these nodes. This is how I refactored away overlap_possible_with_any_in, and replaced it with a single call to never_overlaps_with.
Fix existing tests: more cases now vectorize:
- Non-power-of-2 stride: the issue used to be that e.g. stride=3 would make IGVN split 3 * iv -> iv + iv << 1. This was not properly parsed by the old VPointer, as it found the iv twice. But the MemPointerParser simply collects all usages and adds them up again, back to 3 * iv.
- Some cases in TestMemorySegment.java now vectorize, because VPointer is better at detecting the invariants and detecting that they are equivalent.
Add new tests:
- A while ago, I had worked on:
Future work:
-----------------
JDK-8331576: Investigate case where native memory does not vectorize because of CastX2P.
Testing
----------
I ran tier1-4, stress testing, and performance testing. Looks all good, no significant regression detected.
- blocks
- 
                    JDK-8323582 C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory -           
- Resolved
 
-         
- 
                    JDK-8324751 C2 SuperWord: Aliasing Analysis runtime check -           
- Resolved
 
-         
- duplicates
- 
                    JDK-8330274 C2 SuperWord: VPointer invar: same sum with different addition order should be equal -           
- Closed
 
-         
- 
                    JDK-8330991 C2 SuperWord: refactor VPointer -           
- Closed
 
-         
- relates to
- 
                    JDK-8348263 C2 SuperWord: TestMemorySegment.java has failing IR rules with AlignVector after JDK-8343685 -           
- Resolved
 
-         
- 
                    JDK-8348657 compiler/loopopts/superword/TestEquivalentInvariants.java timed out -           
- Resolved
 
-         
- 
                    JDK-8331576 C2 SuperWord: Unsafe access with long address that is a CastX2P does not vectorize -           
- Open
 
-         
- 
                    JDK-8335392 C2 MergeStores: enhanced pointer parsing -           
- Resolved
 
-         
- links to
- 
                     Commit(master)
        openjdk/jdk/0f426e4a Commit(master)
        openjdk/jdk/0f426e4a
- 
                     Review(master)
        openjdk/jdk/21926 Review(master)
        openjdk/jdk/21926