Details
-
Enhancement
-
Resolution: Unresolved
-
P4
-
22
Description
I would like to track a list of smaller SuperWord RFEs:
Improvements:
JDK-8317572: C2 SuperWord: refactor/improve VectorizeDebugOption and TraceSuperWord
JDK-8309908: C2 SuperWord: IGVN commute swap_edges can prevent vectorization
JDK-8309267: C2 SuperWord: some tests fail on KNL machines - fail to vectorize
JDK-8309183: [IR Framework] Add UseKNLSetting to whitelist
JDK-8308841: C2 SuperWord: implement vectorization of integer CMove
JDK-8307516: C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction
JDK-8303113: [SuperWord] investigate if enabling _do_vector_loop by default creates speedup
JDK-8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
JDK-8305707: SuperWord should vectorize reverse-order reduction loops
JDK-8305717: SuperWord: Vectorization in opposite direction traversal cases
JDK-8299808: ArrayFill should be preferred over unrolling
JDK-8302652: [SuperWord] Reduction should happen after loop, when possible
JDK-8308606: C2 SuperWord: remove alignment checks when not required
JDK-8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit
JDK-8260943: C2 SuperWord: Remove dead vectorization optimization added by 8076284
JDK-8318703: C2 SuperWord: take reduction nodes into account in early unrolling analysis
JDK-8332878: C2 SuperWord: improve PopulateIndex detection for L/F/D
JDK-8307084: C2: Vector atomic post loop is not executed for some small trip counts
(Found by ARM, I hope they take this one up soon!)
JDK-8336000: Long::bitCount does not auto-vectorize on AArch64
(actually reports an issue with 2-element reductions: they are marked as not profitable in SuperWord::implemented, which must be re-evaluated)
JDK-8334431: Regression 18-20% on Mac x64 on Crypto.signverify
(needs a cost-model penalty for failed store-to-load forwarding)
https://www.elastic.co/search-labs/blog/articles/Vector%20Similarity%20Computations%20-%20ludicrous%20speed
Can we do this with auto-vectorization?
JDK-8325155: C2 SuperWord: remove alignment boundaries
JDK-8325541: C2 SuperWord: refactor filter / split
JDK-8326139: C2 SuperWord: split packs (match use/def packs, implemented, mutual independence)
JDK-8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph
JDK-8328678: C2: hand unrolled loops don't vectorize/unroll as well as loops unrolled by the compiler
JDK-8330991: C2 SuperWord: refactor VPointer
JDK-8331576: C2 SuperWord: Unsafe access with long address that is a CastX2P does not vectorize
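For orientation, the loop shapes that several of the entries above refer to can be sketched in plain Java. This is only an illustrative sketch: the class and method names are hypothetical and not taken from any of the linked issues or tests, and whether C2 actually vectorizes any of these loops depends on the JDK build, the CPU, and the flags in use.

```java
// Hypothetical examples of loop shapes named in the issue titles above.
// Nothing here is copied from the JDK sources or tests.
class SuperWordLoopShapes {

    // Plain forward int reduction, for comparison with the reverse case.
    static int sumForward(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Reverse-order reduction: the kind of loop JDK-8305707 is about.
    static int sumReverse(int[] a) {
        int sum = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            sum += a[i];
        }
        return sum;
    }

    // Reduction over Long.bitCount: the shape mentioned in JDK-8336000.
    static long bitCountSum(long[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Long.bitCount(a[i]);
        }
        return sum;
    }

    // Integer dot product: a simple stand-in (assumption) for the
    // vector-similarity computations discussed in the linked blog post.
    static int dot(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumForward(a));
        System.out.println(sumReverse(a));
        System.out.println(bitCountSum(new long[] {1L, 3L, 7L}));
        System.out.println(dot(new int[] {1, 2, 3}, new int[] {4, 5, 6}));
    }
}
```

Such methods would need to be called often enough to be compiled by C2 before any vectorization could be observed, for example through an IR test or a benchmark harness.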
Improvements relevant for MemorySegment:
JDK-8329273: C2 SuperWord: some basic MemorySegment IR tests
JDK-8327209: C2 MemorySegment: missing RCE and vectorization
JDK-8324751: C2 SuperWord: Aliasing Analysis
JDK-8329077: C2: MemorySegment double accesses don't vectorize
JDK-8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal
JDK-8331659: C2 SuperWord: investigate failed vectorization in compiler/loopopts/superword/TestMemorySegment.java
Cleanup:
JDK-8309204: Obsolete DoReserveCopyInSuperWord
JDK-8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055
JDK-8325159: C2 SuperWord: measure time for CITime
Tests:
JDK-8310891: C2 SuperWord tests: move platform requirements to IR rules
JDK-8310523: Add IR tests for nodes that have too few IR tests yet
JDK-8327671: C2 SuperWord: move all tests to test/hotspot/jtreg/compiler/autovectorization
JDK-8333647: C2 SuperWord: some additional PopulateIndex tests
IR Framework:
JDK-8310308: IR Framework: check for type and size of vector nodes
JDK-8320224: IR Framework: add MaxVectorSize to JTREG_WHITELIST_FLAGS
JDK-8310533: [IR Framework] Add possibility to automatically verify that a test method always returns the same result
Bugs:
JDK-8332905: C2 SuperWord: bad AD file, with RotateRightV and first operand not a pack
JDK-8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop
JDK-8323582: C2 SuperWord AlignVector: misaligned vector memory access with Unsafe.allocateMemory
JDK-8316679: C2 SuperWord: wrong result, load should not be moved before store if not comparable
JDK-8316594: C2 SuperWord: wrong result with hand unrolled loops
JDK-8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs
(JDK-8311586, JDK-8309662, JDK-8303827)
JDK-8314612: TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector
JDK-8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally
JDK-8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D
JDK-8298935: fix independence bug in create_pack logic in SuperWord::find_adjacent_refs
JDK-8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction
JDK-8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302
JDK-8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph
JDK-8304042: C2 SuperWord: schedule must remove packs with cyclic dependencies
Attachments
Issue Links
- relates to
-
JDK-8325497 Investigate C2 issues identified by the "JVM Performance Comparison for JDK 21"
- Open