- Type: Enhancement
- Resolution: Unresolved
- Priority: P4
- Affects Version/s: 26
- Component/s: hotspot
- x86

I noticed that the element-wise vector multiply (MulV) is implemented, but the corresponding reduction is not. I think it should be possible to support the reduction as well.
It already works for AVX512dq, but it should also work for AVX2, AVX1, and maybe even SSE4.1.
I found this during work on JDK-8340093.
See tests in:
test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java
I attached a Reduction2.java for demonstration.
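The trace below shows that the element-wise MulL is vectorized (test2, MulVL), but the MulL reduction is not (test1, pack removed as "not implemented at any smaller size"). An add-reduction is vectorized (test3, AddReductionVL), so all the required shuffling should already be available. That indicates to me that we should be able to do a MulL reduction.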
Investigate if we have the same issue with the Vector API.
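The attached Reduction2.java is not reproduced here. As a rough sketch of the three loop shapes discussed above, the following uses hypothetical names (class Reduction2Sketch with mulReduction, mulElementWise and addReduction). These only approximate what test1, test2 and test3 in the trace might look like, and the real file may differ.

```java
import java.util.Arrays;

// Hypothetical stand-in for the attached Reduction2.java; the real file may differ.
public class Reduction2Sketch {
    // Shape of a MulL reduction: one accumulator is carried across iterations.
    static long mulReduction(long[] a) {
        long acc = 1;
        for (int i = 0; i < a.length; i++) {
            acc *= a[i];
        }
        return acc;
    }

    // Shape of an element-wise MulL: every iteration is independent.
    static void mulElementWise(long[] a, long factor) {
        for (int i = 0; i < a.length; i++) {
            a[i] *= factor;
        }
    }

    // Shape of an AddL reduction: same loop as mulReduction, with + instead of *.
    static long addReduction(long[] a) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = new long[10_000];
        Arrays.fill(a, 3L);
        long sink = 0;
        // Repeat the calls so that the JIT has a chance to compile the loops.
        for (int i = 0; i < 10_000; i++) {
            sink += mulReduction(a);
            mulElementWise(a, 3L);
            sink += addReduction(a);
        }
        System.out.println(sink);
    }
}
```

Running such a file with the flags shown in the command line below (adjusting the class name in the CompileCommand patterns) should print comparable SuperWord traces, though node numbers and loop details will vary.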
[empeter@emanuel bin]$ ./java -Xbatch -XX:CompileCommand=compileonly,Reduction2::test* -XX:CompileCommand=printcompilation,Reduction2::test* -XX:+TraceNewVectors -XX:UseAVX=1 -XX:CompileCommand=TraceAutoVectorization,Reduction2::test*,SW_REJECTIONS Reduction2.java
CompileCommand: compileonly Reduction2.test* bool compileonly = true
CompileCommand: PrintCompilation Reduction2.test* bool PrintCompilation = true
CompileCommand: TraceAutoVectorization Reduction2.test* const char* TraceAutoVectorization = 'SW_REJECTIONS'
4088 98 % b 3 Reduction2::test1 @ 4 (24 bytes)
4090 99 b 3 Reduction2::test1 (24 bytes)
4091 100 % b 4 Reduction2::test1 @ 4 (24 bytes)
SuperWord::transform_loop:
Loop: N551/N162 limit_check counted [int,int),+4 (10243 iters) main multiversion_fast has_sfpt strip_mined
551 CountedLoop === 551 264 162 [[ 546 550 551 260 554 555 465 225 ]] inner stride: 4 main of N551 strip mined multiversion_fast !orig=[462],[265],[234],[212] !jvms: Reduction2::test1 @ bci:13 (line 18)
WARNING: Removed pack: not implemented at any smaller size:
0: 536 MulL === _ 554 537 [[ 533 ]] !orig=452,214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
1: 533 MulL === _ 536 534 [[ 452 ]] !orig=214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
2: 452 MulL === _ 533 453 [[ 214 ]] !orig=214,189 !jvms: Reduction2::test1 @ bci:14 (line 18)
3: 214 MulL === _ 452 215 [[ 266 554 372 ]] !orig=189 !jvms: Reduction2::test1 @ bci:14 (line 18)
WARNING: Removed pack: not profitable:
0: 453 LoadL === 383 7 454 [[ 452 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)
1: 215 LoadL === 383 7 216 [[ 214 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=188 !jvms: Reduction2::test1 @ bci:13 (line 18)
WARNING: Removed pack: not profitable:
0: 537 LoadL === 383 7 538 [[ 536 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=453,215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)
1: 534 LoadL === 383 7 535 [[ 533 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=215,188 !jvms: Reduction2::test1 @ bci:13 (line 18)
SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
4102 101 b 4 Reduction2::test1 (24 bytes)
4111 102 % b 3 Reduction2::test2 @ 2 (25 bytes)
4113 103 b 3 Reduction2::test2 (25 bytes)
4114 104 % b 4 Reduction2::test2 @ 2 (25 bytes)
SuperWord::transform_loop:
Loop: N582/N152 limit_check counted [int,int),+4 (10243 iters) main multiversion_fast has_sfpt strip_mined
582 CountedLoop === 582 285 152 [[ 562 566 576 581 582 281 585 586 480 494 245 232 ]] inner stride: 4 main of N582 strip mined multiversion_fast !orig=[491],[286],[253],[229] !jvms: Reduction2::test2 @ bci:12 (line 25)
TraceNewVectors [AutoVectorization]: 630 Replicate === _ 180 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 631 Replicate === _ 180 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 632 LoadVector === 411 586 569 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 633 MulVL === _ 632 631 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 634 StoreVector === 582 586 569 633 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;
TraceNewVectors [AutoVectorization]: 635 LoadVector === 411 586 488 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 636 MulVL === _ 635 630 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 637 StoreVector === 582 634 488 636 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;
SuperWord::transform_loop: success
4132 105 b 4 Reduction2::test2 (25 bytes)
SuperWord::transform_loop:
Loop: N495/N178 limit_check counted [int,int),+4 (10243 iters) main has_sfpt strip_mined
495 CountedLoop === 495 203 178 [[ 482 485 495 416 500 501 199 165 ]] inner stride: 4 main of N495 strip mined !orig=[422],[204],[195],[113] !jvms: Reduction2::test2 @ bci:8 (line 25)
TraceNewVectors [AutoVectorization]: 564 Replicate === _ 143 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 565 Replicate === _ 143 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 566 LoadVector === 373 501 419 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 567 MulVL === _ 566 564 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 568 StoreVector === 495 501 419 567 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;
TraceNewVectors [AutoVectorization]: 569 LoadVector === 373 501 488 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 570 MulVL === _ 569 565 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 571 StoreVector === 495 568 488 570 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;
SuperWord::transform_loop: success
4242 106 % b 3 Reduction2::test3 @ 4 (24 bytes)
4243 107 b 3 Reduction2::test3 (24 bytes)
4244 108 % b 4 Reduction2::test3 @ 4 (24 bytes)
SuperWord::transform_loop:
Loop: N625/N162 limit_check counted [int,int),+8 (10243 iters) main multiversion_fast has_sfpt strip_mined
625 CountedLoop === 625 264 162 [[ 614 617 620 624 625 626 627 546 550 225 260 465 ]] inner stride: 8 main of N625 strip mined multiversion_fast !orig=[551],[462],[265],[234],[212] !jvms: Reduction2::test3 @ bci:13 (line 32)
TraceNewVectors [AutoVectorization]: 675 LoadVector === 383 7 606 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 676 LoadVector === 383 7 454 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 677 LoadVector === 383 7 596 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 678 LoadVector === 383 7 538 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 679 Replicate === _ 387 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 680 AddVL === _ 627 675 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 681 AddVL === _ 680 677 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 682 AddVL === _ 681 678 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 683 AddVL === _ 682 676 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 684 AddReductionVL === _ 345 683 [[ ]] no_strict_order
SuperWord::transform_loop: success
4258 109 b 4 Reduction2::test3 (24 bytes)
SuperWord::transform_loop:
Loop: N666/N160 limit_check counted [int,int),+16 (10243 iters) main has_sfpt strip_mined
666 CountedLoop === 666 176 160 [[ 666 172 669 678 ]] inner stride: 16 main of N666 strip mined !orig=[552],[461],[390],[177],[168],[116] !jvms: Reduction2::test3 @ bci:10 (line 32)
TraceNewVectors [AutoVectorization]: 779 LoadVector === 342 7 453 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 780 LoadVector === 342 7 642 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 781 LoadVector === 342 7 533 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 782 LoadVector === 342 7 636 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 783 LoadVector === 342 7 537 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 784 LoadVector === 342 7 656 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 785 LoadVector === 342 7 387 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 786 LoadVector === 342 7 650 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx<J,2> (does not depend only on test, unknown control)
TraceNewVectors [AutoVectorization]: 787 Replicate === _ 22 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 788 AddVL === _ 669 782 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 789 AddVL === _ 788 784 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 790 AddVL === _ 789 786 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 791 AddVL === _ 790 780 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 792 AddVL === _ 791 781 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 793 AddVL === _ 792 783 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 794 AddVL === _ 793 779 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 795 AddVL === _ 794 785 [[ ]] #vectorx<J,2>
TraceNewVectors [AutoVectorization]: 796 AddReductionVL === _ 301 795 [[ ]] no_strict_order
SuperWord::transform_loop: success
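In the test3 trace, the add-reduction is built from lane-wise AddVL accumulations plus a single AddReductionVL. A multiply reduction could presumably take the same shape, because long multiplication wraps modulo 2^64 and is therefore associative and commutative, so reordering does not change the result. The following is only a scalar Java model of that idea, not HotSpot code. The class MulReductionModel, its method names, and the fixed LANES = 2 (mirroring vectorx<J,2>) are assumptions made for illustration.

```java
import java.util.Arrays;

// Scalar model of a lane-wise multiply reduction; names and lane count are illustrative.
public class MulReductionModel {
    static final int LANES = 2; // mirrors vectorx<J,2> from the trace

    // Reference: sequential product in source order.
    static long scalarProduct(long[] a) {
        long acc = 1;
        for (long v : a) {
            acc *= v;
        }
        return acc;
    }

    // Lane-wise accumulation (models MulVL in the loop), followed by one
    // horizontal product across the lanes (models a reduction after the loop).
    static long laneWiseProduct(long[] a) {
        long[] lanes = new long[LANES];
        Arrays.fill(lanes, 1L); // identity element for multiplication
        int limit = a.length - (a.length % LANES);
        int i = 0;
        for (; i < limit; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                lanes[l] *= a[i + l];
            }
        }
        long result = 1;
        for (int l = 0; l < LANES; l++) {
            result *= lanes[l]; // horizontal step across lanes
        }
        for (; i < a.length; i++) {
            result *= a[i]; // scalar tail for leftover elements
        }
        return result;
    }

    public static void main(String[] args) {
        long[] a = new long[1001];
        for (int i = 0; i < a.length; i++) {
            a[i] = 2L * i + 1; // odd values keep the wrapped product non-zero
        }
        System.out.println(scalarProduct(a) == laneWiseProduct(a));
    }
}
```

The model only argues that the reordering is value-preserving for long; whether it is profitable on AVX2, AVX1 or SSE4.1 is a separate question.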
            
- relates to
  - JDK-8370677 AArch64: C2 SuperWord: implement sequential reduction for add/mul D/F (Open)
  - JDK-8340093 C2 SuperWord: implement cost model (Open)
  - JDK-8370671 C2 SuperWord [x86]: implement Long.max/min reduction for AVX2 (Open)