JDK-8370671 (project: JDK)

C2 SuperWord [x86]: implement Long.max/min reduction for AVX2


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: P4
    • Fix Version/s: tbd
    • Affects Version/s: 26
    • Component/s: hotspot

      I noticed that MaxV and MinV are already implemented for long on AVX2, but the corresponding reductions are not. I think it should be possible to support the reduction as well.

      The reduction already works with AVX512.

      I found this while working on JDK-8340093, with which vectorizing the long min/max reduction would now be considered profitable.
      See tests in:
      test/hotspot/jtreg/compiler/loopopts/superword/TestReductions.java

      I attached Reduction.java to demonstrate the issue.
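      The attached file is not reproduced here, so the following is a hypothetical sketch of the three loop shapes that the trace below exercises: a long max reduction (like test1), an element-wise long max stored back to the array (like test2), and a long add reduction (like test3). The class and method names are made up for illustration, and the loop bodies are assumptions, not the contents of Reduction.java. The main method simply repeats the calls so that the loops are likely to be compiled by C2.

```java
// Hypothetical loop shapes for illustration; this is NOT the attached Reduction.java.
public class ReductionShapes {
    // Max reduction: one scalar is carried around the loop and updated with Long.max.
    // Vectorizing this needs a max reduction across vector lanes.
    static long maxReduction(long[] a) {
        long max = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            max = Long.max(max, a[i]);
        }
        return max;
    }

    // Element-wise max: every iteration is independent, so a lane-wise MaxV suffices.
    static void elementwiseMax(long[] a, long bound) {
        for (int i = 0; i < a.length; i++) {
            a[i] = Long.max(a[i], bound);
        }
    }

    // Add reduction: same reduction structure as maxReduction, but combines with +.
    static long addReduction(long[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] a = new long[10_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = (i * 7919L) % 1_000 - 500; // values in [-500, 499]
        }
        long[] b = a.clone();
        long check = 0;
        // Repeat the calls so the JIT is likely to compile these loops with C2.
        for (int round = 0; round < 5_000; round++) {
            check += maxReduction(a) + addReduction(a);
            elementwiseMax(b, 0L);
        }
        System.out.println("check=" + check + " b[1]=" + b[1]);
    }
}
```

      To trace it, the CompileCommand patterns in the command below would need to be adapted to the hypothetical class and method names.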

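      The output below shows that the element-wise MaxV is vectorized (test2), but the max reduction is not (test1): its MaxL pack is removed as "not implemented at any smaller size". An add reduction, however, is vectorized (test3, AddReductionVL), so the required lane shuffles should already be available. This suggests that we should also be able to implement the long max reduction.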
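      The argument that the add reduction already provides the needed shuffles can be illustrated with a scalar model of reducing one 256-bit vector of four longs. The sketch assumes the common halving strategy (fold the upper 128 bits into the lower 128 bits, fold the two remaining lanes, then fold in the scalar loop input); whether C2's AVX2 add reduction uses exactly these steps is an assumption. Only the combine step differs between add and max. AVX2 has no packed signed 64-bit max instruction (VPMAXSQ requires AVX-512F), so the max combine is modeled as a compare mask followed by a select, in the style of VPCMPGTQ plus a blend. The names `reduce` and `maxViaCompareBlend` are hypothetical.

```java
import java.util.function.LongBinaryOperator;

// Scalar model of a 4 x long (256-bit) lane reduction; names are hypothetical.
public class LaneReduction {
    static long reduce(long acc, long[] lanes, LongBinaryOperator combine) {
        if (lanes.length != 4) {
            throw new IllegalArgumentException("expected 4 long lanes (256 bits)");
        }
        // Stage 1: combine the upper 128-bit half (lanes 2, 3) with the lower half (lanes 0, 1).
        long m0 = combine.applyAsLong(lanes[0], lanes[2]);
        long m1 = combine.applyAsLong(lanes[1], lanes[3]);
        // Stage 2: combine the two 64-bit lanes left in the 128-bit half.
        long m = combine.applyAsLong(m0, m1);
        // Finally fold in the scalar value carried around the loop.
        return combine.applyAsLong(acc, m);
    }

    // Max from a signed 64-bit compare mask plus a select, mirroring compare-and-blend.
    static long maxViaCompareBlend(long a, long b) {
        long mask = (a > b) ? -1L : 0L;  // all ones where a > b, like one VPCMPGTQ lane
        return (a & mask) | (b & ~mask); // pick a where the mask is set, otherwise b
    }

    public static void main(String[] args) {
        long[] lanes = {3L, -7L, 42L, 0L};
        System.out.println(reduce(0L, lanes, Long::sum));                                      // 38
        System.out.println(reduce(Long.MIN_VALUE, lanes, LaneReduction::maxViaCompareBlend)); // 42
    }
}
```

      In this model the shuffle structure of `reduce` is shared by both reductions; a max reduction only swaps in the compare-and-blend combine that an AVX2 MaxV for long presumably already needs.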

      We should also investigate whether the Vector API has the same issue.
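      A possible starting point for that investigation is a small Vector API kernel that applies `reduceLanes(VectorOperators.MAX)` to a 256-bit `LongVector`, which could then be run with `-XX:UseAVX=2` to see whether C2 intrinsifies the reduction. This is a hypothetical sketch (class name and loop structure are made up), and it must be compiled and run with `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical kernel; compile and run with: --add-modules jdk.incubator.vector
public class VectorApiMaxReduction {
    // 256-bit species: four long lanes, the width of an AVX2 ymm register.
    static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_256;

    static long maxReduce(long[] a) {
        long max = Long.MIN_VALUE;
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            LongVector v = LongVector.fromArray(SPECIES, a, i);
            // Cross-lane long max reduction: the operation in question on AVX2.
            max = Math.max(max, v.reduceLanes(VectorOperators.MAX));
        }
        // Scalar tail for elements that do not fill a whole vector.
        for (; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        long[] a = new long[10_003];
        for (int i = 0; i < a.length; i++) {
            a[i] = (i * 7919L) % 1_000 - 500;
        }
        long check = 0;
        // Repeat the call so the JIT is likely to compile maxReduce with C2.
        for (int round = 0; round < 5_000; round++) {
            check += maxReduce(a);
        }
        System.out.println(check);
    }
}
```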

      java -Xbatch -XX:CompileCommand=compileonly,Reduction::test* -XX:CompileCommand=printcompilation,Reduction::test* -XX:+TraceNewVectors -XX:UseAVX=2 -XX:CompileCommand=TraceAutoVectorization,Reduction::test*,SW_REJECTIONS Reduction.java

      [empeter@emanuel bin]$ ./java -Xbatch -XX:CompileCommand=compileonly,Reduction::test* -XX:CompileCommand=printcompilation,Reduction::test* -XX:+TraceNewVectors -XX:UseAVX=2 -XX:CompileCommand=TraceAutoVectorization,Reduction::test*,SW_REJECTIONS Reduction.java
      CompileCommand: compileonly Reduction.test* bool compileonly = true
      CompileCommand: PrintCompilation Reduction.test* bool PrintCompilation = true
      CompileCommand: TraceAutoVectorization Reduction.test* const char* TraceAutoVectorization = 'SW_REJECTIONS'
      4018 97 % b 3 Reduction::test1 @ 4 (26 bytes)
      4020 98 b 3 Reduction::test1 (26 bytes)
      4021 99 % b 4 Reduction::test1 @ 4 (26 bytes)

      SuperWord::transform_loop:
          Loop: N562/N162 limit_check counted [int,int),+4 (10243 iters) main multiversion_fast has_sfpt strip_mined
       562 CountedLoop === 562 275 162 [[ 557 561 562 271 565 566 476 236 ]] inner stride: 4 main of N562 strip mined multiversion_fast !orig=[473],[276],[245],[223] !jvms: Reduction::test1 @ bci:13 (line 18)

      WARNING: Removed pack: not implemented at any smaller size:
          0: 547 MaxL === _ 565 548 [[ 544 ]] !orig=463,225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          1: 544 MaxL === _ 547 545 [[ 463 ]] !orig=225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          2: 463 MaxL === _ 544 464 [[ 225 ]] !orig=225,199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          3: 225 MaxL === _ 463 226 [[ 277 565 383 ]] !orig=199 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)

      WARNING: Removed pack: not profitable:
          0: 548 LoadL === 394 7 549 [[ 547 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=464,226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
          1: 545 LoadL === 394 7 546 [[ 544 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
          2: 464 LoadL === 394 7 465 [[ 463 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=226,188 !jvms: Reduction::test1 @ bci:13 (line 18)
          3: 226 LoadL === 394 7 227 [[ 225 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; #long !orig=188 !jvms: Reduction::test1 @ bci:13 (line 18)

      SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
      4034 100 b 4 Reduction::test1 (26 bytes)

      SuperWord::transform_loop:
          Loop: N471/N170 limit_check counted [int,int),+4 (10243 iters) main has_sfpt strip_mined
       471 CountedLoop === 471 186 170 [[ 471 182 474 477 ]] inner stride: 4 main of N471 strip mined !orig=[400],[187],[178],[116] !jvms: Reduction::test1 @ bci:10 (line 18)

      WARNING: Removed pack: not implemented at any smaller size:
          0: 461 MaxL === _ 474 462 [[ 460 ]] !orig=395,157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          1: 460 MaxL === _ 461 464 [[ 395 ]] !orig=157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          2: 395 MaxL === _ 460 396 [[ 157 ]] !orig=157,411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)
          3: 157 MaxL === _ 395 225 [[ 474 188 332 ]] !orig=411 !jvms: Long::max @ bci:2 (line 1984) Reduction::test1 @ bci:14 (line 18)

      WARNING: Removed pack: not profitable:
          0: 462 LoadL === 352 7 463 [[ 461 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=396,225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
          1: 464 LoadL === 352 7 465 [[ 460 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
          2: 396 LoadL === 352 7 397 [[ 395 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=225,[146] !jvms: Reduction::test1 @ bci:13 (line 18)
          3: 225 LoadL === 352 7 144 [[ 157 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=5; #long (does not depend only on test, unknown control) !orig=[146] !jvms: Reduction::test1 @ bci:13 (line 18)

      SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize
      4153 101 % b 3 Reduction::test2 @ 2 (25 bytes)
      4154 102 b 3 Reduction::test2 (25 bytes)
      4155 103 % b 4 Reduction::test2 @ 2 (25 bytes)

      SuperWord::transform_loop:
          Loop: N591/N152 limit_check counted [int,int),+4 (10243 iters) main multiversion_fast has_sfpt strip_mined
       591 CountedLoop === 591 295 152 [[ 571 574 585 590 591 291 594 595 489 503 255 242 ]] inner stride: 4 main of N591 strip mined multiversion_fast !orig=[500],[296],[263],[239] !jvms: Reduction::test2 @ bci:12 (line 25)
      TraceNewVectors [AutoVectorization]: 639 Replicate === _ 23 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 640 LoadVector === 421 595 577 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 641 MaxV === _ 640 639 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 642 StoreVector === 591 595 577 641 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6;

      SuperWord::transform_loop: success
      4173 104 b 4 Reduction::test2 (25 bytes)

      SuperWord::transform_loop:
          Loop: N505/N188 limit_check counted [int,int),+4 (10243 iters) main has_sfpt strip_mined
       505 CountedLoop === 505 213 188 [[ 492 495 505 508 509 426 209 175 ]] inner stride: 4 main of N505 strip mined !orig=[432],[214],[205],[113] !jvms: Reduction::test2 @ bci:8 (line 25)
      TraceNewVectors [AutoVectorization]: 574 Replicate === _ 143 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 575 LoadVector === 383 509 498 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory<J,4> (does not depend only on test, unknown control)
      TraceNewVectors [AutoVectorization]: 576 MaxV === _ 575 574 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 577 StoreVector === 505 509 498 576 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5;

      SuperWord::transform_loop: success
      4234 105 % b 3 Reduction::test3 @ 4 (24 bytes)
      4235 106 b 3 Reduction::test3 (24 bytes)
      4237 107 % b 4 Reduction::test3 @ 4 (24 bytes)

      SuperWord::transform_loop:
          Loop: N551/N162 limit_check counted [int,int),+4 (10243 iters) main multiversion_fast has_sfpt strip_mined
       551 CountedLoop === 551 264 162 [[ 546 550 551 260 554 555 465 225 ]] inner stride: 4 main of N551 strip mined multiversion_fast !orig=[462],[265],[234],[212] !jvms: Reduction::test3 @ bci:13 (line 32)
      TraceNewVectors [AutoVectorization]: 599 LoadVector === 383 7 538 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 600 Replicate === _ 387 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 601 AddVL === _ 554 599 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 602 AddReductionVL === _ 345 601 [[ ]] no_strict_order

      SuperWord::transform_loop: success
      4260 108 b 4 Reduction::test3 (24 bytes)

      SuperWord::transform_loop:
          Loop: N461/N160 limit_check counted [int,int),+4 (10243 iters) main has_sfpt strip_mined
       461 CountedLoop === 461 176 160 [[ 461 172 464 467 ]] inner stride: 4 main of N461 strip mined !orig=[390],[177],[168],[116] !jvms: Reduction::test3 @ bci:10 (line 32)
      TraceNewVectors [AutoVectorization]: 531 LoadVector === 342 7 453 [[ ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory<J,4> (does not depend only on test, unknown control)
      TraceNewVectors [AutoVectorization]: 532 Replicate === _ 22 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 533 AddVL === _ 464 531 [[ ]] #vectory<J,4>
      TraceNewVectors [AutoVectorization]: 534 AddReductionVL === _ 301 533 [[ ]] no_strict_order

      SuperWord::transform_loop: success

            Assignee: Emanuel Peter (epeter)
            Reporter: Emanuel Peter (epeter)