Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8216228 | 13 | Vivek Deshpande | P4 | Resolved | Fixed | team |
The patch can vectorize the following operation when it appears in a loop:
```java
out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
```
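For reference, here is a minimal, self-contained Java version of that loop. It assumes `in1` and `in2` are `short[]` arrays and `out` is an `int[]`. The class and method names (`MulAddS2I`, `mulAdd`) are hypothetical. Whether C2 actually vectorizes the loop depends on the JDK version, the CPU features, and the method being hot enough to be compiled (for example, when it is called repeatedly in a benchmark). The `main` below only checks the scalar result.

```java
// Hypothetical class name; the loop body is the one quoted above.
public class MulAddS2I {

    // Scalar reference kernel: each out[i] accumulates the dot product of one
    // adjacent pair of 16-bit values from in1 and in2. Java promotes short
    // operands to int, so products and sums use 32-bit int arithmetic, which
    // wraps on overflow. Assumes in1 and in2 hold at least 2 * out.length elements.
    static void mulAdd(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
        }
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4};
        short[] in2 = {5, 6, 7, 8};
        int[] out = {10, 20};
        mulAdd(in1, in2, out);
        // out[0] = 10 + 1*5 + 2*6 = 27, out[1] = 20 + 3*7 + 4*8 = 73
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Running `main` prints `27 73`.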
This patch benefits AI/ML/DL workloads such as convolution-based neural networks.
More information on VNNI (Vector Neural Network Instructions) can be found in Intel's Architecture Instruction Set Extensions Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
Code contributed by: razvan.a.lupusoru@intel.com and Vivek Deshpande (vivek.r.deshpande@intel.com)
Initial measurements with a microbenchmark on Skylake with AVX3 (AVX-512) show a 10.8x gain.
On Skylake, the generated code is:
```asm
vmovdqu xmm3, xmmword ptr [rbp+r8*2+0x10]
vmovdqu xmm6, xmmword ptr [rdx+r8*2+0x10]
vpmaddwd xmm3, xmm6, xmm3
vpaddd xmm3, xmm3, xmmword ptr [r9+rdi*4+0x10]
vmovdqu xmmword ptr [r9+rdi*4+0x10], xmm3
```
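The following scalar Java model shows what one pass of those five instructions computes:

- two 128-bit loads bring in eight 16-bit values from each input;
- `vpmaddwd` turns them into four 32-bit pair sums;
- `vpaddd` adds those sums to four ints loaded from `out`;
- the final `vmovdqu` stores the result back.

This is a sketch of the instruction arithmetic only. The names `PmaddwdModel`, `vpmaddwd` and `vectorizedMulAdd` are hypothetical. The model does not reproduce C2's real loop shape (unrolling, pre/post loops, alignment).

```java
public class PmaddwdModel {

    // Models vpmaddwd on one 128-bit register: 8 signed 16-bit words per
    // operand produce 4 signed 32-bit lanes, each the sum of two adjacent
    // products. The instruction's single overflow case (all four words equal
    // to -32768) yields 0x80000000, which Java int wrapping also produces.
    static int[] vpmaddwd(short[] a, short[] b, int off) {
        int[] r = new int[4];
        for (int lane = 0; lane < 4; lane++) {
            int j = off + 2 * lane;
            r[lane] = a[j] * b[j] + a[j + 1] * b[j + 1];
        }
        return r;
    }

    // One "vector iteration" per 4 outputs: load 8 shorts from each input,
    // multiply-add adjacent pairs (vpmaddwd), add to 4 ints of out (vpaddd)
    // and store. Assumes out.length is a multiple of 4 and the inputs hold
    // 2 * out.length elements; compiled code handles remainders separately.
    static void vectorizedMulAdd(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i += 4) {
            int[] prod = vpmaddwd(in1, in2, 2 * i);
            for (int lane = 0; lane < 4; lane++) {
                out[i + lane] += prod[lane];
            }
        }
    }

    public static void main(String[] args) {
        short[] in1 = new short[16];
        short[] in2 = new short[16];
        for (int k = 0; k < 16; k++) {
            in1[k] = (short) (k + 1);
            in2[k] = (short) (k - 7);
        }
        int[] vec = new int[8];
        int[] ref = new int[8];
        vectorizedMulAdd(in1, in2, vec);
        // Scalar loop from the description, for comparison.
        for (int i = 0; i < ref.length; i++) {
            ref[i] += in1[2*i] * in2[2*i] + in1[2*i+1] * in2[2*i+1];
        }
        System.out.println(java.util.Arrays.equals(vec, ref));
    }
}
```

Running `main` prints `true`. The chunked model and the scalar loop agree element by element.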
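On Cascade Lake, which supports AVX-512 VNNI, it can generate the vpdpwssd instruction. That single instruction performs the multiply-add of word pairs and the accumulation that vpmaddwd and vpaddd do in two steps.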
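The next sketch models one 32-bit lane of vpdpwssd and checks it against the two-step vpmaddwd + vpaddd lane on random inputs and the all-`Short.MIN_VALUE` corner case. Per Intel's description, vpdpwssd accumulates without saturation (vpdpwssds is the saturating variant), so both forms agree modulo 2^32. The class and method names are hypothetical, and the model covers the per-lane arithmetic only.

```java
import java.util.Random;

public class VpdpwssdModel {

    // One lane of vpdpwssd: products of sign-extended 16-bit words, summed
    // into the 32-bit accumulator with wrapping (non-saturating) arithmetic.
    static int vpdpwssdLane(int acc, short a0, short a1, short b0, short b1) {
        return acc + a0 * b0 + a1 * b1;
    }

    // The same lane computed as vpmaddwd followed by vpaddd.
    static int pmaddwdThenPaddd(int acc, short a0, short a1, short b0, short b1) {
        int pair = a0 * b0 + a1 * b1; // vpmaddwd lane
        return acc + pair;            // vpaddd lane
    }

    // Returns true if both models agree on the extreme case and on
    // `trials` random lanes drawn from a seeded generator.
    static boolean agree(int trials, long seed) {
        short m = Short.MIN_VALUE;
        if (vpdpwssdLane(Integer.MAX_VALUE, m, m, m, m)
                != pmaddwdThenPaddd(Integer.MAX_VALUE, m, m, m, m)) {
            return false;
        }
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int acc = rnd.nextInt();
            short a0 = (short) rnd.nextInt(), a1 = (short) rnd.nextInt();
            short b0 = (short) rnd.nextInt(), b1 = (short) rnd.nextInt();
            if (vpdpwssdLane(acc, a0, a1, b0, b1) != pmaddwdThenPaddd(acc, a0, a1, b0, b1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(1_000_000, 42L));
    }
}
```

Because 32-bit wrapping addition is associative, fusing the two steps changes only the instruction count, not the result.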
The webrev is here:
http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.00/
- backported by
  - JDK-8216228 X86: Support for VNNI Instructions (Resolved)
- relates to
  - JDK-8215353 x86_32 build failures after JDK-8214751 (X86: Support for VNNI Instructions) (Resolved)
  - JDK-8215891 X86: Support for VNNI byte Instruction VPDPBUSD (In Progress)
  - JDK-8239549 AArch64: Backend support for MulAddVS2VI node (Resolved)
  - JDK-8219151 Illegal instruction exception on JDK 12 due to incorrect CPU feature bits (Resolved)
  - JDK-8236701 [TESTBUG] compiler/loopopts/superword/Vec_MulAddS2I.java uses wrong flag -XX:-SuperWord (Resolved)
  - JDK-8216050 Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index (Closed)
  - JDK-8229694 JVM crash in SWPointer during C2 OSR compilation (Closed)
  - JDK-8230185 assert(is_Loop()) failed: invalid node class (Closed)
  - JDK-8230078 compiler/loopopts/superword/Vec_MulAddS2I.java is unexpectedly slow in windows (Closed)
  - JDK-8216580 Fix generation of VNNI vector code by allowing adjacent LoadS nodes to be isomorphic (Resolved)