Reported by yang.zhang@linaro.org:
> >> Currently, vector MLA/MLS instructions have been enabled in the AArch64 port.
> >> But when I investigated the status of NEON support, I found that C2 does not
> >> combine a vector multiply and a vector add into a single MLA instruction.
> >>
> >> Take the following function as an example:
> >> public static int vectMulAdd(
> >>         int[] a,
> >>         int[] b,
> >>         int[] c,
> >>         int[] d) {
> >>     int total = 0;
> >>     for (int i = 0; i < LENGTH; i++) {
> >>         d[i] = (int)(a[i] * b[i] + c[i]);
> >>         total += d[i];
> >>     }
> >>     return total;
> >> }
> >>
> >> The following code snippet is produced by C2:
> >> 0x0000007f98af88fc: ldr q18, [x19,#32]
> >> 0x0000007f98af8900: ldr q17, [x4,#32]
> >> 0x0000007f98af8904: ldr q19, [x20,#32]
> >> 0x0000007f98af8908: mul v17.4s, v17.4s, v18.4s
> >> 0x0000007f98af890c: add v17.4s, v17.4s, v19.4s
> >>
> >> It can be further optimized into:
> >> 0x0000007f843485e0: ldr q18, [x19,#16]
> >> 0x0000007f843485e4: ldr q17, [x20,#16]
> >> 0x0000007f843485e8: ldr q16, [x4,#16]
> >> 0x0000007f843485ec: mla v18.4s, v16.4s, v17.4s
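A minimal, self-contained reproducer sketch for the example above follows. It assumes `LENGTH = 1024` (the report does not give the value), and the class name `VectMulAddRepro`, the helper `mla4s` and the warm-up count are hypothetical. The helper models what `mla Vd.4s, Vn.4s, Vm.4s` computes per 32-bit lane (`Vd = Vd + Vn * Vm`, wrapping modulo 2^32), and the program checks that this agrees bit for bit with the Java `a[i] * b[i] + c[i]` result. For `int`, both forms wrap identically and no rounding is involved, which is why replacing `mul` + `add` with `mla` is a safe transformation.

```java
import java.util.Random;

// Hypothetical reproducer; class name, LENGTH and iteration counts are assumptions.
public class VectMulAddRepro {
    // Assumed array length; the original report does not state LENGTH.
    static final int LENGTH = 1024;

    // The loop from the report: a multiply followed by an add on each element.
    public static int vectMulAdd(int[] a, int[] b, int[] c, int[] d) {
        int total = 0;
        for (int i = 0; i < LENGTH; i++) {
            d[i] = (int) (a[i] * b[i] + c[i]);
            total += d[i];
        }
        return total;
    }

    // Models one 4 x 32-bit MLA lane group: acc[k] = acc[k] + n[k] * m[k],
    // each lane wrapping modulo 2^32, exactly like Java int arithmetic.
    static void mla4s(int[] acc, int[] n, int[] m, int offset) {
        for (int lane = 0; lane < 4; lane++) {
            int k = offset + lane;
            acc[k] = acc[k] + n[k] * m[k];
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] a = new int[LENGTH], b = new int[LENGTH], c = new int[LENGTH], d = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            a[i] = rnd.nextInt();
            b[i] = rnd.nextInt();
            c[i] = rnd.nextInt();
        }
        // Call the method repeatedly so that C2 compiles it (the count is a guess).
        int result = 0;
        for (int iter = 0; iter < 20_000; iter++) {
            result = vectMulAdd(a, b, c, d);
        }
        // Cross-check against the MLA lane model, 4 lanes at a time.
        int[] acc = c.clone();
        for (int off = 0; off < LENGTH; off += 4) {
            mla4s(acc, a, b, off);
        }
        int expected = 0;
        for (int i = 0; i < LENGTH; i++) {
            if (acc[i] != d[i]) {
                throw new AssertionError("mismatch at index " + i);
            }
            expected += acc[i];
        }
        if (expected != result) {
            throw new AssertionError("total mismatch");
        }
        System.out.println("ok, total = " + result);
    }
}
```

To inspect the generated code on an AArch64 machine, one option is running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (this needs the hsdis disassembler plugin installed) and looking for `mla` versus a `mul`/`add` pair in the compiled `vectMulAdd`.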
Yang has a patch for this, which is currently undergoing jtreg testing.