Sub-task
Resolution: Unresolved
P4
24
riscv
Comparison of Superword optimization between x86 and riscv.
From the data below, we can see:
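1. With the optimization turned on there is a regression on riscv that does not exist on x86: mulRedD takes about twice as long (from 3148.575 to 6497.298 ns/op, ratio 0.48).
2. The improvement is smaller on riscv than on x86: at most about 1.5x on riscv (andRedIPartiallyUnrolled) versus up to about 9.3x on x86, and the long reductions (andRedL, orRedL, xorRedL) do not improve on riscv at all.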
## tested on K230-CanMV
x86

| Benchmark (no SLP) | Units | Score | Benchmark (SLP) | Score | no-SLP/SLP (ratio of ns/op scores) |
|---|---|---|---|---|---|
| VectorReduction.NoSuperword.andRedI | ns/op | 332.146 | VectorReduction.WithSuperword.andRedI | 54.547 | 6.089170807 |
| VectorReduction.NoSuperword.andRedIOnGlobalAccumulator | ns/op | 477.499 | VectorReduction.WithSuperword.andRedIOnGlobalAccumulator | 52.93 | 9.021330059 |
| VectorReduction.NoSuperword.andRedIPartiallyUnrolled | ns/op | 505.152 | VectorReduction.WithSuperword.andRedIPartiallyUnrolled | 54.453 | 9.276844251 |
| VectorReduction.NoSuperword.andRedL | ns/op | 405.894 | VectorReduction.WithSuperword.andRedL | 94.195 | 4.309082223 |
| VectorReduction.NoSuperword.mulRedD | ns/op | 415.98 | VectorReduction.WithSuperword.mulRedD | 399.011 | 1.04252765 |
| VectorReduction.NoSuperword.orRedI | ns/op | 316.542 | VectorReduction.WithSuperword.orRedI | 53.397 | 5.928085848 |
| VectorReduction.NoSuperword.orRedL | ns/op | 385.08 | VectorReduction.WithSuperword.orRedL | 97.691 | 3.941816544 |
| VectorReduction.NoSuperword.prodRedD | ns/op | 384.908 | VectorReduction.WithSuperword.prodRedD | 377.523 | 1.019561722 |
| VectorReduction.NoSuperword.prodRedDSimple | ns/op | 363.297 | VectorReduction.WithSuperword.prodRedDSimple | 363.321 | 0.9999339427 |
| VectorReduction.NoSuperword.prodRedF | ns/op | 385.404 | VectorReduction.WithSuperword.prodRedF | 373.774 | 1.031115059 |
| VectorReduction.NoSuperword.prodRedFSimple | ns/op | 363.207 | VectorReduction.WithSuperword.prodRedFSimple | 363.261 | 0.9998513466 |
| VectorReduction.NoSuperword.xorRedI | ns/op | 308.601 | VectorReduction.WithSuperword.xorRedI | 53.624 | 5.75490452 |
| VectorReduction.NoSuperword.xorRedL | ns/op | 384.97 | VectorReduction.WithSuperword.xorRedL | 264.686 | 1.454440356 |

riscv (jmh-base-with-rvv.log)

| Benchmark (no SLP) | Units | Score | Benchmark (SLP) | Score | no-SLP/SLP (ratio of ns/op scores) |
|---|---|---|---|---|---|
| VectorReduction.NoSuperword.andRedI | ns/op | 3665.333 | VectorReduction.WithSuperword.andRedI | 2621.189 | 1.398347468 |
| VectorReduction.NoSuperword.andRedIOnGlobalAccumulator | ns/op | 3766.025 | VectorReduction.WithSuperword.andRedIOnGlobalAccumulator | 2627.826 | 1.43313332 |
| VectorReduction.NoSuperword.andRedIPartiallyUnrolled | ns/op | 3972.161 | VectorReduction.WithSuperword.andRedIPartiallyUnrolled | 2654.868 | 1.496180224 |
| VectorReduction.NoSuperword.andRedL | ns/op | 2816.14 | VectorReduction.WithSuperword.andRedL | 2820.604 | 0.9984173603 |
| VectorReduction.NoSuperword.mulRedD | ns/op | 3148.575 | VectorReduction.WithSuperword.mulRedD | 6497.298 | 0.4845975973 |
| VectorReduction.NoSuperword.orRedI | ns/op | 3646.042 | VectorReduction.WithSuperword.orRedI | 2620.487 | 1.391360461 |
| VectorReduction.NoSuperword.orRedL | ns/op | 2823.715 | VectorReduction.WithSuperword.orRedL | 2821.324 | 1.000847474 |
| VectorReduction.NoSuperword.prodRedD | ns/op | 1832.914 | VectorReduction.WithSuperword.prodRedD | 1832.928 | 0.9999923619 |
| VectorReduction.NoSuperword.prodRedDSimple | ns/op | 1691.322 | VectorReduction.WithSuperword.prodRedDSimple | 1690.487 | 1.000493941 |
| VectorReduction.NoSuperword.prodRedF | ns/op | 1713.991 | VectorReduction.WithSuperword.prodRedF | 1719.252 | 0.9969399483 |
| VectorReduction.NoSuperword.prodRedFSimple | ns/op | 1376.185 | VectorReduction.WithSuperword.prodRedFSimple | 1365.547 | 1.007790285 |
| VectorReduction.NoSuperword.xorRedI | ns/op | 3744.248 | VectorReduction.WithSuperword.xorRedI | 2619.867 | 1.42917484 |
| VectorReduction.NoSuperword.xorRedL | ns/op | 2814.807 | VectorReduction.WithSuperword.xorRedL | 2817.866 | 0.9989144267 |
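
The last column of each table is the no-SLP score divided by the SLP score. Both scores are in ns/op (lower is better), so a ratio above 1 means Superword helped and a ratio below 1 means it hurt. A minimal sketch of that arithmetic, using a handful of rows copied from the riscv table; the 5% noise tolerance and the names `classify` and `summarize` are assumptions for illustration and are not part of the benchmark or its logs:

```python
# Scores in ns/op (lower is better), copied from the riscv table:
# benchmark name -> (score without Superword, score with Superword)
RISCV = {
    "andRedI": (3665.333, 2621.189),
    "andRedL": (2816.14, 2820.604),
    "mulRedD": (3148.575, 6497.298),
    "prodRedF": (1713.991, 1719.252),
    "xorRedI": (3744.248, 2619.867),
}

# Assumed tolerance for run-to-run noise; not taken from the JMH output.
NOISE = 0.05


def classify(no_slp, slp, noise=NOISE):
    """Return (ratio, verdict), where ratio = no-SLP score / SLP score."""
    ratio = no_slp / slp
    if ratio < 1.0 - noise:
        return ratio, "regression"
    if ratio > 1.0 + noise:
        return ratio, "improvement"
    return ratio, "neutral"


def summarize(scores):
    """Classify every benchmark in a {name: (no_slp, slp)} mapping."""
    return {name: classify(*pair) for name, pair in scores.items()}


if __name__ == "__main__":
    for name, (ratio, verdict) in summarize(RISCV).items():
        print(f"{name:10s} {ratio:6.3f} {verdict}")
```

With this tolerance, mulRedD is the only one of these rows classified as a regression, while the int reductions count as improvements and the long and float rows stay within noise.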