Currently, for all CPUs, when matching RotateLeftV and RotateRightV with rules in AD files, one has to implement both the immediate and the variable versions.
On aarch64, with match rules for vector rotation, the immediate vector rotation can be optimized with shift-and-insert instructions (i.e. SLI/SRI; ~23% improvement with an initial implementation — see the sketch after the listing below).
However, there would be a performance regression for the variable version, because SLI/SRI have no register form in the NEON instruction set, and there is no register form for vector right shift either.
The instructions generated for the match rules of vector rotate variable would be:
# This is the performance regression: these loop invariants cannot be hoisted out of the loop.
mov w9, 32
dup v13.4s, w9
sub v20.4s, v13.4s, v19.4s
----------------------------
sshl v17.4s, v16.4s, v19.4s
neg v18.16b, v20.16b # on aarch64, vector right shift is implemented as left shift by negative shift count
ushl v16.4s, v16.4s, v18.4s
orr v16.16b, v17.16b, v16.16b
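For contrast, a rough sketch of what the immediate case could look like with shift+insert (the register numbers and the rotate amount, here 5 on 4S lanes, are chosen only for illustration):
shl v17.4s, v16.4s, #5   # dst = src << 5, low 5 bits of each lane are zero
sri v17.4s, v16.4s, #27  # insert src >> 27 into those low bits, keeping the high bits: dst = rotate_left(src, 5)
Two instructions, no extra registers and no per-loop setup; whether SLI or SRI is used only depends on whether the left or the right shift is materialized first.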
The immediate vector rotation should be split from the RotateLeftV and RotateRightV nodes, so that it can be matched and optimized separately on CPUs like aarch64.