JDK-8257483

C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Affects Version/s: 16
    • Fix Version/s: 16
    • Component/s: hotspot
    • Resolved In Build: b28
    • CPU: generic
    • OS: linux

      Currently, on every CPU, a port that matches RotateLeftV and RotateRightV with rules in its AD file has to implement both the immediate and the variable versions.
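
To make the two versions concrete, here is a minimal Java sketch of loops whose rotate has a constant shift count (the immediate case) and a run-time shift count (the variable case). The class and method names are hypothetical, and whether C2 actually vectorizes these loops into RotateLeftV depends on the platform and flags, which this issue does not state.

```java
import java.util.Arrays;

// Hypothetical example class; not part of the JDK sources.
class RotateLoops {
    // Shift count is a compile-time constant: a candidate for the immediate form.
    static void rotateLeftImmediate(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateLeft(src[i], 7);
        }
    }

    // Shift count is loop-invariant but only known at run time: the variable form.
    static void rotateLeftVariable(int[] src, int[] dst, int shift) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateLeft(src[i], shift);
        }
    }

    public static void main(String[] args) {
        int[] src = {1, 0x80000000, 0x12345678, -1};
        int[] a = new int[src.length];
        int[] b = new int[src.length];
        rotateLeftImmediate(src, a);
        rotateLeftVariable(src, b, 7);
        // Both loops compute the same values; only the kind of shift count differs.
        System.out.println(Arrays.equals(a, b));
    }
}
```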

      On aarch64, with match rules for vector rotation, the immediate vector rotate can be optimized with shift+insert instructions (i.e. SLI/SRI; ~23% improvement with an initial implementation).
      However, there would be a performance regression for the variable version, because SLI/SRI have no register form in the NEON instruction set, and NEON has no register form of right shift either.
      The instructions generated by match rules for the variable vector rotate would be:
          # this is the performance regression: these loop invariants can't be hoisted out of the loop.
          mov w9, 32
          dup v13.4s, w9
          sub v20.4s, v13.4s, v19.4s
          ----------------------------
          sshl v17.4s, v16.4s, v19.4s
          neg v18.16b, v20.16b # on aarch64, vector right shift is implemented as left shift by negative shift count
          ushl v16.4s, v16.4s, v18.4s
          orr v16.16b, v17.16b, v16.16b
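
As a reading aid, the sequence above can be modelled for a single 32-bit lane in Java. This is only a sketch under assumptions: the class, the `ushl` helper, and its handling of out-of-range counts are illustrative, and the left shift is modelled with the same helper because SSHL and USHL agree for non-negative counts.

```java
// Hypothetical scalar model of one 32-bit lane; not HotSpot code.
class VariableRotateModel {
    // Models a register shift on one lane: the signed count comes from the low byte;
    // a positive count shifts left, a negative count shifts right (logical).
    static int ushl(int value, int count) {
        byte c = (byte) count;
        if (c >= 32 || c <= -32) {
            return 0; // every bit is shifted out of the lane
        }
        return c >= 0 ? value << c : value >>> -c;
    }

    // Rotate left by a run-time shift in the range 0..31.
    static int rotateLeft(int value, int shift) {
        int width = 32;                    // mov w9, 32 ; dup v13.4s, w9
        int rightCount = width - shift;    // sub v20.4s, v13.4s, v19.4s
        int left = ushl(value, shift);     // sshl v17.4s, v16.4s, v19.4s
        int negated = -rightCount;         // neg v18.16b, v20.16b
        int right = ushl(value, negated);  // ushl v16.4s, v16.4s, v18.4s
        return left | right;               // orr v16.16b, v17.16b, v16.16b
    }

    public static void main(String[] args) {
        System.out.println(rotateLeft(0x12345678, 8) == Integer.rotateLeft(0x12345678, 8));
    }
}
```

The first three modelled steps (`width`, `rightCount`) do not depend on the lane data, which is why emitting them inside the match rule on every iteration costs performance.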

      The immediate vector rotate should be split from the RotateLeftV and RotateRightV nodes, so that it can be matched and optimized on its own on CPUs like aarch64.
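
For comparison, a shift+insert rotate by an immediate can be modelled for one 32-bit lane as a right shift followed by a shift-left-insert. The pairing of USHR with SLI, and all names below, are assumptions made for illustration; the issue only states that SLI/SRI are used, not the exact sequence.

```java
// Hypothetical scalar model of one 32-bit lane; not HotSpot code.
class ImmediateRotateModel {
    // Models a logical right shift by an immediate in 1..32 (32 clears the lane).
    static int ushr(int value, int imm) {
        return imm == 32 ? 0 : value >>> imm;
    }

    // Models shift-left-insert: src is shifted left by imm (0..31) and written
    // into dst, while the low imm bits of dst are preserved.
    static int sli(int dst, int src, int imm) {
        int keepMask = (1 << imm) - 1;
        return (src << imm) | (dst & keepMask);
    }

    // Rotate left by a constant in 0..31 using two steps and no invariant setup.
    static int rotateLeft(int value, int imm) {
        if (imm == 0) {
            return value; // nothing to rotate
        }
        int tmp = ushr(value, 32 - imm); // high bits move to the bottom
        return sli(tmp, value, imm);     // low bits move up and are merged in
    }

    public static void main(String[] args) {
        System.out.println(rotateLeft(0x12345678, 8) == Integer.rotateLeft(0x12345678, 8));
    }
}
```

Because the shift count is known when the rule is matched, no `mov`/`dup`/`sub`/`neg` setup is needed in this model, which is the motivation for matching the immediate case separately.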

            Assignee: Dong Bo (dongbo)
            Reporter: Dong Bo (dongbo)
            Votes: 0
            Watchers: 4
