-
Enhancement
-
Resolution: Fixed
-
P5
-
16
-
Nothing special here.
The following things need to be done:
* Remove the Power10+ pextd "optimization": although it uses fewer instructions, it causes a performance hit on actual hardware
* Use a constant block to initialize the vector registers, which reduces the per-call overhead of the intrinsic
* Introduce the xxpermx instruction so that it can be used in a Power10+ optimized, table-based lookup in decodeBlock
* Implement the xxpermx table-based lookup for Power10+
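
For readers unfamiliar with the term, a table-based lookup in Base64 decoding maps each input character to its 6-bit value through a precomputed table instead of a chain of range comparisons. The following is only a scalar Java sketch of that idea, not the intrinsic itself: the class name `Base64TableSketch`, the 256-entry table layout, and the restriction to unpadded input are all assumptions made for illustration. A vectorized version built on xxpermx would presumably perform the equivalent lookup on many bytes at once.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or the intrinsic.
class Base64TableSketch {
    // 256-entry table: byte value -> 6-bit Base64 value, or -1 if invalid.
    private static final byte[] DECODE_TABLE = new byte[256];

    static {
        Arrays.fill(DECODE_TABLE, (byte) -1);
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            DECODE_TABLE[alphabet.charAt(i)] = (byte) i;
        }
    }

    // Decodes unpadded Base64 whose length is a multiple of 4.
    // Returns null if the length is wrong or any character is invalid.
    static byte[] decode(String src) {
        byte[] in = src.getBytes(StandardCharsets.ISO_8859_1);
        if (in.length % 4 != 0) {
            return null;
        }
        byte[] out = new byte[in.length / 4 * 3];
        int o = 0;
        for (int i = 0; i < in.length; i += 4) {
            // One table lookup per input character.
            int a = DECODE_TABLE[in[i] & 0xff];
            int b = DECODE_TABLE[in[i + 1] & 0xff];
            int c = DECODE_TABLE[in[i + 2] & 0xff];
            int d = DECODE_TABLE[in[i + 3] & 0xff];
            // Any invalid character yields -1, which makes the OR negative.
            if ((a | b | c | d) < 0) {
                return null;
            }
            // Pack four 6-bit values into three bytes.
            int bits = (a << 18) | (b << 12) | (c << 6) | d;
            out[o++] = (byte) (bits >> 16);
            out[o++] = (byte) (bits >> 8);
            out[o++] = (byte) bits;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(decode("TWFu"), StandardCharsets.US_ASCII));
    }
}
```

The sketch folds validity checking into the same table by reserving -1 for invalid characters, so one lookup per character yields both the decoded value and the error signal.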