  1. JDK
  2. JDK-8300669

AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic



    • Type: Enhancement
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 21
    • Fix Version/s: tbd
    • Component/s: hotspot
    • CPU: aarch64
    • OS: generic


      Experiments show that fill() implementations suffer from having many branches. This can be improved with the following ideas:

      1. Wider stores in the main case (aligned and long enough target).

      The current implementation uses STP in the main loop. We could switch to SIMD stores (https://bugs.openjdk.org/browse/JDK-8268233), but they are more effective than wide GPR stores only on some micro-architectures, and the GPR variant can be made wider as well. The choice of the main store variant can be made with a run-time flag. Experiments show that current SVE implementations give no benefit over SIMD stores, but if that changes, it makes sense to have an implementation that can be extended further with an SVE store variant.
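
      As a rough illustration only, the idea of a main loop whose store width is picked at run time can be modelled in portable C++. This is not the stub generator code: StoreWidth, fill_main_loop and fill_bytes are hypothetical names, the widths 16/32/64 are assumed values, and memcpy of a constant size stands in for an STP or SIMD store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical store variants. In a real stub these would correspond to
// different instruction sequences (e.g. STP pairs or SIMD stores).
enum class StoreWidth : std::size_t { Bytes16 = 16, Bytes32 = 32, Bytes64 = 64 };

// Writes `blocks` chunks of Width bytes, each filled with the replicated pattern.
// A constant-size memcpy is the portable way to express one wide store.
template <std::size_t Width>
static void fill_main_loop(std::uint8_t* dst, std::size_t blocks, std::uint64_t pattern) {
    std::uint64_t chunk[Width / 8];
    for (std::size_t i = 0; i < Width / 8; ++i) chunk[i] = pattern;
    for (std::size_t b = 0; b < blocks; ++b) {
        std::memcpy(dst + b * Width, chunk, Width);
    }
}

// Fills n bytes with `value`; the main-loop width is a run-time choice.
void fill_bytes(std::uint8_t* dst, std::size_t n, std::uint8_t value, StoreWidth width) {
    assert(dst != nullptr || n == 0);
    // Replicate the byte into all eight lanes of a 64-bit word.
    const std::uint64_t pattern = 0x0101010101010101ULL * value;
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t blocks = n / w;
    switch (width) {
        case StoreWidth::Bytes16: fill_main_loop<16>(dst, blocks, pattern); break;
        case StoreWidth::Bytes32: fill_main_loop<32>(dst, blocks, pattern); break;
        case StoreWidth::Bytes64: fill_main_loop<64>(dst, blocks, pattern); break;
    }
    // Naive byte-wise tail, kept simple here on purpose.
    for (std::size_t i = blocks * w; i < n; ++i) dst[i] = value;
}
```

      Keeping the width as a parameter of a single template leaves room for adding another variant later without touching the callers.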

      2. Align wider stores to a larger boundary.
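
      A minimal sketch of the arithmetic behind aligning to a larger boundary, assuming a power-of-two alignment. FillPlan, bytes_to_alignment and plan_fill are hypothetical helper names used only for this illustration; the actual boundary to use is not specified in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to advance addr to the next multiple of `alignment`
// (0 if addr is already aligned). `alignment` must be a power of two.
std::size_t bytes_to_alignment(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return static_cast<std::size_t>((0 - addr) & (alignment - 1));
}

// Split of a fill into an unaligned head, aligned wide blocks, and a tail.
struct FillPlan {
    std::size_t head;    // bytes before the first aligned block
    std::size_t blocks;  // number of aligned blocks of `alignment` bytes
    std::size_t tail;    // bytes left after the last aligned block
};

FillPlan plan_fill(std::uintptr_t addr, std::size_t n, std::size_t alignment) {
    std::size_t head = bytes_to_alignment(addr, alignment);
    if (head > n) head = n;  // region ends before the boundary is reached
    const std::size_t rest = n - head;
    return FillPlan{head, rest / alignment, rest % alignment};
}
```

      For example, under these assumptions a 100-byte fill starting at address 0x1003 with a 32-byte boundary splits into a 29-byte head, two aligned blocks, and a 7-byte tail.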

      3. Table based implementation for alignment and processing tails.

      As fill() implementations are generated as stubs, for small enough lengths there can be sub-routines that fill arrays of exactly that length without branches. The trade-off is the additional code size spent on these duplicated implementations. However, experiments show that the overhead is moderate.
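
      The table idea can be modelled in C++ as one specialised routine per length plus a lookup table indexed by the length. This is only a sketch: kMaxTail, fill_exact and fill_tail are hypothetical names, the limit of 16 is an assumed value, and a function-pointer table stands in for whatever dispatch the generated stub would actually use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

using TailFn = void (*)(std::uint8_t*, std::uint8_t);

// Assumed upper bound on lengths handled through the table.
constexpr std::size_t kMaxTail = 16;

// One routine per exact length. The trip count is a compile-time constant,
// so a compiler can unroll it into straight-line stores.
template <std::size_t N>
void fill_exact(std::uint8_t* dst, std::uint8_t value) {
    for (std::size_t i = 0; i < N; ++i) dst[i] = value;
}

// Builds {fill_exact<0>, fill_exact<1>, ..., fill_exact<kMaxTail - 1>}.
template <std::size_t... Ns>
constexpr std::array<TailFn, sizeof...(Ns)> make_table(std::index_sequence<Ns...>) {
    return {{ &fill_exact<Ns>... }};
}

static constexpr auto kTailTable = make_table(std::make_index_sequence<kMaxTail>{});

// Dispatches on the length with a single table lookup instead of a
// chain of size comparisons.
void fill_tail(std::uint8_t* dst, std::size_t n, std::uint8_t value) {
    assert(n < kMaxTail);
    kTailTable[n](dst, value);
}
```

      The cost visible in this model is the same one named above: kMaxTail near-identical routines are emitted, trading code size for fewer branches.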


              dchuyko Dmitry Chuyko
              dchuyko Dmitry Chuyko