JDK-8311911

C2 Superword does not honor AVX3Threshold


      - On older Xeons (Skylake, Cascade Lake) we observed significant performance degradation with AVX512 instructions. The causes are the frequency-level switchover penalty, the reduced core frequency, and a hysteresis effect that makes the subsequent non-AVX512 instruction stream also execute at a lower frequency.
      - The problem became severe in workloads where spikes of AVX512 instructions appear in an otherwise AVX2 instruction trace.
      - Some of the x86 stubs (array copy / fill) do take AVX3Threshold into account, but not all of them.
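      For reference, the stubs that do honor the flag gate their AVX512 path on operand size: HotSpot describes AVX3Threshold as the minimum array size in bytes for using AVX512 intrinsics for copy, inflate and fill, and we assume its 4096-byte default here. The Python sketch below only models that kind of size check. `choose_stub_vector_bytes` and its parameters are hypothetical names for illustration, not HotSpot code.

```python
def choose_stub_vector_bytes(length_bytes: int,
                             avx3_threshold: int = 4096,
                             max_vector_size: int = 64) -> int:
    """Return the vector width (bytes) a threshold-aware stub would pick.

    Hypothetical model: 64-byte (AVX512) vectors are used only when the
    target allows them AND the operand is at least avx3_threshold bytes;
    otherwise the stub stays on 32-byte (AVX2) or narrower vectors.
    """
    if max_vector_size >= 64 and length_bytes >= avx3_threshold:
        return 64
    return min(32, max_vector_size)


# Small copies stay on AVX2-width vectors; large ones switch to AVX512.
print(choose_stub_vector_bytes(1024))   # 32
print(choose_stub_vector_bytes(8192))   # 64
```

      The point of the check is that short operations never pay the frequency switchover cost. C2 Superword loops currently skip an equivalent check.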

      There are multiple approaches to addressing this problem:
      1) Multi-versioning of loops guarded by AVX3Threshold. However, this may result in code bloat, given that we already split the iteration space into pre-loop, main loop, post-loop (atomic vector), vector tail and scalar tail.
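      A rough model of approach 1, assuming a runtime guard on data size that selects between a 64-byte and a 32-byte version of the same vectorized loop. In C2 each version would be a full copy of the pre/main/post/tail loop structure, which is where the code-bloat concern comes from. This sketch collapses both versions into one parameterized body and omits the pre-loop and post-loop. `add_arrays_multiversioned` is a hypothetical name.

```python
def add_arrays_multiversioned(a, b, elem_bytes=4, avx3_threshold=4096):
    """Element-wise a + b, dispatching between two loop versions.

    A runtime guard on the total data size (in bytes) selects either the
    512-bit (64-byte) or the 256-bit (32-byte) vector loop. Returns the
    result and the number of lanes used by the selected version.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    n = len(a)
    wide = n * elem_bytes >= avx3_threshold       # the AVX3Threshold guard
    lanes = (64 if wide else 32) // elem_bytes    # elements per vector op
    out = [0] * n
    i = 0
    # Vector main loop: each iteration simulates one vector operation.
    while i + lanes <= n:
        out[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        i += lanes
    # Scalar tail for the remaining n % lanes elements.
    while i < n:
        out[i] = a[i] + b[i]
        i += 1
    return out, lanes
```

      With 4-byte elements, a 10-element input (40 bytes) takes the 8-lane AVX2 version, while a 2000-element input (8000 bytes) takes the 16-lane AVX512 version.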

      2) Use loop profiling information collected by C1 and the interpreter to dynamically adjust the VectorSize per loop.

      This pass could also take other factors, such as instruction costs and target feature availability, into account when adjusting the per-loop vector size. That may address some of the issues discovered on KNL (https://bugs.openjdk.org/browse/JDK-8309267).
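      Approach 2 could be prototyped as a small per-loop cost model. The profiled average trip count gives the expected data size. Only widths the target supports are considered, and the 64-byte width is charged a one-off penalty standing in for the frequency switchover. Everything here is a hypothetical placeholder, not C2 internals or measured data: `LoopProfile`, `pick_vector_bytes`, the cost table and the penalty value.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopProfile:
    """Hypothetical per-loop profile (e.g. gathered by C1/the interpreter)."""
    invocations: int
    total_trips: int

    def avg_trip_count(self) -> float:
        return self.total_trips / self.invocations if self.invocations else 0.0


def pick_vector_bytes(profile, elem_bytes, supported_widths, op_cost, wide_penalty):
    """Return the supported width (bytes) with the lowest estimated cost.

    supported_widths: widths the target supports (feature availability).
    op_cost: dict mapping width -> estimated cost of one vector op.
    wide_penalty: one-off cost added to 64-byte vectors to model the
    AVX512 frequency switchover. Ties go to the narrower width.
    """
    loop_bytes = profile.avg_trip_count() * elem_bytes
    best_width, best_cost = None, math.inf
    for width in sorted(supported_widths):
        cost = math.ceil(loop_bytes / width) * op_cost[width]
        if width == 64:
            cost += wide_penalty
        if cost < best_cost:
            best_width, best_cost = width, cost
    return best_width


costs = {16: 1.0, 32: 1.0, 64: 1.2}   # hypothetical relative op costs
short_loop = LoopProfile(invocations=10, total_trips=640)   # ~64 trips/call
long_loop = LoopProfile(invocations=1, total_trips=100_000)
print(pick_vector_bytes(short_loop, 4, {16, 32, 64}, costs, wide_penalty=50))  # 32
print(pick_vector_bytes(long_loop, 4, {16, 32, 64}, costs, wide_penalty=50))   # 64
```

      Short loops cannot amortize the penalty and stay on 32-byte vectors. Long loops still pick 64-byte vectors, and dropping 64 from `supported_widths` models a target without AVX512.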

      FTR, from an x86 standpoint this is not a pressing issue, since the latest Xeons no longer suffer from the AVX512 frequency problems.

            jbhateja Jatin Bhateja
            Votes: 0
            Watchers: 1
