C2 SuperWord and Vector API: vector algorithms test and benchmark

XMLWordPrintable

      This is an exploratory task.

      There are many "algorithms" that can benefit from vectorization:
      - fill
      - populate_index (aka iota)
      - reduce (sum, min, and, ... even polynomial reduction = hash)
      - map
      - copy
      - reverse (copy with shuffle)
      - find (predicate based on single element)
      - scan (prefix sum, min scan, ...)
      - filter (using compress)
      - segmented scans (list of strings -> list of hashes)
      - run-length encoding and decoding (replicate)

      In the Linear Algebra / Machine Learning area, there may be:
      - matrix multiplication
      - sparse matrix multiplication
      - softmax

      [~jrose] Has some ideas here:
      https://cr.openjdk.org/~jrose/pres/202202-VectorTopics.pdf

      Here some links on Replicate:
      https://www.dyalog.com/blog/2018/06/expanding-bits-in-shrinking-time/
      https://mlochbaum.github.io/BQN/implementation/primitive/replicate.html#boolean-compress

      [~vlivanov] Mentioned Daniel Lemire's work, there are lots of ideas here:
      http://0x80.pl/notesen.html
      https://lemire.me/blog/

      [~pminborg] Suggested compare for mismatch and simple pattern matching. What I could imagine:
      - Case-insensitive string compare
      - match all or any, possibly with and without early exit

      [~pminborg] Also mentioned sort, which I had forgot to mention earlier. We can do:
      - quicksort or radix sort. They use filter/compress with multiple passes.
      - Per also suggested sorting in a block first. I'd need to think about how to create a larger sort from that.
      Per mentioned this link: https://tweedegolf.nl/en/blog/79/sorting-with-simd

      There are also lots of parsing tasks that can be vectorized.

      The goal is to implement many of these, to test them reasonably, check for vectorization with IR tests, and also create JMH benchmarks.

      We can compare the performance of different implementations, and also compare with scalar and auto vectorized performance.

            Assignee:
            Emanuel Peter
            Reporter:
            Emanuel Peter
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: