This is something I should have thought of from the beginning, but I was reminded of it again by this article:
https://netflixtechblog.com/optimizing-recommendation-systems-with-jdks-vector-api-30d2830401ec
They speed up a dot product over double arrays by keeping the running sum in a vector accumulator and updating it with FMA (fused multiply-add).
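Here is a minimal, portable sketch of that pattern. It assumes plain Java 9+ (for Math.fma) rather than the incubating Vector API, so it runs without --add-modules jdk.incubator.vector. The class name DotProduct, the method names, and the lane width LANES = 4 are my own illustrative choices, not taken from the article. The acc array stands in for the vector accumulator. With the Vector API it would be a single DoubleVector updated via fma and collapsed once after the loop with reduceLanes(VectorOperators.ADD). Whether HotSpot actually auto-vectorizes this scalar emulation is not guaranteed.

```java
// Sketch of the "vector accumulator + FMA" dot-product pattern in plain Java.
// ASSUMPTION: LANES = 4 (four doubles, i.e. one 256-bit register) is illustrative only.
final class DotProduct {
    private static final int LANES = 4;

    // Baseline: one scalar accumulator, so each add waits on the previous one.
    static double dotNaive(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // LANES independent partial sums, each updated with a fused multiply-add.
    static double dotLanes(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] acc = new double[LANES]; // plays the role of the vector accumulator
        int i = 0;
        int upper = a.length - (a.length % LANES); // analogous to VectorSpecies.loopBound
        for (; i < upper; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] = Math.fma(a[i + l], b[i + l], acc[l]); // a*b + acc, rounded once
            }
        }
        // Horizontal reduction, done once after the main loop (cf. reduceLanes).
        double sum = 0.0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l];
        }
        // Scalar tail for elements that do not fill a whole group of LANES.
        for (; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] b = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(dotNaive(a, b)); // prints 220.0
        System.out.println(dotLanes(a, b)); // prints 220.0
    }
}
```

Because FMA skips the intermediate rounding and the lanes change the summation order, results can differ from the naive loop in the last few bits for general inputs. With the small integers above, both print 220.0.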