Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
linear-algebra mpi cuda scalapack matrix-multiplication gpu-acceleration rocm matmul communication-optimal pdgemm
Updated May 8, 2025 - C++