Matrix multiplication is among the most fundamental and compute-intensive operations in machine learning.