ml-from-scratch

Efficient implementation using C / C++ / CUDA kernels with Python.

Note: This is purely a learning experiment.