This Python implementation is fast on macOS but slow on Linux. The slowdown comes from its use of scipy.sparse.linalg.svds. This will be fixed in a future version.
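
To check whether the svds call is the bottleneck on a given machine, a minimal timing sketch along these lines may help. The helper name `time_svds`, the matrix size, the density, and `k` are all illustrative assumptions and are not taken from this project; substitute the matrix the implementation actually factorizes.

```python
import time

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds


def time_svds(n_rows=300, n_cols=200, k=5, density=0.05, seed=0):
    """Time one svds call on a random sparse matrix.

    Hypothetical helper: the sizes and density are placeholders,
    not the values used by this project.
    """
    # Build a reproducible random sparse matrix in CSR format.
    A = sparse_random(n_rows, n_cols, density=density,
                      format="csr", random_state=seed, dtype=np.float64)

    # Time only the truncated SVD itself.
    start = time.perf_counter()
    u, s, vt = svds(A, k=k)
    elapsed = time.perf_counter() - start

    # Sort the singular values in descending order for easy comparison.
    s = np.sort(s)[::-1]
    return A, s, elapsed


if __name__ == "__main__":
    A, s, elapsed = time_svds()
    print(f"svds took {elapsed:.4f} s; top singular values: {s}")
```

Running the same script on macOS and on Linux gives a like-for-like comparison of the svds call in isolation, separate from the rest of the implementation.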