
Large datasets #5


Description

@RoufaidaLaidi

Hi, I am trying to run the code on a large dataset, but I run into a memory issue when generating the similarity matrix. Since the algorithm creates a graph node for each feature vector, the similarity matrix has size n*n, where n is the number of feature vectors. How do you suggest overcoming this problem and running the code on a dataset of millions of samples? Thanks in advance.
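
For context on the scale of the problem: with n = 10^6 samples, a dense double-precision matrix needs roughly 10^6 × 10^6 × 8 bytes ≈ 8 TB, so the full matrix cannot be materialized. A common workaround (not this repository's own API, just an assumption about one way to proceed) is to keep only each sample's k nearest neighbours, so storage drops to O(n·k). Below is a minimal sketch using scikit-learn's `kneighbors_graph`; `X`, `k`, and `sigma` are illustrative placeholders, and at the million-sample scale an approximate nearest-neighbour library (e.g. FAISS or Annoy) would likely be needed for the neighbour search itself.

```python
# Minimal sketch of one possible workaround: a sparse k-NN similarity graph
# instead of the dense n*n matrix. This is NOT the project's own code;
# X, k and sigma are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph


def sparse_similarity_graph(X, k=10, sigma=1.0):
    """Similarities only for each sample's k nearest neighbours,
    so memory grows as O(n*k) instead of O(n^2)."""
    # Sparse CSR distance graph: n rows, at most k stored entries per row.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)
    # Turn the stored distances into Gaussian similarities in place.
    dist.data = np.exp(-(dist.data ** 2) / (2.0 * sigma ** 2))
    # Symmetrise so the resulting graph is undirected.
    return dist.maximum(dist.T)


if __name__ == "__main__":
    X = np.random.rand(100_000, 32).astype(np.float32)  # stand-in for real features
    S = sparse_similarity_graph(X, k=10)
    print(S.shape, S.nnz)  # shape is still n x n, but only ~n*k values are stored
```

Whether a sparse graph like this preserves the method's results depends on how the algorithm consumes the similarity matrix, so it is only a starting point for discussion.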
