
Large datasets #5


Description

@RoufaidaLaidi

Hi, I am trying to run the code on a large dataset, but I run into a memory issue when generating the similarity matrix. Since the algorithm creates a graph node for each feature vector, the similarity matrix has size n*n, where n is the number of feature vectors. How do you suggest overcoming this problem and running the code on a dataset of millions of samples? Thanks in advance.
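
For context on the scale of the problem: with n = 10^6 samples, a dense double-precision matrix needs roughly 10^6 × 10^6 × 8 bytes ≈ 8 TB, so the full matrix cannot be materialized. A common workaround (not this repository's own API, just an assumption about one way to proceed) is to keep only each sample's k nearest neighbours, so storage drops to O(n·k). Below is a minimal sketch using scikit-learn's `kneighbors_graph`; `X`, `k`, and `sigma` are illustrative placeholders, and at the million-sample scale an approximate nearest-neighbour library (e.g. FAISS or Annoy) would likely be needed for the neighbour search itself.

```python
# Minimal sketch of one possible workaround: a sparse k-NN similarity graph
# instead of the dense n*n matrix. This is NOT the project's own code;
# X, k and sigma are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph


def sparse_similarity_graph(X, k=10, sigma=1.0):
    """Similarities only for each sample's k nearest neighbours,
    so memory grows as O(n*k) instead of O(n^2)."""
    # Sparse CSR distance graph: n rows, at most k stored entries per row.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)
    # Turn the stored distances into Gaussian similarities in place.
    dist.data = np.exp(-(dist.data ** 2) / (2.0 * sigma ** 2))
    # Symmetrise so the resulting graph is undirected.
    return dist.maximum(dist.T)


if __name__ == "__main__":
    X = np.random.rand(100_000, 32).astype(np.float32)  # stand-in for real features
    S = sparse_similarity_graph(X, k=10)
    print(S.shape, S.nnz)  # shape is still n x n, but only ~n*k values are stored
```

Whether a sparse graph like this preserves the method's results depends on how the algorithm consumes the similarity matrix, so it is only a starting point for discussion.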
