This repository was archived by the owner on Aug 6, 2024. It is now read-only.
This project seems quite helpful. Honestly, I'm most interested in the clustering; we are fairly happy with our deduplication system as is. It looks like, for this to work in its current form, you need enough memory to hold all your vectors at once before you can run the algorithm.
Most of our customer vector datasets are larger than 80 GB, so we would need some way to cluster them in a paginated fashion, one page of vectors at a time. It would be cool to contribute that, but I wanted to check first whether there is already an issue for it or something adjacent.
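To make the request concrete, here is a rough sketch of what paginated clustering could look like. It uses mini-batch k-means as a stand-in, which may well not be the algorithm this project uses, and `iter_pages`, `fit_paginated`, and `assign_paginated` are hypothetical names for illustration, not part of this project's API. The only point is that each step touches one page of vectors at a time.

```python
from typing import Iterable, Iterator, List, Sequence

Vector = Sequence[float]


def iter_pages(vectors: Sequence[Vector], page_size: int) -> Iterator[Sequence[Vector]]:
    """Hypothetical stand-in for a paginated read from a vector store."""
    for start in range(0, len(vectors), page_size):
        yield vectors[start:start + page_size]


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the centroid closest to v (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(c, v))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def fit_paginated(pages: Iterable[Sequence[Vector]], k: int) -> List[List[float]]:
    """Mini-batch k-means: only the current page and k centroids are in memory."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for page in pages:
        page = list(page)
        if not page:
            continue
        if not centroids:
            # Assumption: naive initialisation from the first k vectors seen.
            if len(page) < k:
                raise ValueError("first page must hold at least k vectors")
            centroids = [list(v) for v in page[:k]]
            counts = [0] * k
        # Assign the whole page against the current centroids, then update.
        labels = [nearest(centroids, v) for v in page]
        for v, j in zip(page, labels):
            counts[j] += 1
            lr = 1.0 / counts[j]  # per-centroid learning rate
            c = centroids[j]
            for d in range(len(c)):
                c[d] += lr * (v[d] - c[d])
    if not centroids:
        raise ValueError("no vectors were provided")
    return centroids


def assign_paginated(pages: Iterable[Sequence[Vector]],
                     centroids: List[List[float]]) -> Iterator[List[int]]:
    """Second pass: yield cluster labels one page at a time."""
    for page in pages:
        yield [nearest(centroids, v) for v in page]
```

Usage would be two passes over the same paginated source: `centroids = fit_paginated(iter_pages(data, 1000), k=8)`, then `assign_paginated(iter_pages(data, 1000), centroids)` to label each page. The result depends on page order and on the naive initialisation, so a real contribution would need a better seeding strategy.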