Description
Describe what you are looking for
You have probably read about LEANN, an experiment from Matei Zaharia's group (CTO of Databricks) at UC Berkeley. If not, here is a quick summary; I think it could be interesting for usearch as well (see the last paragraph):
Repo: https://github.com/yichuan-w/LEANN
Paper: https://arxiv.org/pdf/2506.08276
They say that the embeddings needed for RAG consume a ton of space (which is true) and are expensive to build (also true). For a sense of scale, 1M chunks with 768-dimensional fp32 embeddings already take about 3 GB before the graph is even added. Thus, such indexes are today practical only on powerful servers, not on edge devices like phones and tablets.
LEANN combines two techniques for vector search that need ~95% less storage than a regular HNSW index. According to the paper, the recall loss is negligible, and the trade-off is much better than quantizing either the original vectors (for a brute-force scan) or the vectors inside the HNSW index.
Technique 1 (at a high level):
- They first compute embeddings for all indexed documents. This is a one-time step and expensive (might work best for systems with mostly static data).
- Then they build a regular HNSW graph on top of the embeddings. Also a one-time step and slow.
- Then they throw away the original embeddings and the embeddings in the HNSW graph. What remains are the graph connections.
- HNSW traversal visits only a very small subset of the whole graph (depending on the candidate list size, aka the `ef_search` parameter). So a search starts at an entry point, and the graph connections tell them the neighbor nodes, but since the embeddings are gone, distances cannot be computed from stored data.
- Instead, the embeddings of the neighbor nodes are re-generated on the fly during graph traversal (see the sketch after this list). Since this touches only a few nodes per query, it is affordable. A fast GPU still helps (they also tested on a Mac M1 without a discrete GPU).
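To make that concrete, here is a minimal sketch of Technique 1's traversal, assuming a single-layer graph stored as a flat adjacency list. This is neither the LEANN code nor the usearch API; all names (`embed_fn_t`, `greedy_search`, `ef_search` as an expansion budget) are made up for illustration. The only point is that distances come from a user-supplied embedding callback instead of stored vectors:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using vec_t = std::vector<float>;
// User-supplied callback that re-embeds the document behind `key` on demand.
using embed_fn_t = std::function<vec_t(std::uint64_t)>;

static float l2sq(vec_t const& a, vec_t const& b) {
    float sum = 0;
    for (std::size_t i = 0; i != a.size(); ++i)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

// Greedy best-first search over an adjacency list (node keys are 0..N-1).
// The index stores only `graph`; vectors for the few visited nodes are
// regenerated through `embed` and memoized in `cache` for this query.
std::uint64_t greedy_search(std::vector<std::vector<std::uint64_t>> const& graph,
                            std::uint64_t entry, vec_t const& query,
                            embed_fn_t const& embed, std::size_t ef_search) {
    std::unordered_map<std::uint64_t, vec_t> cache;
    auto dist = [&](std::uint64_t key) {
        auto it = cache.find(key);
        if (it == cache.end())
            it = cache.emplace(key, embed(key)).first; // the expensive on-the-fly step
        return l2sq(query, it->second);
    };
    using cand_t = std::pair<float, std::uint64_t>; // (distance, key), closest first
    std::priority_queue<cand_t, std::vector<cand_t>, std::greater<cand_t>> frontier;
    std::unordered_set<std::uint64_t> visited{entry};
    float best_d = dist(entry);
    std::uint64_t best = entry;
    frontier.emplace(best_d, entry);
    for (std::size_t expanded = 0; !frontier.empty() && expanded != ef_search; ++expanded) {
        auto [d, node] = frontier.top();
        frontier.pop();
        if (d < best_d) { best_d = d; best = node; }
        for (std::uint64_t neighbor : graph[node])
            if (visited.insert(neighbor).second)
                frontier.emplace(dist(neighbor), neighbor);
    }
    return best;
}
```

The cost per query is bounded by roughly `ef_search` times the average out-degree calls to `embed`, which is why recomputing only the visited nodes stays cheap relative to embedding the whole corpus.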
Technique 2:
They note that in their experiments the HNSW graph connections still consumed 10%-30% of the embeddings' storage. They wanted to cut that down further.
So the second technique is an aggressive pruning strategy that retains only "hub nodes", i.e. important nodes spanning wide areas of the vector space; all details are in the paper.
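The paper's actual pruning criterion is more elaborate, but a degree-based caricature shows the mechanics and the storage effect. Treat everything below (using degree as the "hub" test, `hub_fraction`, `non_hub_degree`) as my assumptions, not the paper's algorithm:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using graph_t = std::vector<std::vector<std::uint32_t>>;

// Keep full neighbor lists only for the top `hub_fraction` of nodes by degree;
// truncate every other node's list to `non_hub_degree` links.
graph_t prune_to_hubs(graph_t const& graph, double hub_fraction, std::size_t non_hub_degree) {
    // Rank nodes by degree and mark the top fraction as hubs.
    std::vector<std::uint32_t> order(graph.size());
    for (std::uint32_t i = 0; i != order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::uint32_t a, std::uint32_t b) {
        return graph[a].size() > graph[b].size();
    });
    std::vector<bool> is_hub(graph.size(), false);
    std::size_t hubs = static_cast<std::size_t>(hub_fraction * graph.size());
    for (std::size_t i = 0; i != hubs; ++i) is_hub[order[i]] = true;

    graph_t pruned(graph.size());
    for (std::uint32_t node = 0; node != graph.size(); ++node) {
        auto const& neighbors = graph[node];
        if (is_hub[node]) {
            pruned[node] = neighbors; // hubs keep all their links
            continue;
        }
        // Non-hubs keep only a handful of links, preferring hub neighbors
        // so that every node stays within a short hop of a well-connected region.
        std::vector<std::uint32_t> kept;
        for (std::uint32_t n : neighbors)
            if (is_hub[n] && kept.size() < non_hub_degree) kept.push_back(n);
        for (std::uint32_t n : neighbors)
            if (!is_hub[n] && kept.size() < non_hub_degree) kept.push_back(n);
        pruned[node] = std::move(kept);
    }
    return pruned;
}
```

Preferring hub neighbors when truncating is what (intuitively) preserves recall: queries can always reach a hub quickly, and hubs retain their long-range links.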
In their evaluation, LEANN is unsurprisingly very compact, but it still achieves good recall.
Techniques 1 and 2 are actually orthogonal and could be adopted independently:
- Technique 1 could be implemented as a new overload of `search` with a function pointer to a routine that calculates embeddings. It would also need an option to not store vectors at all (a config option similar to the existing `enable_key_lookups = false`); a rough shape of this API is sketched below.
- Technique 2 could be implemented as-is; it should reduce index size by half without losing much recall.
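For the first bullet, the overload might look roughly like this. None of these signatures exist in usearch today, and the types and names are placeholders, not usearch's real classes:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Callback that regenerates the vector for `key` into the caller-provided buffer.
using regenerate_fn_t = std::function<void(std::uint64_t key, float* vector)>;

struct index_config_t {
    bool enable_key_lookups = true; // existing-style option, for comparison
    bool store_vectors = true;      // proposed: false discards vectors after linking
};

class index_t {
  public:
    // Existing-style search: distances come from vectors stored in the index.
    void search(float const* query, std::size_t wanted) const;

    // Proposed overload for Technique 1: vectors were dropped at build time,
    // so traversal calls `regenerate` for every node it needs to compare against.
    void search(float const* query, std::size_t wanted, regenerate_fn_t const& regenerate) const;
};
```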
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct