Description
Describe what you are looking for
You have probably read about LEANN, an experiment from Matei Zaharia's group (CTO of Databricks) at UC Berkeley. If not, here is a quick summary; I think it could be interesting for usearch as well (see the last paragraph):
Repo: https://github.com/yichuan-w/LEANN
Paper: https://arxiv.org/pdf/2506.08276
They say that the embeddings needed for RAG consume a ton of space (which is true) and are expensive to build (also true). For a sense of scale, 1M chunks with 768-dimensional fp32 embeddings already take about 3 GB before the graph is even added. Thus, such indexes are today practical only on powerful servers, not on edge devices like phones and tablets.
LEANN combines two techniques for vector search that need ~95% less storage than a regular HNSW index. According to the paper, the recall loss is negligible, and the trade-off is much better than quantizing either the original vectors (for a brute-force scan) or the vectors inside the HNSW index.
Technique 1 (at a high level):
- They first compute embeddings for all indexed documents. This is a one-time step and expensive (might work best for systems with mostly static data).
- Then they build a regular HNSW graph on top of the embeddings. Also a one-time step and slow.
- Then they throw away the original embeddings and the embeddings in the HNSW graph. What remains are the graph connections.
- HNSW traversal visits only a very small subset of the whole graph (depending on the candidate list size, aka the `ef_search` parameter). So a search starts at an entry point, and the graph connections tell them the neighbor nodes, but since the embeddings are gone, distances cannot be computed from stored data.
- Instead, the embeddings of the neighbor nodes are re-generated on the fly during graph traversal (see the sketch after this list). Since this touches only a few nodes per query, it is affordable. A fast GPU still helps (they also tested on a Mac M1 without a discrete GPU).
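To make that concrete, here is a minimal sketch of Technique 1's traversal, assuming a single-layer graph stored as a flat adjacency list. This is neither the LEANN code nor the usearch API; all names (`embed_fn_t`, `greedy_search`, `ef_search` as an expansion budget) are made up for illustration. The only point is that distances come from a user-supplied embedding callback instead of stored vectors:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using vec_t = std::vector<float>;
// User-supplied callback that re-embeds the document behind `key` on demand.
using embed_fn_t = std::function<vec_t(std::uint64_t)>;

static float l2sq(vec_t const& a, vec_t const& b) {
    float sum = 0;
    for (std::size_t i = 0; i != a.size(); ++i)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

// Greedy best-first search over an adjacency list (node keys are 0..N-1).
// The index stores only `graph`; vectors for the few visited nodes are
// regenerated through `embed` and memoized in `cache` for this query.
std::uint64_t greedy_search(std::vector<std::vector<std::uint64_t>> const& graph,
                            std::uint64_t entry, vec_t const& query,
                            embed_fn_t const& embed, std::size_t ef_search) {
    std::unordered_map<std::uint64_t, vec_t> cache;
    auto dist = [&](std::uint64_t key) {
        auto it = cache.find(key);
        if (it == cache.end())
            it = cache.emplace(key, embed(key)).first; // the expensive on-the-fly step
        return l2sq(query, it->second);
    };
    using cand_t = std::pair<float, std::uint64_t>; // (distance, key), closest first
    std::priority_queue<cand_t, std::vector<cand_t>, std::greater<cand_t>> frontier;
    std::unordered_set<std::uint64_t> visited{entry};
    float best_d = dist(entry);
    std::uint64_t best = entry;
    frontier.emplace(best_d, entry);
    for (std::size_t expanded = 0; !frontier.empty() && expanded != ef_search; ++expanded) {
        auto [d, node] = frontier.top();
        frontier.pop();
        if (d < best_d) { best_d = d; best = node; }
        for (std::uint64_t neighbor : graph[node])
            if (visited.insert(neighbor).second)
                frontier.emplace(dist(neighbor), neighbor);
    }
    return best;
}
```

The cost per query is bounded by roughly `ef_search` times the average out-degree calls to `embed`, which is why recomputing only the visited nodes stays cheap relative to embedding the whole corpus.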
Technique 2:
They note that in their experiments the HNSW graph connections still consumed 10%-30% of the embeddings' storage. They wanted to cut that down further.
So the second technique is an aggressive pruning strategy that retains only "hub nodes", i.e. important nodes spanning wide areas of the vector space; all details are in the paper.
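The paper's actual pruning criterion is more elaborate, but a degree-based caricature shows the mechanics and the storage effect. Treat everything below (using degree as the "hub" test, `hub_fraction`, `non_hub_degree`) as my assumptions, not the paper's algorithm:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using graph_t = std::vector<std::vector<std::uint32_t>>;

// Keep full neighbor lists only for the top `hub_fraction` of nodes by degree;
// truncate every other node's list to `non_hub_degree` links.
graph_t prune_to_hubs(graph_t const& graph, double hub_fraction, std::size_t non_hub_degree) {
    // Rank nodes by degree and mark the top fraction as hubs.
    std::vector<std::uint32_t> order(graph.size());
    for (std::uint32_t i = 0; i != order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::uint32_t a, std::uint32_t b) {
        return graph[a].size() > graph[b].size();
    });
    std::vector<bool> is_hub(graph.size(), false);
    std::size_t hubs = static_cast<std::size_t>(hub_fraction * graph.size());
    for (std::size_t i = 0; i != hubs; ++i) is_hub[order[i]] = true;

    graph_t pruned(graph.size());
    for (std::uint32_t node = 0; node != graph.size(); ++node) {
        auto const& neighbors = graph[node];
        if (is_hub[node]) {
            pruned[node] = neighbors; // hubs keep all their links
            continue;
        }
        // Non-hubs keep only a handful of links, preferring hub neighbors
        // so that every node stays within a short hop of a well-connected region.
        std::vector<std::uint32_t> kept;
        for (std::uint32_t n : neighbors)
            if (is_hub[n] && kept.size() < non_hub_degree) kept.push_back(n);
        for (std::uint32_t n : neighbors)
            if (!is_hub[n] && kept.size() < non_hub_degree) kept.push_back(n);
        pruned[node] = std::move(kept);
    }
    return pruned;
}
```

Preferring hub neighbors when truncating is what (intuitively) preserves recall: queries can always reach a hub quickly, and hubs retain their long-range links.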
In their evaluation, LEANN is unsurprisingly very compact, but it still achieves good recall.
Techniques 1 and 2 are actually orthogonal and could be adopted independently:
- Technique 1 could be implemented as a new overload of `search` with a function pointer to a routine that calculates embeddings. It would also need an option to not store vectors at all (a config option similar to the existing `enable_key_lookups = false`); a rough shape of this API is sketched below.
- Technique 2 could be implemented as-is; it should reduce index size by half without losing much recall.
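For the first bullet, the overload might look roughly like this. None of these signatures exist in usearch today, and the types and names are placeholders, not usearch's real classes:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Callback that regenerates the vector for `key` into the caller-provided buffer.
using regenerate_fn_t = std::function<void(std::uint64_t key, float* vector)>;

struct index_config_t {
    bool enable_key_lookups = true; // existing-style option, for comparison
    bool store_vectors = true;      // proposed: false discards vectors after linking
};

class index_t {
  public:
    // Existing-style search: distances come from vectors stored in the index.
    void search(float const* query, std::size_t wanted) const;

    // Proposed overload for Technique 1: vectors were dropped at build time,
    // so traversal calls `regenerate` for every node it needs to compare against.
    void search(float const* query, std::size_t wanted, regenerate_fn_t const& regenerate) const;
};
```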
Can you contribute to the implementation?
- I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct