Feature: LeaNN indexing #662

@rschu1ze

Description

Describe what you are looking for

You have probably read about LEANN, an experiment by Matei Zaharia's (CTO of Databricks) group at UC Berkeley. If not, here is a quick summary; I think it could be interesting for usearch as well (see the last paragraph):

Repo: https://github.com/yichuan-w/LEANN
Paper: https://arxiv.org/pdf/2506.08276

They say that normal embeddings, which are needed for RAG, consume a ton of space (which is true) and are expensive to build (also true). As a result, embedding-based search is today practical only on powerful servers, not on edge devices like phones and tablets.

LEANN combines two techniques for vector search that need about 95% less storage space than a regular HNSW index. According to the paper, the recall loss is negligible, and the trade-off is much better than with conventional quantization, whether applied to the original vectors (brute-force scan) or to the vectors in the HNSW index.
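
For intuition, a back-of-the-envelope estimate (the numbers are illustrative, not from the paper): 1M documents with 384-dimensional float32 embeddings occupy 1M x 384 x 4 B ≈ 1.5 GB. An HNSW graph with connectivity 32 stores up to 64 neighbor ids per node at the base layer, i.e. about 256 B per node, or roughly 17% of the per-vector embedding size. Dropping the embeddings (Technique 1) thus already saves over 80%, and pruning the connections (Technique 2) pushes the total savings toward 95%.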

Technique 1 (at a high level):

  • They first compute embeddings for all indexed documents. This is an expensive one-time step (so the approach might work best for systems with mostly static data).
  • Then they build a regular HNSW graph on top of the embeddings. Also a slow one-time step.
  • Then they throw away the original embeddings and the embeddings in the HNSW graph. What remains are the graph connections.
  • HNSW traversal visits only a very small subset of the entire graph (depending on the candidate list size, a.k.a. parameter ef_search). A search starts at an entry point, and the graph connections reveal the neighbor nodes, but since the embeddings are gone, distances can no longer be computed from stored data.
  • The embeddings of the neighbor nodes are regenerated on the fly during graph traversal. Since this affects only a few nodes, the cost is acceptable. A fast GPU still helps (they also tested on a Mac M1 without a GPU). A minimal sketch of this traversal follows the list.
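
To make Technique 1 concrete, here is a minimal, self-contained C++ sketch of a greedy search over a "vectorless" graph. This is not usearch code: the graph layout, the embedder_t callback, and all names are hypothetical stand-ins for an embedding model being invoked during traversal.

```cpp
// Minimal sketch of Technique 1: only adjacency lists are persisted;
// embeddings are regenerated on demand via a user-supplied callback.
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

using vector_t = std::vector<float>;
// Hypothetical callback that recomputes the embedding of a document by id.
using embedder_t = std::function<vector_t(std::size_t node_id)>;

static float l2_sq(vector_t const& a, vector_t const& b) {
    float sum = 0.f;
    for (std::size_t i = 0; i != a.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Greedy best-first descent over a single-layer graph. Every distance
// evaluation triggers a fresh embedding computation through `embed`,
// since no vectors are stored alongside the connections.
std::size_t search_without_vectors(                      //
    std::vector<std::vector<std::size_t>> const& graph,  //
    vector_t const& query, std::size_t entry, embedder_t const& embed) {

    std::size_t best = entry;
    float best_dist = l2_sq(query, embed(entry));
    std::unordered_set<std::size_t> visited{entry};

    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t neighbor : graph[best]) {
            if (!visited.insert(neighbor).second)
                continue;
            // The expensive step: regenerate the neighbor's embedding.
            float dist = l2_sq(query, embed(neighbor));
            if (dist < best_dist) {
                best_dist = dist;
                best = neighbor;
                improved = true;
            }
        }
    }
    return best;
}
```

In a real implementation the callback would batch the neighbor lookups and run the embedding model on a GPU, which is why a fast GPU still helps.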

Technique 2:
They report that the HNSW graph connections still consumed 10%-30% of the embeddings' storage in their experiments, and they wanted to cut that down further.
So the other technique is an aggressive pruning strategy that retains only "hub nodes", i.e. important nodes spanning wide areas of the vector space; all details are in the paper.
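
The paper's hub-selection criterion is more involved, so the sketch below only illustrates the shape of the idea with a naive degree-based rule: the highest-degree nodes keep all of their edges, every other node keeps just a few. The function name, the parameters, and the selection rule are assumptions for illustration, not the paper's algorithm.

```cpp
// Simplistic illustration of hub-preserving graph pruning. Here
// "hubs" are simply the highest-degree nodes, which is an assumption
// for illustration; the paper uses a more sophisticated criterion.
#include <algorithm>
#include <cstddef>
#include <vector>

using graph_t = std::vector<std::vector<std::size_t>>;

// Keep all edges of the top `hub_fraction` highest-degree nodes;
// every other node retains at most `keep_per_node` connections.
graph_t prune_to_hubs(graph_t const& graph, double hub_fraction, std::size_t keep_per_node) {
    std::size_t n = graph.size();
    std::vector<std::size_t> by_degree(n);
    for (std::size_t i = 0; i != n; ++i)
        by_degree[i] = i;
    std::sort(by_degree.begin(), by_degree.end(), [&](std::size_t a, std::size_t b) {
        return graph[a].size() > graph[b].size();
    });

    std::vector<bool> is_hub(n, false);
    std::size_t hub_count = static_cast<std::size_t>(hub_fraction * n);
    for (std::size_t i = 0; i != hub_count; ++i)
        is_hub[by_degree[i]] = true;

    graph_t pruned(n);
    for (std::size_t i = 0; i != n; ++i) {
        pruned[i] = graph[i];
        if (!is_hub[i] && pruned[i].size() > keep_per_node)
            pruned[i].resize(keep_per_node); // drop the tail of the neighbor list
    }
    return pruned;
}
```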

In their evaluation, LEANN is very compact (not surprisingly) but still achieves good recall.

Techniques 1 and 2 are independent of each other and can be adopted separately.

  • Technique 1 could be implemented as a new overload of search that takes a function pointer to a routine which computes embeddings. It would also need an option to drop the stored vectors altogether (a config option similar to the existing enable_key_lookups = false). A hypothetical API sketch follows this list.
  • Technique 2 could be implemented as-is; it should roughly halve the index size without losing much recall.
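
To illustrate the first bullet, here is a hypothetical sketch of the API shape. Neither the second search overload nor the enable_vector_storage option exists in usearch today (only enable_key_lookups does); all type and member names below are made up for illustration.

```cpp
// Hypothetical API shape only; nothing below is part of usearch today.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using vector_key_t = std::uint64_t;

// Callback that regenerates the embedding of an indexed entry by key.
using regenerate_t = std::function<std::vector<float>(vector_key_t key)>;

struct search_match_sketch {
    vector_key_t key;
    float distance;
};

struct index_dense_config_sketch {
    bool enable_key_lookups = true;    // exists today in usearch
    bool enable_vector_storage = true; // proposed: drop vectors after build
};

struct index_dense_sketch {
    // Existing-style search: distances come from stored vectors.
    std::vector<search_match_sketch> search(float const* query, std::size_t count);

    // Proposed overload: with enable_vector_storage = false, distances
    // are computed against embeddings regenerated through the callback.
    std::vector<search_match_sketch> search(float const* query, std::size_t count,
                                            regenerate_t const& regenerate);
};
```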

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

robert@clickhouse.com

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
