[CLN] Clean up configuration docs for spann, add max on search_nprobe #5076

Open. jairad26 wants to merge 1 commit into main from jai/fix-config-docs

@jairad26 (Contributor) commented Jul 10, 2025

Description of changes

This PR updates the docs for SPANN configuration to reflect the default values and ranges supported by the system. It also adds a max value on search_nprobe, which was not previously defined.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@jairad26 jairad26 marked this pull request as ready for review July 10, 2025 23:31

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?

propel-code-bot bot commented Jul 10, 2025

Clean Up SPANN Configuration Documentation and Enforce Max on search_nprobe

This PR refines the documentation for SPANN index configuration by clarifying default values and min/max range constraints, and by replacing previously imprecise or missing details. It also updates the Rust InternalSpannConfiguration struct to enforce a maximum value of 128 for search_nprobe via a validator, so that the code and documentation agree on the allowed parameter bounds.

Key Changes

• SPANN configuration documentation was updated to list explicit default, min, and max values for all relevant parameters.
• Details and explanations for several configuration fields (like search_nprobe, split_threshold, merge_threshold) were clarified or added.
• A #[validate(range(max = 128))] attribute was added to search_nprobe in rust/types/src/spann_configuration.rs to enforce an upper bound at the schema/validation level.
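
The range check described in the last bullet can be illustrated outside of Rust. The following is a minimal Python sketch, not Chroma's implementation: `validate_search_nprobe` and `SEARCH_NPROBE_MAX` are hypothetical names, and the bound is assumed to be inclusive, which matches the docs stating that the max value is 128.

```python
# Hypothetical Python mirror of the Rust range validator; not Chroma's actual API.
SEARCH_NPROBE_MAX = 128  # upper bound stated in this PR (assumed inclusive)


def validate_search_nprobe(value: int) -> int:
    """Return value unchanged if it is within the bound, else raise ValueError."""
    if value > SEARCH_NPROBE_MAX:
        raise ValueError(
            f"search_nprobe must be <= {SEARCH_NPROBE_MAX}, got {value}"
        )
    return value
```

Under these assumptions, a value of 128 is accepted while 129 is rejected before the configuration is used.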

Affected Areas

• User documentation for SPANN configuration (docs/docs.trychroma.com/markdoc/content/docs/collections/configure.md)
• SPANN configuration struct and validation (rust/types/src/spann_configuration.rs)

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/fix-config-docs branch from d46fc27 to a57b529 Compare July 10, 2025 23:37
* `reassign_neighbor_count` is the number of closest neighboring clusters of a split cluster whose points are considered for reassignment. The default value is 64.
* `search_nprobe` is the number of centers that are probed for a query. The higher the value the more accurate the result will be. The query response time also increases as `search_nprobe` increases. Recommended values are 64/128. The default value is 64, and the max value is 128.
* `write_nprobe` is the same as `search_nprobe` but for the index construction phase. It is the number of centers searched when appending or reassigning a point. The default value is 64, and the max value is 128.
* `ef_construction` determines the size of the candidate list used to select neighbors during index creation. A higher value improves index quality at the cost of more memory and time, while a lower value speeds up construction with reduced accuracy. The default value is 200, and the max value is 200.

should mention that it is for the hnsw index
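
For reference, the defaults and limits quoted in the diff lines above can be collected into a small lookup table. This is only a sketch in Python: the values are copied from the quoted doc text, while `DOCUMENTED_SPANN_PARAMS` and `resolve_spann_config` are hypothetical names and do not correspond to any Chroma API.

```python
# Values copied from the doc lines quoted in this review thread.
# Each entry is (default, max); max is None when the quoted text gives none.
DOCUMENTED_SPANN_PARAMS = {
    "reassign_neighbor_count": (64, None),
    "search_nprobe": (64, 128),
    "write_nprobe": (64, 128),
    "ef_construction": (200, 200),  # applies to the HNSW index, per the review comment
}


def resolve_spann_config(overrides: dict) -> dict:
    """Fill in documented defaults and reject values above a documented max."""
    unknown = set(overrides) - set(DOCUMENTED_SPANN_PARAMS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, (default, maximum) in DOCUMENTED_SPANN_PARAMS.items():
        value = overrides.get(name, default)
        if maximum is not None and value > maximum:
            raise ValueError(f"{name} must be <= {maximum}, got {value}")
        resolved[name] = value
    return resolved
```

Calling `resolve_spann_config({"search_nprobe": 128})` keeps the override and fills the other three parameters with their documented defaults.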

@jairad26 jairad26 force-pushed the jai/fix-config-docs branch from a57b529 to 3796145 Compare July 11, 2025 01:55