Allow using custom index readers and writers #4180

Closed
wants to merge 6 commits into from

Contributor

@kaivalnp kaivalnp commented Feb 9, 2025

Description

  • Create custom readers and writers for index IO, which take function pointers as input
  • Also expose these from the C_API

This is helpful for FFI use, where calling processes would pass upcall stubs for streamlined IO
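A minimal sketch of the idea, assuming an fread/fwrite-style callback signature. `Reader`, `Writer`, `CallbackReader`, `CallbackWriter`, and the `demo_*` functions are hypothetical stand-ins written for illustration; they are not the classes added by this PR and may differ from the actual Faiss signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-ins for an fread/fwrite-style IO interface.
// These are NOT the Faiss classes, only an illustration of the shape.
struct Reader {
    virtual ~Reader() = default;
    // Reads up to `nitems` elements of `size` bytes; returns elements read.
    virtual size_t operator()(void* ptr, size_t size, size_t nitems) = 0;
};

struct Writer {
    virtual ~Writer() = default;
    // Writes `nitems` elements of `size` bytes; returns elements written.
    virtual size_t operator()(const void* ptr, size_t size, size_t nitems) = 0;
};

using ReadFn = size_t (*)(void* ptr, size_t size, size_t nitems);
using WriteFn = size_t (*)(const void* ptr, size_t size, size_t nitems);

// Adapters that forward every call to a caller-supplied function pointer,
// for example an upcall stub handed over by a foreign runtime.
struct CallbackReader : Reader {
    ReadFn fn;
    explicit CallbackReader(ReadFn f) : fn(f) {}
    size_t operator()(void* ptr, size_t size, size_t nitems) override {
        return fn(ptr, size, nitems);
    }
};

struct CallbackWriter : Writer {
    WriteFn fn;
    explicit CallbackWriter(WriteFn f) : fn(f) {}
    size_t operator()(const void* ptr, size_t size, size_t nitems) override {
        return fn(ptr, size, nitems);
    }
};

// Demo storage: plain functions over a global buffer, standing in for
// whatever IO the calling process actually performs.
static std::string g_buffer;
static size_t g_read_pos = 0;

size_t demo_write(const void* ptr, size_t size, size_t nitems) {
    g_buffer.append(static_cast<const char*>(ptr), size * nitems);
    return nitems;
}

size_t demo_read(void* ptr, size_t size, size_t nitems) {
    assert(g_read_pos <= g_buffer.size());
    if (size == 0) {
        return 0;
    }
    size_t available = (g_buffer.size() - g_read_pos) / size;
    size_t n = nitems < available ? nitems : available;
    std::memcpy(ptr, g_buffer.data() + g_read_pos, n * size);
    g_read_pos += n * size;
    return n;
}

// Round-trips one value through both adapters; returns true on success.
bool round_trip(int value) {
    g_buffer.clear();
    g_read_pos = 0;
    CallbackWriter w(demo_write);
    CallbackReader r(demo_read);
    Writer& wr = w; // call through the abstract interface
    Reader& rd = r;
    if (wr(&value, sizeof(value), 1) != 1) {
        return false;
    }
    int out = 0;
    if (rd(&out, sizeof(out), 1) != 1) {
        return false;
    }
    return out == value;
}
```

Code that only knows the abstract `Reader` and `Writer` interfaces never sees where the bytes go, so the calling process stays in control of the actual IO.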

@mdouze
Contributor

mdouze commented Feb 13, 2025

Thanks for exposing this functionality in the C API.
I don't understand why a modification of the C++ side of Faiss is required. Could the CustomIOReader and CustomIOWriter be declared inside the C API directory? It is unlikely that they will be useful for other use cases in C++.

@kaivalnp
Contributor Author

> Could the CustomIOReader and CustomIOWriter be declared inside the C API directory?

@mdouze Makes sense, added a commit to move these classes
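
For readers unfamiliar with the pattern, here is a rough sketch of how a callback-based writer can live entirely behind a C interface, with the C++ class private to the wrapper. All `demo_*` names, `DemoWriter`, and the 0 / -1 return convention are assumptions made for this illustration; they are not the functions or error codes of the Faiss C API.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

extern "C" {
// What a C header would declare: a callback type and an opaque handle.
typedef size_t (*demo_write_fn)(const void* ptr, size_t size, size_t nitems);
typedef struct DemoWriter DemoWriter;
}

namespace {
// Hypothetical C++ class, visible only inside the wrapper's source file.
struct CallbackWriter {
    demo_write_fn fn;
    size_t operator()(const void* ptr, size_t size, size_t nitems) const {
        return fn(ptr, size, nitems);
    }
};
} // namespace

extern "C" {

// Returns 0 on success and -1 on failure (an assumed convention), so that
// no C++ exception has to cross the language boundary.
int demo_writer_new(DemoWriter** p_out, demo_write_fn fn) {
    if (p_out == nullptr || fn == nullptr) {
        return -1;
    }
    CallbackWriter* w = new (std::nothrow) CallbackWriter{fn};
    if (w == nullptr) {
        return -1;
    }
    *p_out = reinterpret_cast<DemoWriter*>(w);
    return 0;
}

// Forwards a write through the handle to the stored callback.
size_t demo_writer_write(
        DemoWriter* w,
        const void* ptr,
        size_t size,
        size_t nitems) {
    assert(w != nullptr);
    return (*reinterpret_cast<CallbackWriter*>(w))(ptr, size, nitems);
}

// Releases the handle; passing a null handle is a no-op.
void demo_writer_free(DemoWriter* w) {
    delete reinterpret_cast<CallbackWriter*>(w);
}

// Example callback that only counts the bytes it is handed.
static size_t g_bytes_seen = 0;
size_t demo_count_bytes(const void*, size_t size, size_t nitems) {
    g_bytes_seen += size * nitems;
    return nitems;
}

} // extern "C"
```

An FFI caller only needs the three `extern "C"` functions and the callback type; the C++ class never appears in its bindings.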

@kaivalnp
Contributor Author

Hi, I wanted to check the status here. Can this PR be merged?

@kaivalnp
Contributor Author

kaivalnp commented Mar 2, 2025

@bshethmeta Are there any more changes required here? If not, could you help review / merge this PR?

@kaivalnp
Contributor Author

kaivalnp commented Mar 4, 2025

@bshethmeta The failure in Build / build-pull-request / Linux x86_64 GPU w/ cuVS (cmake) (pull_request) seems unrelated (and possibly transient):

  ca-certificates-2025.1.31-hbcca054_0.conda extraction failed
  Warning: error    libmamba Error when extracting package: Could not chdir info/files
  
  error    libmamba Error when extracting package: Could not chdir info/files
  Warning: Found incorrect download: ca-certificates. Aborting
  
  Found incorrect download: ca-certificates. Aborting
  Warning:

Could you try running it again?

@kaivalnp
Contributor Author

Hi @bshethmeta, any update? I hope to get this merged in v1.11.0

@kaivalnp
Contributor Author

Not sure if @bshethmeta is unavailable, @mdouze could you help review / merge this? (they already approved it)

@bshethmeta
Contributor

Hi, are you able to rebase onto latest changes? Once it's updated I can try to commit.

@kaivalnp
Contributor Author

> Are you able to rebase onto latest changes?

Yes, new commits can be cleanly merged (also visible in the GitHub UI).
I can squash everything into a single commit and force-push to make it cleaner.

Also BTW, I think the failure in Build / build-pull-request / Windows x86_64 (conda) (pull_request) is unrelated, as demonstrated by #4233

@kaivalnp
Contributor Author

> Are you able to rebase onto latest changes? Once it's updated I can try to commit.

@bshethmeta I've rebased the changes, can you try now?

@facebook-github-bot
Contributor

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kaivalnp
Contributor Author

@mengdilin thanks for importing this PR! The build failure seems unrelated, and a recent commit fixes it.
I've updated the branch to include that commit.

@kaivalnp
Contributor Author

Hi, could someone help merge this PR? It is part of a set of changes required to build a Faiss codec for vector searches in Lucene (in addition to #4158 and #4167, see apache/lucene#14178), and would also help other FFI use-cases!

I'd really like to get this into v1.11.0, which seems to be targeted for Q1 2025 according to the release schedule.

Thanks!

@kaivalnp
Contributor Author

cc @mengdilin @bshethmeta @mdouze

@mikemccand

It would be awesome to wrap FAISS into a Lucene Codec (KnnVectorsFormat -- apache/lucene#14178) so that any Lucene user could quickly switch to FAISS for their KNN search by customizing their index Codec, instead of the builtin default (Java) impl of HNSW that Lucene offers ... mixed KNN/lexical search is a crazy active area for Lucene users and devs right now.

This change would make it relatively simple to access FAISS from Java using the Foreign Function Interface (FFI, from Project Panama), with no glue code needed (which OpenSearch had to write).

@facebook-github-bot
Contributor

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kaivalnp
Contributor Author

kaivalnp commented Apr 1, 2025

Thanks for importing this PR @mnorris11 !

Looks like Facebook Internal - Builds & Tests has been running for ~16 hours now, is it expected to take this long? (I see it finished in ~5 hours in recent PRs). If it is a transient issue, does it need to be manually restarted?

@mnorris11

> Thanks for importing this PR @mnorris11 !
>
> Looks like Facebook Internal - Builds & Tests has been running for ~16 hours now, is it expected to take this long? (I see it finished in ~5 hours in recent PRs). If it is a transient issue, does it need to be manually restarted?

We are reviewing it internally, sorry for the delay. We hope to merge today. The signal has passed internally; this seems to be a UI issue between GitHub and the internal system.

@facebook-github-bot
Contributor

@mnorris11 merged this pull request in a4401c1.

@kaivalnp
Contributor Author

kaivalnp commented Apr 2, 2025

Thanks for the help @mnorris11 !

@kaivalnp kaivalnp deleted the custom_io_c branch April 2, 2025 03:05
abhinavdangeti added a commit to blevesearch/faiss that referenced this pull request May 20, 2025
…iss@bleve` (#52)

Merge results:
```
|\
* ca874b6 Abhinav Dangeti | Fix type mismatches within unit test: TEST(TestHamming, test_hamming_knn)
* e255b9b Abhinav Dangeti | Adapt signature change of `get_InvertedListScanner` in faiss/IndexIVFPQ.cpp
* 90fe29b Abhinav Dangeti | Remove redundant cmake install over target `faiss_c`
*   0882dd3 Abhi Dangeti | Merge branch 'bleve' into main_1.11.0
|\
| * 0be294a Deepkaran Salooja | Implement compute_distance_to_codes_for_list and compute_distance_table for IndexIVFPQ (#50)
| * 352484e Rahul Rampure | MB-65473: Batch converter for vector to cluster IDs (#49)
| * 14a4a60 Rahul Rampure | MB-65473: Refactor and Optimize Pre-Filtered Vector Search (#48)
| *   b4cc942 Abhi Dangeti | MB-65243: Merge 'facebookresearch/faiss@v1.10.0' into 'blevesearch/faiss@bleve' (#46)
| |\
| | * 8d33b5c Abhinav Dangeti | MB-65243: Merge 'facebookresearch/faiss@v1.10.0' into 'blevesearch/faiss@bleve'
| |/|
| * | 8eecdb6 Rahul Rampure | MB-63643: Fix missing num_threads clauses (#44)
| * | 224acef Deepkaran Salooja | MB-61093 Fix memory leak for SQDistanceComputer (#43)
| * | 3001b51 Deepkaran Salooja | MB-61093 Add method to compute distance from codes for IVF index (#41)
| * | b747c55 Aditi Ahuja | MB-62230 - Updated closest_centroids API to include params (#38)
| * | 26d9b35 Aditi Ahuja | MB-62230 - Extended c_api to search only specified clusters with params. (#35)
| * | f077bf9 Abhi Dangeti | Build libfaiss with AVX2 support when requested, rather than libfaiss (#37)
| * |   5ab1ce0 Abhi Dangeti | MB-62577: Merge 'facebookresearch/faiss@v1.8.0' into blevesearch/faiss@bleve
| |\ \
| | * | 3306e58 Abhinav Dangeti | MB-62577: Merge 'facebookresearch/faiss@v1.8.0' into blevesearch/faiss@bleve
| |/| |
| * | | d9db66a Rahul Rampure | MB-62221: API to free a buffer allocated in C runtime (#30)
| * | | a2f4183 Rahul Rampure | MB-62221: Fix buffer overflow (#29)
| * | | 7977457 Rahul Rampure | MB-61930: Add a num_threads clause to every openMP pragma. (#25)
| * | | a30eaa2 Rahul Rampure | MB-61930: Optimize Thread Management in High Throughput Scenarios (#24)
| * | | 2ce3883 Thejas-bhat | MB-59575: Revert memcpy optimizations for flat indexes (#23)
| * | | 7c3c7d1 Thejas-bhat | MB-59575: Refactor member variables alignment of IndexFlatCodes (#22)
| * | | 17c3992 Thejas-bhat | MB-59575: Reducing copy overhead of already memory mapped content (#17)
| * | | 38f6b60 Chris Hillery | Fix build on Windows (#21)
| * | | 4143984 SaptarshiSen-CB | MB-61609: Fix zero sa_code_size (#19)
| * | | 7b119f4 Rahul Rampure | MB-60739: Fix integer overflow (#15)
| * | | 6851683 Rahul Rampure | MB-60657: Fix integer overflow (#14)
| * | | 8672bf3 Thejas-bhat | Size API to get the index's size (#13)
| * | | b34ccf6 Aditi Ahuja | MB-60202 - IDMap2 Selector (#12)
| * | | a623ec6 Thejas-bhat | Introducing a new reader to read index using a pointer (#8)
| * | | 4dd26f8 Chris Hillery | Add INSTALL() directive for faiss_c (#7)
| * | | 14fd16a Chris Hillery | Suppress (thousands of) warnings when building with GCC (#6)
| * | | 44febf0 Abhi Dangeti | Address incorrect import within c_api/IndexIVF_c_ex.cpp (#5)
| * | | 1b295e4 Abhi Dangeti | Add build instructions for IndexIVF_c_ex.cpp and Index_c_ex.cpp (#4)
| * | | 334021a Abhi Dangeti | additional index APIs (#3)
| * | | f0bbc06 Abhi Dangeti | Introducing index IO operations over char buffer (#2)
* | | | ea1cdf0 Michael Norris | Increment next release, v1.11.0 (facebookresearch#4308)
* | | | 70c4537 simshi | fix: algorithm of spreading vectors over shards (facebookresearch#4299)
* | | | d4fa401 Michael Norris | Add RaBitQ to the swigfaiss so we can access its properties correctly in python (facebookresearch#4304)
* | | | c75f166 Satyendra Mishra | Add date and time to the codec file path so that the file doesn't get overridden with each run (facebookresearch#4303)
* | | | a3cd63f Aditya Vidyadhar Kamath | Skip mmap test case in AIX. (facebookresearch#4275)
* | | | e36897f Michael Norris | Fix overflow of int32 in IndexNSG (facebookresearch#4297)
* | | | 117aafd Michael Simpson | Fix Type Error in Conditional Logic (facebookresearch#4294)
* | | | 928333c Jim Meyering | faiss/gpu/GpuAutoTune.cpp: fix llvm-19-exposed -Wunused-but-set-variable warnings
* | | | bb04bf6 Bhavik Sheth | Add missing header in faiss/CMakeLists.txt (facebookresearch#4285)
* | | | d9cfd00 Satyendra Mishra | Implement is_spherical and normalize_L2 booleans as part of the training APIs (facebookresearch#4279)
* | | | 915f719 Michael Norris | Fix nightly by pinning conda-build to prevent regression in 25.3.2 (facebookresearch#4287)
* | | | de5e85e generatedunixname89002005287564 | Fix CQS signal. Id] 88153895 -- readability-redundant-string-init in fbcode/faiss (facebookresearch#4283)
* | | | 7eac034 Satyendra Mishra | Add normalize_l2 boolean to distributed training API
* | | | 0dfb599 Jaap Aarts | Handle insufficient driver gracefully (facebookresearch#4271)
* | | | d4e236b Alexandr Guzhva | relax input params for IndexIVFRaBitQ::get_InvertedListScanner() (facebookresearch#4270)
* | | | df9e2c4 Alexandr Guzhva | Fix a placeholder for 'unimplemented' in mapped_io.cpp (facebookresearch#4268)
* | | | 0d3aff9 wwq | fix bug: IVFPQ of raft/cuvs does not require redundant check (facebookresearch#4241)
* | | | a4401c1 Kaival Parikh | Allow using custom index readers and writers (facebookresearch#4180)
* | | | 636d95e Tarang Jain | Upgrade to libcuvs=25.04 (facebookresearch#4164)
* | | | 7f523f0 Junjie Qi | ignore regex (facebookresearch#4264)
* | | | ccc2b33 Alexandr Guzhva | fix a serialization problem in RaBitQ (facebookresearch#4261)
* | | | 13255a8 Kaival Parikh | Publish the C API to Conda (facebookresearch#4186)
* | | | 3a49130 Alexandr Guzhva | RaBitQ implementation (facebookresearch#4235)
* | | | c2fc549 Satyendra Mishra | Pass row filters to Hive Reader to filter rows (facebookresearch#4256)
* | | | 6116d36 Mayank Bhatia | Grammar fix in FlatIndexHNSW (facebookresearch#4253)
* | | | 1debb7d Matthijs Douze | re-land mmap diff (facebookresearch#4250)
* | | | 0f2035c Richard Barnes | Fix CUDA kernel index data type in faiss/gpu/impl/DistanceUtils.cuh +10 (facebookresearch#4246)
* | | | 1dcbb4a Alexandr Guzhva | fix `IVFPQFastScan::RangeSearch()` on the `ARM` architecture (facebookresearch#4247)
* | | | 8bce244 Mengdi Lin | fix integer overflow issue when calculating imbalance_factor (facebookresearch#4245)
* | | | 5adab67 Rohil Shah | Fix bug with metric_arg in IndexHNSW (facebookresearch#4239)
* | | | f2f7a66 Mengdi Lin | Back out "test merge with internal repo" (facebookresearch#4244)
* | | | caa5f24 Junjie Qi | test merge with internal repo (facebookresearch#4242)
* | | | 9e808d4 Richard Barnes | Remove unused exception parameter from faiss/impl/ResultHandler.h (facebookresearch#4243)
* | | | fec7ce9 Gustav von Zitzewitz | SearchParameters support for IndexBinaryFlat (facebookresearch#4055)
* | | | df6a8f6 George Wang | Address compile errors and warnings (facebookresearch#4238)
* | | | 15491a1 Saumya Agarwal | Revert D69972250: Memory-mapping and Zero-copy deserializers
* | | | fbc7db2 Saumya Agarwal | Revert D69984379: mem mapping and zero-copy python fixes
* | | | 631b0fd Matthijs Douze | mem mapping and zero-copy python fixes (facebookresearch#4212)
* | | | 55a3c2a Alexandr Guzhva | Memory-mapping and Zero-copy deserializers (facebookresearch#4199)
* | | | 653be59 Richard Barnes | Use `nullptr` in faiss/gpu/StandardGpuResources.cpp (facebookresearch#4232)
* | | | 3d96ad5 Lucian Grijincu | faiss: fix non-templated hammings function (facebookresearch#4195)
* | | | 4cd2f6e Junjie Qi | Support non-partition col and map in the embedding reader (facebookresearch#4229)
* | | | a22ec32 Junjie Qi | Support cosine distance for training vectors (facebookresearch#4227)
* | | | c109174 Richard Barnes | Fix LLVM-19 compilation issue in faiss/AutoTune.cpp (facebookresearch#4220)
* | | | 615c17e Shuyao Qi | Add missing #include in code_distance-sve.h (facebookresearch#4219)
* | | | eab52af Tom Jackson | Fix cloning and reverse index factory for NSG indices (facebookresearch#4151)
* | | | 1a295cd George Wang | Remove python_abi to fix nightly (facebookresearch#4217)
* | | | 4cea80b Shuyao Qi | Make static method in header inline (facebookresearch#4214)
* | | | 835b3ea Michael Norris | Fix IVF quantizer centroid sharding so IDs are generated (facebookresearch#4197)
* | | | 65222b3 Michael Norris | Pin lief to fix nightly (facebookresearch#4211)
* | | | 7cb4556 lkuffo | Fix Sapphire Rapids never loading in Python bindings (facebookresearch#4209)
* | | | 20c7ca3 Michael Norris | Upgrade openblas to 0.3.29 for ARM architectures (facebookresearch#4203)
* | | | 55d022f George Wang | Attempt to nightly fix (facebookresearch#4204)
* | | | 00ce0e2 Navneet Verma | Add the support for IndexIDMap with Cagra index (facebookresearch#4188)
* | | | 1fe8b8b Nicolas De Carli | Remove unused variable (facebookresearch#4205)
* | | | 6b65289 Divye Gala | Pass `store_dataset` argument along to cuVS CAGRA (facebookresearch#4173)
* | | | d72d0ca Michael Norris | Fix nightly by installing earlier version of lief (facebookresearch#4198)
* | | | 657c563 Bhavik Sheth | Add bounds checking to hnsw nb_neighbors (facebookresearch#4185)
* | | | f0e3832 George Wang | Check for not completed
* | | | aff6bfc Michael Norris | Add sharding convenience function for IVF indexes (facebookresearch#4150)
* | | | 1d8f393 Kaival Parikh | Handle plain SearchParameters in HNSW searches (facebookresearch#4167)
* | | | c6adc01 Michael Norris | Update INSTALL.md to remove some raft references, add missing dependency (facebookresearch#4176)
* | | | 95955d8 Kota Yamaguchi | Fix install error when building avx512_spr variant (facebookresearch#4170)
* | | | d720155 Amir Sadoughi | Update README.md (facebookresearch#4169)
* | | | 9896beb simshi | fix: gpu tests link failure with static lib (facebookresearch#4137)
* | | | 6c04699 Mulugeta Mammo | Fix the order of parameters in bench_scalar_quantizer_distance. (facebookresearch#4159)
* | | | 3ec2fbd Tarang Jain | Update CAGRA docs (facebookresearch#4152)
* | | | 6718dae Kaival Parikh | Expose IDSelectorBitmap in the C_API (facebookresearch#4158)
* | | | 9bc4b67 Jesper Stemann Andersen | Added support for building for MinGW, in addition to MSVC (facebookresearch#4145)
| |_|/
|/| |
```
8 participants