🎉 RAGLite v1.0: DuckDB, Qwen3, parallel insertion, benchmarking, better retrieval quality
Release Highlights
- 🐤 support for DuckDB (#137)
- 🐻 support for Qwen3 (#124)
- ⚡️ parallel document insertion (#150)
- 🏁 benchmarking with `raglite bench` (#150)
- 🎯 better retrieval quality with improved multi-vector search, chunk quality, and chunk front matter (#123, #126, #132)
- 💎 new and improved query adapter algorithm (#146, #147, #149)
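
For readers unfamiliar with the multi-vector ranking change (#132), the idea of scoring a chunk by the L∞ norm of its multi-vector similarity can be sketched as follows. This is a minimal illustration based only on the PR title, not RAGLite's actual implementation: the function names `cosine` and `rank_chunks_linf`, the dictionary layout, and the use of cosine similarity are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_chunks_linf(query, chunks):
    """Rank chunk ids by the L-infinity norm of their similarity vector.

    `chunks` maps a (hypothetical) chunk id to a list of embeddings, one per
    sub-unit of the chunk. Each chunk's score is the largest absolute
    similarity between the query and any of its embeddings.
    """
    scores = {
        chunk_id: max(abs(cosine(query, vec)) for vec in vectors)
        for chunk_id, vectors in chunks.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 2-D embeddings, purely for illustration.
query = [1.0, 0.0]
chunks = {
    "a": [[0.6, 0.8], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.0, 1.0]],
    "c": [[0.8, 0.6]],
}
ranking = rank_chunks_linf(query, chunks)
```

Under this scoring, a chunk with one strongly matching embedding ranks above a chunk whose embeddings all match moderately, whereas averaging the similarities would favor the latter.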
What's Changed
- fix: don't convert markdown to markdown by @joachim-Heirbrant-SL in #116
- fix: fix chunking of single-sentence chunks by @emilradix in #115
- fix: incorporate headings and prevent windowing chunks by @emilradix in #117
- fix: improve contextual chunk headings by @SimonJasansky in #118
- feat: add option to use single chunk embeddings by @emilradix in #119
- feat: add metadata at the document level by @emilradix in #122
- feat: add support for reasoning tool use and upgrade to Qwen3 by @lsorber in #124
- feat: add front matter to chunk content by @lsorber in #126
- feat: introduce chunklets to improve chunking by @lsorber in #123
- fix: remove mdformat by @emilradix in #128
- feat: rank chunks by the L∞ norm of their multi-vector similarity by @lsorber in #132
- feat: enable weighted reciprocal rank fusion by @emilradix in #136
- fix: fix off-by-one error in parsing of Markdown headings by @joachim-Heirbrant-SL in #133
- feat: improve config and API by @lsorber in #138
- feat: replace SQLite with DuckDB by @lsorber in #137
- ci: skip slow tests in CI by @lsorber in #139
- fix: adapt oversampling to chunk size by @lsorber in #140
- feat: make pandas an optional dependency by @lsorber in #141
- fix: upgrade rerankers and recommended Cohere model by @lsorber in #142
- fix: improve token assignment in late chunking by @lsorber in #144
- fix: run checkpoint after DuckDB inserts by @lsorber in #145
- feat: improve query adapter algorithm by @lsorber in #146
- feat: add ability to control the gap in query adapter by @lsorber in #147
- feat: optimally separate result sets in query adapter by @lsorber in #149
- feat: parallelize inserts and add benchmarking by @lsorber in #150
- docs: set Rerankers verbosity to 0 in README by @ThomasDelsart in #156
- fix: fix parsing of font sizes for pdfs with no headings by @ThomasDelsart in #155
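
As background for the weighted reciprocal rank fusion entry (#136), here is a generic sketch of the technique. It is not taken from RAGLite's code: the function name `weighted_rrf`, its parameters, and the smoothing constant `k=60` (a value commonly used for RRF) are assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse several ranked lists into one with weighted reciprocal rank fusion.

    Each item receives weight / (k + rank) from every ranking it appears in
    (ranks start at 1), and items are returned by descending total score.
    """
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["a", "b", "c"]
keyword_hits = ["c", "b", "d"]
fused = weighted_rrf([vector_hits, keyword_hits], weights=[0.75, 0.25])
```

With equal weights, an item ranked first in one list and third in the other can edge out an item ranked second in both; shifting weight toward one list changes that outcome, which is the control that weighting adds over plain RRF.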
New Contributors
- @SimonJasansky made their first contribution in #118
Full Changelog: v0.7.0...v1.0.0