RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.
- 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
- ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html)+[pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html)+[sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
- 💰 Improved cost and latency with a [prompt caching-aware message array structure](https://platform.openai.com/docs/guides/prompt-caching)
- 🍰 Improved output quality with [Anthropic's long-context prompt format](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips)
- 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
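
The hybrid search bullet above refers to rank fusion of keyword and vector results. As an illustrative sketch (independent of RAGLite's internals), the Reciprocal Rank Fusion (RRF) scheme from the linked Cormack et al. paper scores each document by the sum of `1 / (k + rank)` over the rankings it appears in:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents that rank highly in any list accumulate a larger score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["d1", "d3", "d2"]
vector_ranking = ["d2", "d1", "d4"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # d1 and d2 appear in both rankings, so they rise to the top.
```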
### 3. Searching and Retrieval-Augmented Generation (RAG)
Now, you can search for chunks with vector search, keyword search, or a hybrid of the two. You can also rerank the search results with the configured reranker. And you can use any search method of your choice (`hybrid_search` is the default) together with reranking to answer questions with RAG:
#### 3.1 Simple RAG pipeline
Now you can run a simple but powerful RAG pipeline that consists of retrieving the most relevant chunk spans (each of which is a list of consecutive chunks) with hybrid search and reranking, converting the user prompt to a RAG instruction and appending it to the message history, and finally generating the RAG response:
```python
from raglite import create_rag_instruction, rag, retrieve_rag_context
# Retrieve relevant chunk spans with hybrid search and reranking:
user_prompt = "How is intelligence measured?"
chunk_spans = retrieve_rag_context(query=user_prompt, num_chunks=5, config=my_config)

# Append a RAG instruction based on the user prompt and context to the message history:
messages = []  # Or start with an existing message history.
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))

# Stream the RAG response:
stream = rag(messages, config=my_config)
for update in stream:
    print(update, end="")

# Access the documents cited in the chunk spans:
documents = [chunk_span.document for chunk_span in chunk_spans]
```
#### 3.2 Advanced RAG pipeline
> [!TIP]
> 🥇 Reranking can significantly improve the output quality of a RAG application. To add reranking to your application: first search for a larger set of 20 relevant chunks, then rerank them with a [rerankers](https://github.com/AnswerDotAI/rerankers) reranker, and finally keep the top 5 chunks.
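
The search-then-rerank pattern from the tip can be sketched in plain Python. The `overlap_score` below is a stand-in relevance score for illustration only; a real application would score candidates with a cross-encoder from the `rerankers` package instead:

```python
def rerank_top_k(query: str, chunks: list[str], score_fn, keep: int = 5) -> list[str]:
    """Rerank candidate chunks with score_fn and keep the `keep` best ones."""
    return sorted(chunks, key=lambda chunk: score_fn(query, chunk), reverse=True)[:keep]

def overlap_score(query: str, chunk: str) -> int:
    # Stand-in relevance score: count of shared lowercase tokens.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

# First retrieve a larger candidate set of 20 chunks, then rerank and keep the top 5:
candidates = ["Intelligence is often measured with IQ tests."] + [
    f"Unrelated chunk {i}." for i in range(19)
]
top_chunks = rerank_top_k("How is intelligence measured?", candidates, overlap_score, keep=5)
```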
In addition to the simple RAG pipeline, RAGLite also offers more advanced control over the individual steps of the pipeline. A full pipeline consists of several steps:
1. Searching for relevant chunks with keyword, vector, or hybrid search
2. Retrieving the chunks from the database
3. Reranking the chunks and selecting the top 5 results
4. Extending the chunks with their neighbors and grouping them into chunk spans
5. Converting the user prompt to a RAG instruction and appending it to the message history
6. Streaming an LLM response to the message history
7. Accessing the cited documents from the chunk spans
```python
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search

user_prompt = "How is intelligence measured?"
chunk_ids_vector, _ = vector_search(user_prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(user_prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(user_prompt, num_results=20, config=my_config)
```