RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.

- 🧬 Multi-vector chunk embedding with [late chunking](https://weaviate.io/blog/late-chunking) and [contextual chunk headings](https://d-star.ai/solving-the-out-of-context-chunk-problem-for-rag)
- ✂️ Optimal [level 4 semantic chunking](https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d) by solving a [binary integer programming problem](https://en.wikipedia.org/wiki/Integer_programming)
- 🔍 [Hybrid search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) with the database's native keyword & vector search ([tsvector](https://www.postgresql.org/docs/current/datatype-textsearch.html)+[pgvector](https://github.com/pgvector/pgvector), [FTS5](https://www.sqlite.org/fts5.html)+[sqlite-vec](https://github.com/asg017/sqlite-vec)[^1])
- 💭 [Adaptive retrieval](https://arxiv.org/abs/2403.14403) where the LLM decides whether and what to retrieve based on the query
- 💰 Improved cost and latency with a [prompt caching-aware message array structure](https://platform.openai.com/docs/guides/prompt-caching)
- 🍰 Improved output quality with [Anthropic's long-context prompt format](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips)
- 🌀 Optimal [closed-form linear query adapter](src/raglite/_query_adapter.py) by solving an [orthogonal Procrustes problem](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
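
As an illustration of that last point, the orthogonal Procrustes problem has a well-known closed-form solution via the SVD. The NumPy sketch below shows the underlying math on toy data; it is not RAGLite's implementation, and the array names are purely illustrative.

```python
import numpy as np

# Toy data: paired (query embedding, embedding of the chunk that query should retrieve).
rng = np.random.default_rng(0)
n_pairs, dim = 256, 64
Q = rng.standard_normal((n_pairs, dim))  # query embeddings
C = rng.standard_normal((n_pairs, dim))  # embeddings of the chunks those queries should match

# Orthogonal Procrustes: the orthogonal matrix W minimizing ||Q @ W - C||_F
# is W = U @ Vt, where U, S, Vt = svd(Q.T @ C).
U, _, Vt = np.linalg.svd(Q.T @ C)
W = U @ Vt

# At query time, map a new query embedding through W before running vector search.
adapted_query = rng.standard_normal(dim) @ W
```
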
##### Extensible
- 🔌 A built-in [Model Context Protocol](https://modelcontextprotocol.io) (MCP) server that any MCP client like [Claude desktop](https://claude.ai/download) can connect with
- 💬 Optional customizable ChatGPT-like frontend for [web](https://docs.chainlit.io/deploy/copilot), [Slack](https://docs.chainlit.io/deploy/slack), and [Teams](https://docs.chainlit.io/deploy/teams) with [Chainlit](https://github.com/Chainlit/chainlit)
- ✍️ Optional conversion of any input document to Markdown with [Pandoc](https://github.com/jgm/pandoc)
- ✅ Optional evaluation of retrieval and generation performance with [Ragas](https://github.com/explodinggradients/ragas)

To install RAGLite with the optional Ragas evaluation extra: `pip install raglite[ragas]`

1. [Configuring RAGLite](#1-configuring-raglite)
2. [Inserting documents](#2-inserting-documents)
3. [Retrieval-Augmented Generation (RAG)](#3-retrieval-augmented-generation-rag)

### 3. Retrieval-Augmented Generation (RAG)

#### 3.1 Adaptive RAG

Now you can run an adaptive RAG pipeline that consists of adding the user prompt to the message history and streaming the LLM response:
```python
from raglite import rag

# Add the user prompt to the message history:
messages = []
messages.append({
    "role": "user",
    "content": "How is intelligence measured?"
})

# Adaptively decide whether to retrieve and stream the response:
chunk_spans = []
stream = rag(messages, on_retrieval=lambda retrieved_chunk_spans: chunk_spans.extend(retrieved_chunk_spans))
for update in stream:
    print(update, end="")

# Access the documents cited in the RAG response:
documents = [chunk_span.document for chunk_span in chunk_spans]
```
The LLM will adaptively decide whether to retrieve information based on the complexity of the user prompt. If retrieval is necessary, the LLM generates the search query and RAGLite applies hybrid search and reranking to retrieve the most relevant chunk spans (each of which is a list of consecutive chunks). The retrieval results are sent to the `on_retrieval` callback and are appended to the message history as a tool output. Finally, the assistant response is streamed and appended to the message history.
#### 3.2 Programmable RAG
If you need manual control over the RAG pipeline, you can run a basic but powerful pipeline that consists of retrieving the most relevant chunk spans with hybrid search and reranking, converting the user prompt to a RAG instruction and appending it to the message history, and finally generating the RAG response.
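
A minimal sketch of such a pipeline is shown below. The helper names `retrieve_rag_context` and `create_rag_instruction` are assumptions for illustration and may differ from the actual RAGLite API; `rag` and the `chunk_span.document` attribute are taken from the surrounding examples.

```python
from raglite import rag
# Assumed helper names; check the RAGLite API for the exact functions:
from raglite import create_rag_instruction, retrieve_rag_context

# Retrieve the most relevant chunk spans with hybrid search and reranking:
user_prompt = "How is intelligence measured?"
chunk_spans = retrieve_rag_context(query=user_prompt, num_chunks=5)

# Convert the user prompt to a RAG instruction and append it to the message history:
messages = []
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))

# Stream the RAG response and append it to the message history:
stream = rag(messages)
for update in stream:
    print(update, end="")

# Access the documents cited in the RAG response:
documents = [chunk_span.document for chunk_span in chunk_spans]
```
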
RAGLite also offers more advanced control over the individual steps of a full RAG pipeline:

6. Streaming an LLM response to the message history
7. Accessing the cited documents from the chunk spans

A full RAG pipeline is straightforward to implement with RAGLite:
```python
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search

# Illustrative call; the exact parameters may differ (keyword_search and vector_search are analogous):
user_prompt = "How is intelligence measured?"
chunk_ids, scores = hybrid_search(user_prompt, num_results=20)
```
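
The remaining steps, from retrieving and reranking the matched chunks to streaming the response, could continue roughly as follows. The helpers `retrieve_chunks`, `rerank_chunks`, and `retrieve_chunk_spans` are assumed names used for illustration and may not match the actual RAGLite API.

```python
# Assumed helper names for illustration; the actual RAGLite API may differ:
from raglite import rerank_chunks, retrieve_chunk_spans, retrieve_chunks

# Retrieve the chunks behind the best-scoring chunk ids:
chunks = retrieve_chunks(chunk_ids)

# Rerank the chunks and keep the top 5:
chunks_reranked = rerank_chunks(user_prompt, chunks)[:5]

# Extend the chunks with their neighbors and group them into chunk spans:
chunk_spans = retrieve_chunk_spans(chunks_reranked)

# Stream an LLM response as in sections 3.1 and 3.2, then access the cited documents:
documents = [chunk_span.document for chunk_span in chunk_spans]
```
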
### 6. Running a Model Context Protocol (MCP) server
RAGLite comes with an [MCP server](https://modelcontextprotocol.io) implemented with [FastMCP](https://github.com/jlowin/fastmcp) that exposes a `search_knowledge_base` [tool](https://github.com/jlowin/fastmcp?tab=readme-ov-file#tools). To use the server, add it to the configuration of an MCP client such as [Claude desktop](https://claude.ai/download).

Now, when you start Claude desktop, you should see a 🔨 icon at the bottom right of your prompt, indicating that Claude has successfully connected with the MCP server.

When relevant, Claude will suggest using the `search_knowledge_base` tool that the MCP server provides. You can also explicitly ask Claude to search the knowledge base if you want to be certain that it does.
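
For a sense of what the server involves, the sketch below shows how a `search_knowledge_base` tool can be exposed with FastMCP. It is a minimal illustration of the pattern rather than RAGLite's actual implementation; the body of the tool is a placeholder.

```python
from fastmcp import FastMCP

mcp = FastMCP("raglite-sketch")

@mcp.tool()
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for chunks relevant to the query."""
    # Placeholder: a real server would run hybrid search and reranking here
    # and return the retrieved chunk spans as text.
    return f"Results for: {query}"

if __name__ == "__main__":
    mcp.run()  # Serve over stdio so an MCP client like Claude desktop can call the tool.
```
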