A Retrieval-Augmented Generation (RAG) system that enhances LLM responses with relevant document context.
- Efficient document chunking with context preservation
- Markdown-aware processing that maintains document structure
- Semantic search using multilingual embeddings
- Incremental document updates using content hashing
- Integration with Google's Gemini LLM
- Install dependencies:

  ```bash
  uv sync
  ```
- Set up environment variables. Any LiteLLM-supported model can be used together with its corresponding API key; see the LiteLLM docs for the full list of models and providers:

  ```
  GEMINI_API_KEY=your_api_key_here
  ```
- Place your documentation in the `docs/` directory as markdown files.
```bash
uv run agent.py "Describe the caves of Xylos."  # optional: -web adds a web-search tool to the agent
```
Embedding creation and database management are handled through the CLI:
```bash
# Add/update documents and run the test query
uv run src/embed.py

# List all stored passages
uv run src/embed.py list

# Clear the database
uv run src/embed.py clear

# Perform a custom search query for testing
uv run src/embed.py "your search query"
```
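For orientation, this is roughly the dispatch logic such a CLI implies. It is a minimal sketch only, with print placeholders standing in for the real embedding and search functions; the actual `src/embed.py` may be organized differently:

```python
import sys

# Hypothetical placeholders standing in for the real embed/search logic.
def sync_docs(): print("syncing docs/ into the vector store...")
def list_passages(): print("listing stored passages...")
def clear_db(): print("clearing the database...")
def search(query): print(f"searching for: {query}")

def main():
    args = sys.argv[1:]
    if not args:
        sync_docs()            # default: add/update documents, run the test query
    elif args[0] == "list":
        list_passages()
    elif args[0] == "clear":
        clear_db()
    else:
        search(" ".join(args))  # anything else is treated as a search query

if __name__ == "__main__":
    main()
```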
- `text_chunker.py`: Handles intelligent document splitting
- `embed.py`: Manages document processing and the vector database
- `retriever.py`: Implements semantic search functionality
- `agent.py`: Integrates components with the LLM using smolagents
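To make the integration concrete, here is a minimal sketch of how `agent.py` might expose the retriever as a smolagents tool. The `search` import, tool name, and model id are illustrative assumptions, not the repository's actual code:

```python
import os

from smolagents import CodeAgent, LiteLLMModel, tool
from retriever import search  # hypothetical: semantic search over the vector store

@tool
def retrieve_docs(query: str) -> str:
    """Return documentation passages relevant to the query.

    Args:
        query: Natural-language question to search the docs for.
    """
    return "\n\n".join(search(query))

model = LiteLLMModel(
    model_id="gemini/gemini-2.0-flash",  # any LiteLLM-supported model id works
    api_key=os.environ["GEMINI_API_KEY"],
)
agent = CodeAgent(tools=[retrieve_docs], model=model)
print(agent.run("Describe the caves of Xylos."))
```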
- Uses Alibaba's multilingual embedding model for semantic search (see the embedding sketch after this list)
- ChromaDB for vector storage
- Google Gemini for LLM responses
- Smart chunking preserves document context and header hierarchy (see the chunking sketch below)
- Incremental updates avoid re-embedding unchanged content (content hashing is illustrated in the embedding sketch)
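The embedding and incremental-update ideas combine naturally. Below is a minimal sketch of hash-gated embedding into ChromaDB, assuming the `Alibaba-NLP/gte-multilingual-base` sentence-transformers model; the collection name, paths, and metadata layout are assumptions, and the real pipeline chunks documents before embedding rather than embedding whole files:

```python
import hashlib
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

# gte-multilingual-base ships custom code, hence trust_remote_code=True.
model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
client = chromadb.PersistentClient(path="chroma_db")  # illustrative path
collection = client.get_or_create_collection("docs")

for path in Path("docs").glob("*.md"):
    text = path.read_text(encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Skip files whose stored content hash matches: nothing to re-embed.
    existing = collection.get(where={"source": str(path)}, limit=1)
    if existing["metadatas"] and existing["metadatas"][0].get("hash") == digest:
        continue
    collection.upsert(
        ids=[str(path)],
        documents=[text],
        embeddings=[model.encode(text).tolist()],
        metadatas=[{"source": str(path), "hash": digest}],
    )

# Semantic search: embed the query and fetch the nearest passages.
hits = collection.query(query_embeddings=[model.encode("your search query").tolist()], n_results=3)
print(hits["documents"][0])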
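```

And a rough illustration of header-aware chunking: each chunk is prefixed with the path of headings above it, so retrieved passages keep their context. This is a sketch of the technique only, not the actual `text_chunker.py` implementation:

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into chunks, prefixing each with its heading path."""
    chunks, header_path, buffer = [], [], []

    def flush():
        if buffer:
            context = " > ".join(header_path)
            chunks.append((context + "\n\n" if context else "") + "\n".join(buffer))
            buffer.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Trim the path back to this heading's level, then descend.
            del header_path[level - 1:]
            header_path.append(line.lstrip("# ").strip())
        else:
            buffer.append(line)
            if sum(len(l) for l in buffer) > max_chars:
                flush()
    flush()
    return chunks
```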
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request