A turn-key solution for creating a customized AI Personal Assistant with an Ollama-compatible RAG API. This project combines large language models with your personal information to deliver contextually relevant, personalized responses.
# Ollama Personal Assistant with RAG
This project provides a complete solution for creating a personalized AI assistant powered by Ollama models and enhanced with Retrieval Augmented Generation (RAG). It enables your AI assistant to know and recall information about you, providing responses that feel natural and personalized.
The system combines:
- Personality & Formatting: Customize an existing LLM to add a specific personality and response style.
- Static Personal Data: Incorporate persistent information about your life: family, health, work, etc.
- Dynamic Data: Include frequently changing information like weather forecasts, calendar events, and messages.
Note
This project uses an example assistant named OLIVER (Overly Logical Interface with Vaguely Eccentric Replies) as a template. Feel free to customize it according to your preferences.
- Custom LLM Personality: Define exactly how your assistant responds and presents information
- Structured Data Management: Separate static and dynamic personal information
- API Compatibility: Use with any Ollama-compatible client, including OpenWebUI
- Interactive CLI: Test and use the assistant directly from your terminal
- Docker Support: Deploy and run in containers for better isolation and portability
- Modular Design: Easily extend or modify components to fit your needs
Follow these steps to set up your personal assistant:
- Ollama installed locally or on a remote server
- Python 3.10+ installed
- Git for cloning the repository
- Clone the repository and set up the environment:
# Clone the repository
git clone https://github.com/robertsinfosec/ollama-personal-assistant-rag.git
# Change to the project directory
cd ollama-personal-assistant-rag
# Create a Python virtual environment and activate it
python -m venv .venv
source .venv/bin/activate # On Windows use: .\.venv\Scripts\Activate.ps1
# Install the required dependencies
pip install -r src/requirements.txt
- Customize your personal data by editing the YAML files:
# Edit static information
cd src/data/static/
# Edit files like owner.yaml, family.yaml, etc.
# Edit dynamic information
cd ../dynamic/
# Edit files like calendar.yaml, weather.yaml, etc.
# TODO: Create scripts that dynamically update these files
- Generate the knowledge base document:
cd ../../ # Return to src directory
python main.py generate
Note
This process takes all of the YAML files and Jinja2 templates in the `data/` directory and generates one large markdown document called `personal_info.md`. This document serves as the knowledge base for your assistant.
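Conceptually, the generation step works like the minimal Python sketch below, which renders a hypothetical YAML file through a hypothetical Jinja2 template; the project's real templates, file names, and mappings live under `src/data/` and `src/generation/`.

```python
# Minimal sketch of the YAML -> Jinja2 -> markdown transformation performed by
# `python main.py generate`. File names and fields here are hypothetical examples.
import yaml                 # PyYAML
from jinja2 import Template

# e.g. family.yaml might contain: members: [{name: Ada, relation: sister}]
data = yaml.safe_load(open("data/static/family.yaml"))

template = Template(
    "## Family\n"
    "{% for m in members %}- {{ m.name }} ({{ m.relation }})\n{% endfor %}"
)

# The real generator renders many such sections and concatenates them
# into the single personal_info.md knowledge base document.
with open("personal_info.md", "a") as out:
    out.write(template.render(**data))
```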
- Create your custom Ollama model:
On the Ollama server (if that isn't your local workstation), create a custom model using the Modelfile provided in the `models/` directory by running:
python main.py create-model --name oliver-assistant --modelfile models/Modelfile
- Start the API server:
python main.py api
This starts the FastAPI server, which listens for incoming requests on port 8901 by default. The server looks and acts like an Ollama server, reflecting all of the models available on the Ollama server, and it also exposes a `/query` endpoint that lets you query the RAG system directly.
For detailed setup instructions, see the src/README.md.
The project is organized into several key modules:
- api: FastAPI implementation of the RAG-enhanced API server
- config: Configuration settings for RAG and template generation
- data: YAML data files and Jinja2 templates for personal information
- generation: Tools for generating the markdown knowledge base
- models: Ollama Modelfiles for customizing the assistant's personality
- rag: Core RAG functionality for context retrieval and response enhancement
Each module has its own README with detailed documentation.
Use the interactive CLI for direct conversations and testing:
cd src
python main.py interactive
For more context information, use verbose mode:
python main.py interactive --verbose
Available commands in the interactive CLI:
- `/help` - Show available commands
- `/context` - Toggle display of retrieved context
- `/clear_history` - Clear conversation history
- `/params` - Show current parameters
- `/model MODEL` - Change the model
- `/reload` - Reload the vector store
The API server runs on port 8901 by default:
cd src
python main.py api
Access the API endpoints:
- `POST /api/chat` - Ollama-compatible chat API (example below)
- `POST /query` - Direct RAG query endpoint
- `GET /models` - List available models
- `POST /reload` - Reload the vector store
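For a quick smoke test, the Ollama-compatible chat endpoint can be called as in the sketch below. The payload follows Ollama's standard `/api/chat` format; the host, port, and model name (`oliver-assistant`) are taken from the setup steps above, and the request shape for the project-specific `/query` endpoint is documented in `src/api/README.md`.

```python
# Send one chat turn to the RAG-enhanced, Ollama-compatible API.
import requests

resp = requests.post(
    "http://localhost:8901/api/chat",
    json={
        "model": "oliver-assistant",
        "messages": [{"role": "user", "content": "What's on my calendar today?"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
# With streaming disabled, Ollama-style responses carry the reply in message.content.
print(resp.json()["message"]["content"])
```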
For complete API documentation, see src/api/README.md.
Connect OpenWebUI to your RAG-enhanced assistant:
- In OpenWebUI under `/admin/settings`, go to the Connections tab
- Click the `+` next to "Manage Ollama API Connections"
- Add a new connection with these settings:
  - URL: `http://[YOUR_API_HOST]:8901`
  - Prefix ID: `RAG` (or whatever you prefer)
  - "Add Model Id" field: `oliver-assistant`
- Click Save
Important
The `RAG` prefix differentiates models that come from your RAG endpoint from local models on the Ollama server.
Run the personal assistant in a Docker container:
cd src
docker build -t ollama-assistant:latest .
docker run -d --name oliver-assistant -p 8901:8901 ollama-assistant:latest
For detailed Docker instructions, see src/DOCKER.md.
The system uses Jinja2 templates to transform YAML data into the final markdown document. You can customize this process by:
- Editing the templates in `src/data/templates/`
- Modifying the section mappings in `src/config/template_config.py` (a hypothetical sketch follows this list)
- Re-running `python main.py generate`
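As a rough illustration, a section mapping could look like the hypothetical sketch below; the actual structure and entries are defined in `src/config/template_config.py` and may differ.

```python
# Hypothetical shape of a section mapping: which YAML file feeds which template,
# and what the rendered section is called in personal_info.md. Names are examples only.
TEMPLATE_MAPPINGS = [
    # (YAML data file,          Jinja2 template,    section title)
    ("static/owner.yaml",       "owner.md.j2",      "About the Owner"),
    ("static/family.yaml",      "family.md.j2",     "Family"),
    ("dynamic/calendar.yaml",   "calendar.md.j2",   "Upcoming Events"),
]
```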
For template details, see src/generation/README.md.
To add new dynamic data sources:
- Create a new YAML file in `src/data/dynamic/`
- Create a corresponding Jinja2 template in `src/data/templates/`
- Add the new mapping to `src/config/template_config.py`
- Update your data collection scripts to maintain the YAML file
Consider setting up automated updates for dynamic data using cron jobs or scheduled tasks.
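One way to do this is a small script run from cron that rewrites the YAML file on a schedule, as in the sketch below; the file name matches the earlier examples, but the fields and the data source are assumptions you would replace with your real provider.

```python
# Hypothetical updater for src/data/dynamic/weather.yaml, suitable for a cron job
# (e.g. every 30 minutes). Replace the stubbed forecast with a real API call.
from datetime import datetime, timezone
import yaml  # PyYAML

def update_weather(path: str = "src/data/dynamic/weather.yaml") -> None:
    forecast = {
        "updated": datetime.now(timezone.utc).isoformat(),
        "today": {"summary": "Partly cloudy", "high_c": 21, "low_c": 12},
    }
    with open(path, "w") as f:
        yaml.safe_dump(forecast, f, sort_keys=False)

if __name__ == "__main__":
    update_weather()
```

After updating the YAML, you will likely want to re-run `python main.py generate` and reload the vector store (via the `/reload` endpoint or CLI command) so the new data reaches the assistant.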
Adjust RAG parameters in `src/config/rag_config.py`:

- `DEFAULT_CHUNK_SIZE` and `DEFAULT_CHUNK_OVERLAP` control document chunking
- `DEFAULT_TOP_K` sets how many context chunks to retrieve
- `DEFAULT_TEMPERATURE` affects response creativity vs. coherence
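The sketch below shows what these settings might look like; the names come from the list above, but the values are illustrative, not the project's defaults.

```python
# Illustrative RAG settings — the real defaults live in src/config/rag_config.py.
DEFAULT_CHUNK_SIZE = 500     # characters per chunk when splitting personal_info.md
DEFAULT_CHUNK_OVERLAP = 50   # characters shared between adjacent chunks
DEFAULT_TOP_K = 4            # how many context chunks are retrieved per query
DEFAULT_TEMPERATURE = 0.7    # higher = more creative, lower = more deterministic
```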
Retrieval Augmented Generation (RAG) combines a retrieval mechanism with a generative language model:
- Your personal information is transformed into vector embeddings and stored in a vector database
- When you ask a question, the system:
- Converts your question to a vector
- Finds the most relevant chunks of your personal information
- Provides those chunks as context to the LLM
- Generates a response using this personalized context
This approach allows the assistant to reference your personal information without it being part of the model's training data.
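The indexing half of this pipeline can be sketched as follows. The embedding model name, chunking strategy, and Ollama host are assumptions; the project's actual implementation lives in the `rag` module.

```python
# Sketch: embed chunks of personal_info.md via Ollama and store them in FAISS.
import numpy as np
import faiss
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str, model: str = "nomic-embed-text") -> np.ndarray:
    # Ollama's embeddings endpoint returns {"embedding": [...]}.
    r = requests.post(f"{OLLAMA}/api/embeddings", json={"model": model, "prompt": text})
    r.raise_for_status()
    return np.array(r.json()["embedding"], dtype="float32")

# Naive paragraph chunking; the real config controls chunk size and overlap.
chunks = open("personal_info.md").read().split("\n\n")
vectors = np.stack([embed(c) for c in chunks])

index = faiss.IndexFlatL2(vectors.shape[1])  # exact nearest-neighbour search
index.add(vectors)
```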
The RAG workflow for a typical request:
- Query Processing: Your question is vectorized using the Ollama model's embedding capabilities
- Retrieval: The FAISS vector store finds relevant chunks from your personal information
- Context Integration: Retrieved information is formatted into a prompt with conversation history
- Response Generation: The Ollama model generates a personalized response using the enhanced context
- Conversation Management: The exchange is added to conversation history for future context
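At query time, the same ideas look roughly like the sketch below, reusing the same kind of `embed` helper and FAISS index as in the earlier indexing sketch; the prompt wording and model names are assumptions, not the project's exact prompt.

```python
# Sketch of steps 1-4: vectorize the question, retrieve top-k chunks, build a
# context-augmented prompt, and generate a reply with the custom model.
import numpy as np
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str, model: str = "nomic-embed-text") -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings", json={"model": model, "prompt": text})
    r.raise_for_status()
    return np.array(r.json()["embedding"], dtype="float32")

def answer(question: str, index, chunks: list[str], top_k: int = 4) -> str:
    query_vec = embed(question).reshape(1, -1)           # 1. query processing
    _, ids = index.search(query_vec, top_k)              # 2. retrieval (FAISS)
    context = "\n\n".join(chunks[i] for i in ids[0])     # 3. context integration
    prompt = (
        "Use the following personal context to answer.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    r = requests.post(                                   # 4. response generation
        f"{OLLAMA}/api/generate",
        json={"model": "oliver-assistant", "prompt": prompt, "stream": False},
    )
    r.raise_for_status()
    return r.json()["response"]
```

Step 5, conversation management, would then append the question and reply to a history list that gets folded into subsequent prompts.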
There are two ways to define your assistant's personality:
1. Custom Ollama Model (current approach): Personality is defined at the model level via the Modelfile
   - Pros: Consistent personality across interfaces
   - Cons: Requires model rebuilds for personality changes
2. Generic LLM + RAG System Prompt: Personality is injected via the RAG prompt
   - Pros: More flexibility for personality changes
   - Pros: Model-agnostic (works with any LLM)
The current implementation uses option 1, but the code could be modified to support both approaches.