- Project Overview
- Sequence Diagram
- Prerequisites
- Installing Podman and Podman-Compose
- Configuration
- Run the Assistant
- Features
- Enhanced Retrieval Mechanisms
- Licenses
- Contributing
- Contact
## Project Overview

The Ollama Chatbot is an AI-based chatbot that uses the llama3.2:3B model or an alternative model of your choice. It can incorporate Retrieval-Augmented Generation (RAG) to deliver enhanced automated responses when desired. The project is managed through Podman containers, allowing for efficient and scalable deployment.
## Prerequisites

| Package | Version |
|---|---|
| podman | 5.1.2 or greater |
| podman-compose | 1.2.0 or greater |
## Installing Podman and Podman-Compose

1. On Linux:

   - For Debian/Ubuntu-based distributions:

     ```bash
     sudo apt update
     sudo apt install podman -y
     ```

   - For Fedora:

     ```bash
     sudo dnf install podman -y
     ```

   - For Arch Linux:

     ```bash
     sudo pacman -S podman
     ```

2. On macOS:

   - Install Podman using Homebrew:

     ```bash
     brew install podman
     ```

   - Initialize the Podman environment (if not already done):

     ```bash
     podman machine init
     podman machine start
     ```

3. On Windows:

   - Download the installer from the official Podman website.
   - Follow the installation instructions provided for Windows.
Ensure you have Python installed on your system.

- Install Podman-Compose with pip:

  ```bash
  pip install podman-compose
  ```

- Verify the installation:

  ```bash
  podman-compose --version
  ```
## Configuration

Model: Specify the model you want to use by modifying the `model` variable in `src/resources/config.yaml`. `llama3.2:3B` is set as the default model. To see the list of available models, visit https://ollama.com/library.
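As a minimal sketch of what this looks like in practice (assuming the app loads the file with a standard YAML parser and that the key is literally named `model`; the repository's actual loading code may differ):

```python
# Hypothetical sketch: reading the model name from the YAML config.
# Assumes PyYAML is installed and config.yaml contains a line like:
#   model: llama3.2:3B
import yaml

with open("src/resources/config.yaml") as f:
    config = yaml.safe_load(f)

# Fall back to the documented default if the key is missing.
model = config.get("model", "llama3.2:3B")
print(f"Chat model: {model}")
```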
## Run the Assistant

To interact with the chatbot, follow these steps:

- Navigate to the `scripts/` directory:

  ```bash
  cd scripts/
  ```

- Start the Ollama container using the script `ollama.bat` on Windows, or the script `ollama.sh` on macOS/Linux:

  ```bat
  ollama.bat start
  ```

  ```bash
  ollama.sh start
  ```

  NOTE: To use a model other than llama3.2, specify the model name as the second parameter. For example, to use the `granite-code` model, run `ollama.bat start granite-code` on Windows or `ollama.sh start granite-code` on macOS/Linux.

- Start the app container once the Ollama container is up and running. Use the script `app.bat` on Windows, or the script `app.sh` on macOS/Linux:

  ```bat
  app.bat start
  ```

  ```bash
  app.sh start
  ```

- Once the app container is running and the Uvicorn server has started, open your browser and go to http://localhost:8000/ to chat with the assistant.
## Features

- Customizable System Prompt

  - Define a System Prompt to customize the assistant's behavior and response style to fit your needs.

- Document Collections & Retrieval-Augmented Generation (RAG)

  - RAG enhances the assistant's responses by combining its knowledge with user-uploaded documents.
  - You can upload documents directly through the "Upload Documents" option on the left panel, with support for various formats including `DOCX`, `PPTX`, `PDF`, `TXT`, `XLSX`, `CSV`, `HTML`, `Markdown`, `RTF`, and `ODT`.
  - Use the "Clean Document Collections" button to delete previously uploaded files, keeping your document set relevant and current. Please note that this action removes ALL previously uploaded documents.
  - With RAG, you can also choose the number of document chunks to retrieve for each prompt, ranging from 1 to 10. This allows the assistant to provide more contextually relevant answers by leveraging the content from your uploaded documents.
## Enhanced Retrieval Mechanisms

In addition to the standard Retrieval-Augmented Generation (RAG) mechanism, the Ollama Chatbot incorporates advanced techniques for improving document retrieval and response relevance:
### Context-Aware Chunk Vectorization

While traditional RAG may rely on isolated chunks of text, this project enhances the vectorization process by considering the context of each chunk. The text chunks are vectorized in a manner that takes into account neighboring chunks, creating a more contextually aware representation.
How It Works:
- Chunks are vectorized by including neighboring chunks within a specified window (default is 1). This ensures that the context around each chunk is captured, improving the quality of responses.
- For each chunk, we apply dynamic weighting based on the similarity to adjacent chunks. This weighting adjusts the influence of the neighboring chunks, allowing the model to emphasize the most relevant context.
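A minimal sketch of this idea (the function and parameter names below are illustrative, not the project's actual API; `embed_fn` stands in for whatever embedding model the app uses):

```python
# Hypothetical sketch: window-based, similarity-weighted chunk embeddings.
import numpy as np

def contextual_embeddings(chunks, embed_fn, window=1):
    """Embed each chunk together with its neighbors inside `window`,
    weighting each neighbor by its cosine similarity to the center chunk."""
    base = np.array([embed_fn(c) for c in chunks], dtype=np.float32)
    unit = base / np.linalg.norm(base, axis=1, keepdims=True)

    contextual = []
    for i in range(len(chunks)):
        vec = base[i].copy()
        lo, hi = max(0, i - window), min(len(chunks), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            # Dynamic weight: how similar the neighbor is to this chunk.
            weight = max(float(unit[i] @ unit[j]), 0.0)
            vec += weight * base[j]
        # Normalize so the contextual vectors are comparable to each other.
        contextual.append(vec / np.linalg.norm(vec))
    return np.array(contextual, dtype=np.float32)
```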
### Hybrid Search (FAISS + SQLite)

To further improve document retrieval accuracy, a hybrid search approach combines vector-based search using FAISS with lexical search through SQL-based keyword matching. This combination ensures a more comprehensive search, capturing both semantic and keyword-based relevance.
How It Works:
- FAISS Search: The query is transformed into a vector, which is then used to find the most similar documents in the database through FAISS, based on vector distance.
- SQLite Search: Simultaneously, keywords are extracted from the query using tokenization, stopword removal, and part-of-speech tagging, ensuring that only significant words (like nouns or proper nouns) are selected for the lexical search. These extracted keywords are then used to find matching documents in the SQLite database.
- Combination of Results: The results from both searches are merged, with FAISS results receiving a higher weight due to their semantic relevance, while lexical results contribute at a lower weight, offering a balanced retrieval that enhances the accuracy of document selection.
This hybrid search approach significantly improves the ability to retrieve relevant documents, particularly in cases where certain keywords may not be well-represented in the vector space.
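A simplified sketch of how such a merge could work (the table schema, scoring formulas, and the 0.7/0.3 weights are assumptions for illustration; the project's actual implementation may differ):

```python
# Hypothetical sketch: merging FAISS vector hits with SQLite keyword hits.
import sqlite3
import numpy as np
import faiss  # `index` below is a FAISS index, e.g. faiss.IndexFlatL2

def hybrid_search(query_vec, keywords, index, conn, k=10,
                  vector_weight=0.7, lexical_weight=0.3):
    scores = {}

    # 1) Semantic search: nearest neighbors by vector distance.
    distances, ids = index.search(
        query_vec.reshape(1, -1).astype(np.float32), k)
    for dist, doc_id in zip(distances[0], ids[0]):
        if doc_id != -1:  # FAISS pads missing results with -1
            scores[int(doc_id)] = vector_weight / (1.0 + float(dist))

    # 2) Lexical search: keyword matches against a (hypothetical)
    #    documents(id, text) table.
    for kw in keywords:
        for (doc_id,) in conn.execute(
                "SELECT id FROM documents WHERE text LIKE ?", (f"%{kw}%",)):
            scores[doc_id] = scores.get(doc_id, 0.0) + lexical_weight / len(keywords)

    # 3) Merge: FAISS hits dominate, lexical hits nudge the ranking.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```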
### Reranking

To refine the quality of the retrieved documents, a reranking mechanism is applied. This process ensures that the most relevant documents are selected and presented based on their similarity to the query.
How It Works:
- The retrieved documents are first ranked based on their similarity to the prompt using cosine similarity.
- Then, a reranking process is applied to reorder the documents, ensuring that the most relevant ones are selected.
- The number of documents returned is controlled, allowing fine-tuned responses by limiting the results to the top-n most relevant documents.
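A minimal sketch of such a cosine-similarity reranker with a top-n cutoff (names are illustrative; this is not the project's exact code):

```python
# Hypothetical sketch: rerank retrieved documents by cosine similarity.
import numpy as np

def rerank(query_vec, docs, doc_vecs, top_n=5):
    """Return the `top_n` documents most similar to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        sim = float(q @ (vec / np.linalg.norm(vec)))  # cosine similarity
        scored.append((sim, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for sim, doc in scored[:top_n]]
```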
These additional mechanisms enhance the quality of document retrieval and ensure that the chatbot provides responses that are more accurate and contextually relevant.
## Licenses

The Ollama Chatbot project uses several third-party Python libraries, each distributed under its respective license. The complete list of dependencies and their licenses is provided in LICENSES.md.