A complete offline, local multi-modal RAG system.
- Clone the repo:
git clone https://github.com/shekharkoirala/PPTRag.git
cd PPTRag
- Download the embeddings and put them inside the backend folder:
https://drive.google.com/file/d/1eA1tGJQQJjKJmYToEapYhA918lNnSCb1/view?usp=sharing
📂 backend
├── 📂 .byaldi
│ ├── 📂 reports
│ │ ├── doc_ids_to_file_names.json.gz
│ │ ├── embed_id_to_doc_id.json.gz
│ │ ├── index_config.json.gz
│ │ ├── metadata.json.gz
├── 📂 app
│ ├── __pycache__
│ ├── generator.py
│ ├── main.py
Make sure the folder structure matches the layout above after you unzip it.
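Before starting the stack, you can sanity-check the unzipped index with a small helper (a sketch; the expected file list comes from the tree above, and the helper name is made up here):

```python
from pathlib import Path

# Index files byaldi writes for the "reports" collection (see the tree above).
EXPECTED = [
    "doc_ids_to_file_names.json.gz",
    "embed_id_to_doc_id.json.gz",
    "index_config.json.gz",
    "metadata.json.gz",
]

def missing_index_files(backend_dir: str) -> list:
    """Return the expected index files absent under <backend_dir>/.byaldi/reports."""
    index_dir = Path(backend_dir) / ".byaldi" / "reports"
    return [name for name in EXPECTED if not (index_dir / name).exists()]

if __name__ == "__main__":
    missing = missing_index_files("backend")
    print("index OK" if not missing else "missing: %s" % missing)
```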
- Run Docker Compose:
docker compose up --build
- Wait until the server fully loads. You will see logs showing the SmolVLM model being loaded.
- Browse the RAG:
http://localhost:80/
You might have to wait 10-15 minutes for the first messages, but after that it usually answers in about a minute.
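If you would rather script the wait than watch the logs, a small stdlib poller works (a sketch; the URL and the 15-minute default timeout are assumptions matching the note above):

```python
import time
import urllib.error
import urllib.request

def wait_for(url: str, timeout: float = 900.0, interval: float = 5.0) -> bool:
    """Poll url until it answers successfully or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(interval)
    return False

if __name__ == "__main__":
    print("up" if wait_for("http://localhost:80/") else "timed out")
```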
Or, run it locally without Docker:
- Install uv following its official installation guide: https://docs.astral.sh/uv/getting-started/installation/
- Install Node and pnpm, then run:
pnpm install # install frontend dependencies
pnpm run dev # run the frontend UI
uv sync # install backend dependencies
uv run fastapi dev # run the backend service
- Either run the ingestion pipeline:
python ingestion/ingest.py --path ./data/pdf
*A reports collection will be built for Byaldi using the ColPali model. The process takes around 10-15 minutes.
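The essence of that ingestion step can be sketched as follows (assumptions: byaldi's RAGMultiModalModel API and the vidore/colpali checkpoint; the actual ingestion/ingest.py may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring `python ingestion/ingest.py --path ./data/pdf`."""
    parser = argparse.ArgumentParser(description="Index PDFs into a Byaldi collection")
    parser.add_argument("--path", default="./data/pdf", help="folder of PDFs to index")
    return parser

def ingest(pdf_dir: str) -> None:
    # Heavy import kept local so the parser works without the model installed.
    from byaldi import RAGMultiModalModel

    # Downloads the ColPali weights on first run; indexing takes roughly 10-15 minutes.
    model = RAGMultiModalModel.from_pretrained("vidore/colpali")
    model.index(
        input_path=pdf_dir,
        index_name="reports",              # produces backend/.byaldi/reports
        store_collection_with_index=True,  # keep page images alongside the vectors
        overwrite=True,
    )
```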
- Or download the zip and extract it into the backend folder as the .byaldi folder:
https://drive.google.com/file/d/1eA1tGJQQJjKJmYToEapYhA918lNnSCb1/view?usp=sharing
- Design Notes
- My first recommendation for vector storage is Milvus (it is also a good fit for production settings). Byaldi is used here for two main reasons: a. it provides a clean RAG pipeline, and b. it works on both CPU and GPU (the main reason for this project).
- The RAG system consists of a retriever and a generator. a. Retriever: Byaldi is used; the data are preprocessed and loaded on the CPU as vectors. b. Generator: since the system is multi-modal, the SmolVLM model is used for the generation task.
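As a rough sketch of how the retriever and generator fit together (assumptions: byaldi's search API with images stored in the index, and the HuggingFaceTB/SmolVLM-Instruct checkpoint via transformers; the real logic lives in app/generator.py and may differ):

```python
def build_messages(question: str) -> list:
    """SmolVLM chat-template input: one image placeholder plus the user question."""
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]}]

def answer(question: str, index_name: str = "reports") -> str:
    # Heavy imports kept local: they pull in torch and download model weights.
    import base64, io
    from PIL import Image
    from byaldi import RAGMultiModalModel
    from transformers import AutoModelForVision2Seq, AutoProcessor

    # Retriever: load the prebuilt ColPali index and fetch the best-matching page.
    retriever = RAGMultiModalModel.from_index(index_name)
    hit = retriever.search(question, k=1)[0]
    page = Image.open(io.BytesIO(base64.b64decode(hit.base64)))

    # Generator: feed the retrieved page image plus the question to SmolVLM.
    processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
    model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
    prompt = processor.apply_chat_template(build_messages(question),
                                           add_generation_prompt=True)
    inputs = processor(text=prompt, images=[page], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```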