A complete offline, local multi-modal RAG system.
- Clone the repo:
git clone https://github.com/shekharkoirala/PPTRag.git
cd PPTRag
- Download the embeddings and put them inside the backend folder:
https://drive.google.com/file/d/1eA1tGJQQJjKJmYToEapYhA918lNnSCb1/view?usp=sharing
📂 backend
├── 📂 .byaldi
│ ├── 📂 reports
│ │ ├── doc_ids_to_file_names.json.gz
│ │ ├── embed_id_to_doc_id.json.gz
│ │ ├── index_config.json.gz
│ │ ├── metadata.json.gz
├── 📂 app
│ ├── __pycache__
│ ├── generator.py
│ ├── main.py
Make sure the folder structure matches the layout above after you unzip it.
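Before starting the stack, you can sanity-check the unzipped index with a small helper (a sketch; the expected file list comes from the tree above, and the helper name is made up here):

```python
from pathlib import Path

# Index files byaldi writes for the "reports" collection (see the tree above).
EXPECTED = [
    "doc_ids_to_file_names.json.gz",
    "embed_id_to_doc_id.json.gz",
    "index_config.json.gz",
    "metadata.json.gz",
]

def missing_index_files(backend_dir: str) -> list:
    """Return the expected index files absent under <backend_dir>/.byaldi/reports."""
    index_dir = Path(backend_dir) / ".byaldi" / "reports"
    return [name for name in EXPECTED if not (index_dir / name).exists()]

if __name__ == "__main__":
    missing = missing_index_files("backend")
    print("index OK" if not missing else "missing: %s" % missing)
```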
- Run Docker Compose:
docker compose up --build
- Wait until the server fully loads. You will see logs showing the SmolVLM model being loaded.
- Browse the RAG:
http://localhost:80/
You might have to wait 10-15 minutes for the first messages, but after that it usually answers in about a minute.
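If you would rather script the wait than watch the logs, a small stdlib poller works (a sketch; the URL and the 15-minute default timeout are assumptions matching the note above):

```python
import time
import urllib.error
import urllib.request

def wait_for(url: str, timeout: float = 900.0, interval: float = 5.0) -> bool:
    """Poll url until it answers successfully or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(interval)
    return False

if __name__ == "__main__":
    print("up" if wait_for("http://localhost:80/") else "timed out")
```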
Or, run it locally without Docker:
- Install uv following its official installation guide: https://docs.astral.sh/uv/getting-started/installation/
- Install Node and pnpm, then run:
pnpm install # install frontend dependencies
pnpm run dev # run the frontend UI
uv sync # install backend dependencies
uv run fastapi dev # run the backend service
- Either run the ingestion pipeline:
python ingestion/ingest.py --path ./data/pdf
*A reports collection will be built for Byaldi using the ColPali model. The process takes around 10-15 minutes.
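The essence of that ingestion step can be sketched as follows (assumptions: byaldi's RAGMultiModalModel API and the vidore/colpali checkpoint; the actual ingestion/ingest.py may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring `python ingestion/ingest.py --path ./data/pdf`."""
    parser = argparse.ArgumentParser(description="Index PDFs into a Byaldi collection")
    parser.add_argument("--path", default="./data/pdf", help="folder of PDFs to index")
    return parser

def ingest(pdf_dir: str) -> None:
    # Heavy import kept local so the parser works without the model installed.
    from byaldi import RAGMultiModalModel

    # Downloads the ColPali weights on first run; indexing takes roughly 10-15 minutes.
    model = RAGMultiModalModel.from_pretrained("vidore/colpali")
    model.index(
        input_path=pdf_dir,
        index_name="reports",              # produces backend/.byaldi/reports
        store_collection_with_index=True,  # keep page images alongside the vectors
        overwrite=True,
    )
```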
- Or download the zip and extract it into the backend folder as the .byaldi folder:
https://drive.google.com/file/d/1eA1tGJQQJjKJmYToEapYhA918lNnSCb1/view?usp=sharing
- Design Notes
- My first recommendation for vector storage is Milvus (it is also a good fit for production settings). Byaldi is used here for two main reasons: a. it provides a clean RAG pipeline, and b. it works on both CPU and GPU (the main reason for this project).
- The RAG system consists of a retriever and a generator. a. Retriever: Byaldi is used; the data are preprocessed and loaded on the CPU as vectors. b. Generator: since the system is multi-modal, the SmolVLM model is used for the generation task.
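As a rough sketch of how the retriever and generator fit together (assumptions: byaldi's search API with images stored in the index, and the HuggingFaceTB/SmolVLM-Instruct checkpoint via transformers; the real logic lives in app/generator.py and may differ):

```python
def build_messages(question: str) -> list:
    """SmolVLM chat-template input: one image placeholder plus the user question."""
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]}]

def answer(question: str, index_name: str = "reports") -> str:
    # Heavy imports kept local: they pull in torch and download model weights.
    import base64, io
    from PIL import Image
    from byaldi import RAGMultiModalModel
    from transformers import AutoModelForVision2Seq, AutoProcessor

    # Retriever: load the prebuilt ColPali index and fetch the best-matching page.
    retriever = RAGMultiModalModel.from_index(index_name)
    hit = retriever.search(question, k=1)[0]
    page = Image.open(io.BytesIO(base64.b64decode(hit.base64)))

    # Generator: feed the retrieved page image plus the question to SmolVLM.
    processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
    model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
    prompt = processor.apply_chat_template(build_messages(question),
                                           add_generation_prompt=True)
    inputs = processor(text=prompt, images=[page], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```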