VisText-RAG-DocumentQNA

This repository implements a multimodal retrieval-augmented generation (RAG) system for document question answering. The indexing pipeline processes a document corpus by extracting both text and visual elements (such as tables and figures) and storing them in a vector database, using sentence embeddings for text and ColPali embeddings for images. The chat inference pipeline takes a user query, retrieves over both the text and image embeddings, and generates a context-aware answer with a vision-capable language model. This setup supports accurate, explainable retrieval from visually rich documents.
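To make the image-retrieval side more concrete: ColPali scores a query against a page with late interaction, where each query-token embedding is matched to its best page-patch embedding and the per-token maxima are summed (as described in the ColPali paper listed under Citation). The sketch below illustrates that scoring rule on toy vectors in plain Python. The function names, page ids, and toy data are illustrative assumptions, not code from this repository, which may compute scores differently (for example inside the vector database).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token take the best-matching
    page patch, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: Sequence[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids sorted by descending MaxSim score.

    `pages` maps a hypothetical page id to its list of patch embeddings.
    """
    scored = {pid: maxsim_score(query_tokens, patches) for pid, patches in pages.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Toy 2-D embeddings, made up for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [0.0, 1.0]],  # has a patch matching each query token
    "page_2": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
}
print(rank_pages(query, pages))
```

In a dual-retrieval setup, a ranking like this over page images would sit alongside an ordinary nearest-neighbour search over text-chunk embeddings, with both result sets passed to the language model as context.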

📝 You can read the full article here:
👉 ColPALI Meets DocLayNet: A Vision-Aware Multimodal RAG for Document QA - Medium

🖼️ Visual Examples

🔍 Evaluation: Comparison with Common RAG Pipeline

Our pipeline demonstrates superior retrieval accuracy and multimodal understanding compared to common RAG pipelines, especially in handling visually complex document content.

🎨 Chainlit App – Frontend Overview

⚙️ Reproducing the Environment

# Create Conda Environment
conda create -n multimodal_rag python=3.11
conda activate multimodal_rag

# Install Libraries
pip install -r requirements.txt

🚀 Running the Application

Once you've set up the environment and downloaded the required models, you can launch both the backend and frontend with the following commands:

✅ Run the Backend Server (FastAPI)

uvicorn main:app --port 8000

✅ Run the Frontend (Chainlit)

chainlit run frontend.py --port 8001

📚 Adding Knowledge Base

You can enhance the chatbot's responses by providing your own knowledge base (PDF documents). Before indexing any documents, ensure that the backend server is running.
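One way to confirm the backend is up before indexing is a quick TCP check. This is a minimal sketch, assuming the backend listens on localhost port 8000 as in the uvicorn command above; `backend_is_up` is a hypothetical helper, not part of this repository.

```python
import socket


def backend_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    if backend_is_up():
        print("Backend reachable on port 8000; safe to start indexing.")
    else:
        print("Backend not reachable; start it with uvicorn first.")
```

Note that this only shows that something is listening on the port, not that it is the FastAPI app or that its models have finished loading.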

Put your PDF files into the following directory:

document_sources/
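Before indexing, it can help to confirm which files will be picked up. The sketch below lists the PDFs in the folder; it assumes the indexer reads `.pdf` files placed directly under `document_sources/` (not in subfolders), which this README does not state explicitly. `list_pdfs` is a hypothetical helper.

```python
from pathlib import Path


def list_pdfs(source_dir: str = "document_sources") -> list[Path]:
    """Return the PDF files directly inside `source_dir`, sorted by path."""
    root = Path(source_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    pdfs = list_pdfs()
    print(f"Found {len(pdfs)} PDF file(s) in document_sources/")
    for pdf in pdfs:
        print(f"  {pdf.name}  ({pdf.stat().st_size / 1024:.0f} KiB)")
```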

Then, run the following command to index new documents:

python execute_indexing.py

If you want to refresh the entire indexing pipeline (i.e., delete old vectors and start fresh from document_sources/), run:

python execute_indexing.py --initialize
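The difference between the two modes can be pictured as a small planning step. The sketch below assumes the default run indexes only files that are not yet in the vector store, while `--initialize` clears everything and re-indexes the whole folder. The README implies but does not spell out the first part, and `plan_indexing` is a hypothetical helper, not a function from this repository.

```python
def plan_indexing(files_on_disk, already_indexed, initialize=False):
    """Decide which documents to drop from the index and which to (re)index.

    Hypothetical helper for illustration; the real script may work differently.
    Returns a (to_delete, to_index) pair of sorted file-name lists.
    """
    on_disk = set(files_on_disk)
    indexed = set(already_indexed)
    if initialize:
        # Fresh start: remove every stored document, then index the whole folder.
        return sorted(indexed), sorted(on_disk)
    # Incremental run: leave existing vectors alone, index only unseen files.
    return [], sorted(on_disk - indexed)


# Made-up file names for illustration.
files = ["report_2022.pdf", "report_2023.pdf"]
stored = ["report_2022.pdf"]
print(plan_indexing(files, stored))                   # incremental run
print(plan_indexing(files, stored, initialize=True))  # full refresh
```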

📚 Citation

If you use this work, please consider citing the following foundational papers:

@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449}, 
}
@inproceedings{pfitzmann2022doclaynet,
  title={DocLayNet: A large human-annotated dataset for document-layout segmentation},
  author={Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter},
  booktitle={Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining},
  pages={3743--3751},
  year={2022}
}
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
