This project provides a REST API built with FastAPI that enables question answering over PDF documents using LangChain. It allows users to ask questions, which are answered by a Large Language Model (LLM) based on the information contained in a predefined PDF document.
- ✅ Load and split content from a PDF file
- ✅ Lexical retrieval using BM25 and TF-IDF
- ✅ Re-ranking with Flashrank
- ✅ Use of EnsembleRetriever + ContextualCompressionRetriever (see the retrieval sketch after this list)
- ✅ RAG pipeline powered by LangChain
- ✅ HTTP endpoint built with FastAPI
- ✅ Configuration via `.env` file
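
The sketch below shows one way this retrieval stack can be wired together with LangChain. It is illustrative only: the file path, chunk sizes, and ensemble weights are assumptions, import paths may vary between LangChain versions, and the repository's `main.py` may do things differently. BM25Retriever needs `rank_bm25`, TFIDFRetriever needs `scikit-learn`, and FlashrankRerank needs the `flashrank` package.

```python
# Illustrative sketch of the retrieval stack listed above -- not this repo's exact code.
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.retrievers import BM25Retriever, TFIDFRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

# 1. Load the PDF and split it into chunks (path and sizes are placeholders).
docs = PyPDFLoader(os.environ.get("PDF_PATH", "document.pdf")).load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 2. Lexical retrieval: BM25 and TF-IDF combined by an EnsembleRetriever.
bm25 = BM25Retriever.from_documents(chunks, k=5)
tfidf = TFIDFRetriever.from_documents(chunks, k=5)
ensemble = EnsembleRetriever(retrievers=[bm25, tfidf], weights=[0.5, 0.5])

# 3. Re-ranking: Flashrank wrapped in a ContextualCompressionRetriever.
reranker = FlashrankRerank()  # a specific model can be set via its `model` field
retriever = ContextualCompressionRetriever(base_compressor=reranker, base_retriever=ensemble)

# The compressed, re-ranked documents then feed the RAG chain that prompts the LLM.
top_docs = retriever.invoke("What is this document about?")
```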
- Python 3.10+
- LangChain
- FastAPI
- OpenAI / LM Studio (see the client sketch after this list)
- BM25 & TF-IDF retrievers
- PyPDFLoader
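
Because LM Studio exposes an OpenAI-compatible API, the same chat client can point at either backend. A minimal sketch is shown below, assuming the `LLM_API_KEY` and `LLM_API_BASE` variables from the configuration section further down; the model name is a placeholder and the actual wiring in `main.py` may differ.

```python
# Hypothetical sketch: one ChatOpenAI client for both OpenAI and a local LM Studio server.
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key=os.environ["LLM_API_KEY"],    # any non-empty string is accepted by LM Studio
    base_url=os.environ["LLM_API_BASE"],  # e.g. https://api.openai.com/v1 or http://localhost:1234/v1
    model="gpt-4o-mini",                  # placeholder; use whichever model the server exposes
)
print(llm.invoke("Hello").content)
```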
```bash
git clone https://github.com/kullaniciadi/rag-pdf-qa.git
cd rag-pdf-qa
```
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To run this project, you will need to add the following environment variables to your `.env` file:

- `PDF_PATH` – path to the PDF document to load
- `LLM_API_KEY` – API key for the LLM provider
- `LLM_API_BASE` – base URL of the OpenAI-compatible API (OpenAI or LM Studio)
- `RERANKER_MODEL` – re-ranker model used by Flashrank
- `EMBEDDING_MODEL` – embedding model name
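
For example, a minimal `.env` might look like the following. All values are placeholders; check the code for the names and defaults it actually expects.

```env
PDF_PATH=./docs/sample.pdf
# Any non-empty string works if you use LM Studio locally
LLM_API_KEY=sk-...
LLM_API_BASE=http://localhost:1234/v1
RERANKER_MODEL=ms-marco-MiniLM-L-12-v2
EMBEDDING_MODEL=text-embedding-3-small
```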
```bash
uvicorn main:app --reload
```
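
Once the server is running, you can post a question to the API. The route (`/ask`) and payload shape below are hypothetical; check `main.py` for the endpoint this project actually defines. FastAPI also serves interactive docs at `http://127.0.0.1:8000/docs`.

```bash
# Hypothetical request; adjust the path and JSON body to match main.py
curl -X POST http://127.0.0.1:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the document about?"}'
```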