An intelligent chatbot that lets users upload text-based Ayurveda PDFs and ask questions about their content. It uses Retrieval-Augmented Generation (RAG), combining semantic search with LLM-generated responses.
- Upload Ayurveda PDFs (text-based only)
- Ask natural-language questions based on uploaded content
- Chunk text smartly using LangChain
- Semantic search with MiniLM embeddings
- Fast retrieval using FAISS
- Powered by LLaMA 3 via Groq API
- Based on the RAG architecture (Retrieval-Augmented Generation)
- Easy-to-use interface via Streamlit
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| Backend | FastAPI |
| Embeddings | HuggingFace MiniLM-L6-v2 |
| Vector Search | FAISS |
| Language Model | LLaMA 3 via Groq API |
| PDF Processing | PyMuPDF + LangChain |
| Prompting | LangChain + Custom PromptTemplate |
| Environment Variables | python-dotenv |
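To make the "Custom PromptTemplate" row concrete, here is a hypothetical sketch of what a grounded RAG prompt for this bot might look like. The template text is an assumption (not taken from this repo), and plain `str.format` stands in for LangChain's `PromptTemplate` so the example is self-contained.

```python
# Hypothetical RAG prompt for the Ayurveda bot; the actual template
# in this repo may differ. str.format stands in for LangChain's
# PromptTemplate with input variables "context" and "question".
PROMPT = (
    "You are an Ayurveda assistant. Answer ONLY from the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

filled = PROMPT.format(
    context="Triphala is a blend of three fruits.",
    question="What is Triphala?",
)
```

Constraining the model to the retrieved context is what keeps answers grounded in the uploaded PDF rather than the model's general knowledge.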
- Upload: User uploads a text-based PDF
- Text Extraction: PDF is read using PyMuPDF
- Chunking: Text is broken into smaller pieces using RecursiveCharacterTextSplitter
- Embedding: Chunks are embedded using HuggingFace MiniLM
- Vector DB: Chunks are stored in a FAISS vector store
- Q&A: the question is embedded and the most similar chunks are retrieved; the chunks plus the question are sent to LLaMA 3, which generates a final, context-based answer.

This is a Retrieval-Augmented Generation (RAG) system.
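The retrieval step above can be sketched in miniature. The real pipeline uses MiniLM embeddings and a FAISS index; here, a bag-of-words cosine similarity stands in for both so the example runs with the standard library only — a toy illustration of the idea, not the project's actual code.

```python
# Toy sketch of RAG retrieval: rank chunks by similarity to the
# question, keep the top-k, and pass them to the LLM as context.
# Bag-of-words counts stand in for MiniLM embeddings; a linear
# scan stands in for the FAISS index.
import math
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding' (stand-in for MiniLM vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]

chunks = [
    "Ashwagandha is used as an adaptogen in Ayurveda.",
    "Triphala is a blend of three fruits.",
    "Panchakarma is a detoxification therapy.",
]
top = retrieve(chunks, "What is Ashwagandha used for?", k=1)
# top[0] plus the question would then be sent to LLaMA 3 via Groq.
```

Swapping the toy pieces for real embeddings and FAISS changes the quality of retrieval, not the shape of the pipeline.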
```bash
git clone https://github.com/yourusername/ayurveda-chatbot.git
cd ayurveda-chatbot
python -m venv ayurveda_env
ayurveda_env\Scripts\activate   # Windows; on macOS/Linux: source ayurveda_env/bin/activate
pip install -r requirements.txt
```
Create a `.env` file in the project root with your Groq API key:

```
GROQ_API_KEY=your_groq_api_key
```
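At startup the app would load this key into the environment. The stack table lists python-dotenv for this; the sketch below parses the `.env` file with the standard library only (a stand-in for `load_dotenv()`) so it stays self-contained.

```python
# Stand-in for python-dotenv's load_dotenv(): parse KEY=VALUE pairs
# from a .env file and export them into os.environ.
import os

def load_env(path=".env"):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # setdefault: real environment variables take precedence
                os.environ.setdefault(key.strip(), value.strip())

# Example: write a sample .env, load it, and read the key.
with open(".env", "w") as f:
    f.write("GROQ_API_KEY=your_groq_api_key\n")
load_env()
api_key = os.getenv("GROQ_API_KEY")
```

Keeping the key in `.env` (and out of version control) means the Groq client can read it from the environment without hard-coding secrets.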
```bash
python app.py                   # start the FastAPI backend
streamlit run streamlit_app.py  # launch the Streamlit UI
```