PDFSensei is an advanced AI-powered system that enables intelligent querying across multiple PDFs simultaneously using Retrieval-Augmented Generation (RAG). It leverages FAISS indexing, Groq AI models, and Sentence Transformers to deliver accurate and context-aware responses.
- Multi-PDF Querying: Ask questions across multiple documents at once.
- AI-Powered Responses: Uses Groq's LLaMA 3.1-8B and Gemma 2-9B for intelligent answers.
- FAISS-Based Retrieval: Efficient vector search for relevant content.
- Contextual Understanding: Uses Sentence Transformers for high-quality embeddings.
- Interactive UI: Built with Streamlit for an intuitive user experience.
- Preloaded PDFs: Includes legal documents for instant queries.
- Robust API Handling: Implements fallback models and retry logic to prevent failures.
PDFSensei/
├── assets/                 # Default PDFs used by the website
│   ├── Child rights in the Constitution of India.pdf
│   └── Constitution of India.pdf
├── config/                 # Application configuration
│   └── __init__.py         # Initializes Groq and Sentence Transformer
├── public/                 # Static assets (images)
│   ├── bot.jpg             # Chatbot avatar
│   └── user.jpg            # User avatar
├── templates/              # UI templates
│   ├── __init__.py
│   ├── botTemplate.py      # Chatbot response template
│   ├── cssTemplate.py      # CSS styling for UI
│   └── userTemplate.py     # User input template
├── .env.example            # Example environment file
└── app.py                  # Main Streamlit application
- Python
- Streamlit (Frontend)
- FAISS (Vector Search)
- Groq AI (LLMs: LLaMA 3.1-8B, Gemma 2-9B)
- LangChain (Text Processing)
- Sentence Transformers (Embeddings)
- PyPDF2 (PDF Parsing)
The system uses Groq AI models to generate responses. The primary model is llama-3.1-8b-instant, with gemma2-9b-it as a fallback model to handle errors and rate limits.
Fallback mechanism:
- Requests to the primary model are retried up to 3 times before switching models.
- If the primary model still fails (for example, due to rate limits), the fallback model is used.
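The retry-then-fallback behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `generate_with_fallback`, `call_model`, and the backoff delay are hypothetical, and giving the fallback model the same three attempts is also an assumption. Only the two model identifiers and the three-attempt limit come from the text above. In the real application, `call_model` would wrap a Groq chat-completion request.

```python
import time
from typing import Callable, Optional, Sequence

PRIMARY_MODEL = "llama-3.1-8b-instant"
FALLBACK_MODEL = "gemma2-9b-it"
MAX_ATTEMPTS = 3  # attempts per model before moving on


def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> answer
    models: Sequence[str] = (PRIMARY_MODEL, FALLBACK_MODEL),
    max_attempts: int = MAX_ATTEMPTS,
    base_delay: float = 1.0,  # assumed backoff in seconds; not specified by the project
) -> str:
    """Try each model in order, retrying up to max_attempts times per model."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # a real app would catch rate-limit/API errors only
                last_error = exc
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("All models failed") from last_error
```

Passing the request function in as `call_model` keeps the retry policy separate from the API client, so the policy can be exercised with a stub that raises on the primary model.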
1. Clone the repository:
   git clone https://github.com/ArchitJ6/PDFSensei.git
   cd PDFSensei
2. Install dependencies:
   pip install -r requirements.txt
3. Set up environment variables:
   Rename .env.example to .env and add your Groq API credentials.
4. Run the application:
   streamlit run app.py
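For the environment step, a quick way to confirm that the key is in place before launching is a small standard-library check like the one below. The variable name `GROQ_API_KEY` is an assumption (it is the name the Groq Python client reads by default); use whatever name appears in .env.example. `read_env_file` and `has_key` are hypothetical helpers that handle only simple KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_key(env: dict, name: str = "GROQ_API_KEY") -> bool:
    """Return True if the named variable is present and non-empty."""
    return bool(env.get(name))
```

Calling `has_key(read_env_file())` from the project root returns False when the file is missing or the key is empty.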
1. Upload Multiple PDFs: Drag and drop or select PDFs.
2. Extract & Chunk Text: Text is extracted from each PDF, split into chunks, and embedded with Sentence Transformers.
3. FAISS Indexing: The chunk vectors are stored in a FAISS index for fast retrieval.
4. AI Response Generation: Groq LLMs answer questions based on the retrieved content.
5. View Sources: Get citations from the source documents.
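The chunk, embed, index, and retrieve steps above can be sketched in plain Python. This is a toy illustration, not the project's implementation: a word-count vector stands in for a Sentence Transformers embedding, a linear scan with cosine similarity stands in for a FAISS index, and all function names, the chunk size, and the overlap are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformer vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: Dict[str, str], chunk_size: int = 200,
                overlap: int = 50) -> List[Tuple[str, str, Counter]]:
    """Return (source, chunk, vector) entries; a FAISS index would hold the vectors."""
    return [(name, chunk, embed(chunk))
            for name, text in docs.items()
            for chunk in chunk_text(text, chunk_size, overlap)]


def search(index: List[Tuple[str, str, Counter]], question: str,
           k: int = 3) -> List[Tuple[float, str, str]]:
    """Return the k best (score, source, chunk) matches for the question."""
    q = embed(question)
    scored = [(cosine(q, vec), name, chunk) for name, chunk, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

Each hit carries the name of the document it came from, which is the information needed to show sources next to an answer. The retrieved chunks would then be placed in the prompt sent to the Groq model.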
This project is licensed under the MIT License.
Pull requests are welcome! If you find any issues or have suggestions, feel free to open an issue on GitHub.
- Groq AI: For providing high-performance language models.
- FAISS: For enabling efficient vector search.
- Hugging Face: For sentence transformers and NLP tools.
- Streamlit: For an easy-to-use UI framework.
- PyPDF2: For PDF text extraction.