
PDFSensei 📜🤖 - An AI-powered chatbot that extracts relevant insights from PDFs using Groq LLM and Sentence Transformers.


📘 PDFSensei – AI-Powered Multi-PDF Querying

🚀 PDFSensei is an AI-powered system for querying multiple PDFs at once using Retrieval-Augmented Generation (RAG). It combines FAISS indexing, Groq AI models, and Sentence Transformers to deliver accurate, context-aware responses.

✨ Features

✅ Multi-PDF Querying 📂 – Ask questions across multiple documents at once.
✅ AI-Powered Responses 🤖 – Uses Groq's LLaMA 3.1-8B and Gemma 2-9B for intelligent answers.
✅ FAISS-Based Retrieval 🔍 – Efficient vector search for relevant content.
✅ Contextual Understanding 🧠 – Uses sentence transformers for high-quality embeddings.
✅ Interactive UI 🎨 – Built with Streamlit for an intuitive user experience.
✅ Preloaded PDFs 📚 – Includes legal documents for instant queries.
✅ Robust API Handling 🛡️ – Implements fallback models and retry logic to prevent failures.

🏗️ Project Structure

📂 PDFSensei/
│── 📁 assets/           # Default PDFs used by the website
│   ├── Child rights in the Constitution of India.pdf
│   ├── Constitution of India.pdf
│── 📁 config/           # Application configuration
│   ├── __init__.py      # Initializes Groq and Sentence Transformer
│── 📁 public/           # Static assets (images)
│   ├── bot.jpg          # Chatbot avatar
│   ├── user.jpg         # User avatar
│── 📁 templates/        # UI templates
│   ├── __init__.py
│   ├── botTemplate.py   # Chatbot response template
│   ├── cssTemplate.py   # CSS styling for UI
│   ├── userTemplate.py  # User input template
│── .env.example         # Example environment file
│── app.py               # Main Streamlit application

🛠️ Tech Stack

  • Python 🐍
  • Streamlit 🎨 (Frontend)
  • FAISS 🔍 (Vector Search)
  • Groq AI 🤖 (LLMs: LLaMA 3.1-8B, Gemma 2-9B)
  • LangChain 🧠 (Text Processing)
  • Sentence Transformers 🤗 (Embeddings)
  • PyPDF2 📄 (PDF Parsing)

⚙️ AI Model Integration (Groq)

The system utilizes Groq AI models for generating responses.
The primary model is llama-3.1-8b-instant, with gemma2-9b-it as a fallback model to handle errors and rate limits.

🔹 Fallback Mechanism:

  • Requests to the primary model are retried up to 3 times.
  • If the primary model still fails (for example, due to rate limits), the fallback model is used.
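The retry-then-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ask_with_fallback` and `call_model` are hypothetical names, and giving the fallback model its own 3 attempts and using a linear backoff are assumptions. In a real app, `call_model` would wrap the Groq chat-completion request.

```python
import time
from typing import Callable, Optional, Tuple

PRIMARY_MODEL = "llama-3.1-8b-instant"  # primary model named in this README
FALLBACK_MODEL = "gemma2-9b-it"         # fallback model named in this README
MAX_ATTEMPTS = 3                        # retry count named in this README


def ask_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    retry_delay: float = 1.0,
) -> Tuple[str, str]:
    """Return (model_used, answer).

    `call_model(model, prompt)` is a hypothetical wrapper around the LLM
    request; it is expected to raise an exception on rate limits or errors.
    """
    last_error: Optional[Exception] = None
    for model in (PRIMARY_MODEL, FALLBACK_MODEL):
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # assumption: any API error is retried
                last_error = exc
                if attempt < MAX_ATTEMPTS:
                    # Assumed linear backoff between attempts on the same model.
                    time.sleep(retry_delay * attempt)
    raise RuntimeError("Both primary and fallback models failed") from last_error
```

Passing the request function in as `call_model` keeps the retry policy separate from the API client, which makes the policy easy to exercise with a stub that raises on demand.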

⚡ Installation

1️⃣ Clone the repository:

git clone https://github.com/ArchitJ6/PDFSensei.git
cd PDFSensei

2️⃣ Install dependencies:

pip install -r requirements.txt

3️⃣ Set up environment variables:
Rename .env.example to .env and add your Groq API credentials.

4️⃣ Run the application:

streamlit run app.py
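If the app fails at startup after step 3, a quick check that the key is visible can help. The sketch below is only an illustration: the variable name `GROQ_API_KEY` is an assumption (use whatever name `.env.example` lists), and `load_env_file` and `get_api_key` are hypothetical helpers, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name: str = "GROQ_API_KEY", env_file: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file.

    The default name is an assumption; check .env.example for the real one.
    """
    key = os.environ.get(name) or load_env_file(env_file).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {env_file}")
    return key
```

Calling `get_api_key()` from the project root either returns the key or raises with a message naming the missing variable.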

🚀 How It Works

1️⃣ Upload Multiple PDFs 📂 – Drag and drop or select PDFs.
2️⃣ Extract & Chunk Text 📝 – Text is extracted from each PDF and split into chunks, which are then embedded with Sentence Transformers.
3️⃣ FAISS Indexing 🔍 – Stores the chunk vectors in a FAISS index for fast retrieval.
4️⃣ AI Response Generation 🤖 – Groq LLMs answer questions based on retrieved content.
5️⃣ View Sources 📖 – Get citations from the source documents.
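To make steps 2 to 5 concrete, here is a dependency-free sketch of the same flow. It is not the project's implementation: the word-count `embed` function is a toy stand-in for a Sentence Transformers model, the sorted brute-force search stands in for a FAISS index, and the chunk sizes, function names, and prompt wording are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows (step 2)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: L2-normalised word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def similarity(a, b):
    """Inner product of two sparse vectors (cosine, since both are normalised)."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def build_index(documents, size=200, overlap=50):
    """documents maps filename -> extracted text. Returns (source, chunk, vector) rows (step 3)."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text, size, overlap):
            index.append((source, chunk, embed(chunk)))
    return index


def retrieve(index, question, k=3):
    """Return the k most similar (source, chunk) pairs; exact brute-force search."""
    query = embed(question)
    ranked = sorted(index, key=lambda row: similarity(query, row[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]


def build_prompt(question, hits):
    """Assemble an LLM prompt with source-tagged context (steps 4 and 5)."""
    context = "\n\n".join(f"[{source}] {chunk}" for source, chunk in hits)
    return (
        "Answer using only the context below and cite the bracketed sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the filename alongside every chunk is what makes citations possible: whatever the retriever returns already says which PDF it came from, so the UI can list sources without a second lookup.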

📜 License

This project is licensed under the MIT License. 📜

🀝 Contributing

Pull requests are welcome! If you find any issues or have suggestions, feel free to open an issue on GitHub. πŸŽ‰

🙌 Acknowledgments

💡 Groq AI 🤖 – For providing high-performance language models.
🔍 FAISS – For enabling efficient vector search.
🤗 Hugging Face – For sentence transformers and NLP tools.
🎨 Streamlit – For an easy-to-use UI framework.
📜 PyPDF2 – For PDF text extraction.
