📘 PDFSensei – AI-Powered Multi-PDF Querying

🚀 PDFSensei lets you ask questions across multiple PDFs at once using Retrieval-Augmented Generation (RAG). It combines Sentence Transformers embeddings, a FAISS index, and Groq-hosted LLMs to return answers grounded in the retrieved passages.

✨ Features

✅ Multi-PDF Querying 📂 – Ask questions across multiple documents at once.
✅ AI-Powered Responses 🤖 – Uses LLaMA 3.1-8B and Gemma 2-9B, served through Groq, to generate answers.
✅ FAISS-Based Retrieval 🔍 – Efficient vector search for relevant content.
✅ Contextual Understanding 🧠 – Uses sentence transformers for high-quality embeddings.
✅ Interactive UI 🎨 – Built with Streamlit for an intuitive user experience.
✅ Preloaded PDFs 📚 – Ships with two Indian constitutional documents (see assets/) for instant queries.
✅ Robust API Handling 🛡️ – Implements fallback models and retry logic to prevent failures.

🏗️ Project Structure

📂 PDFSensei/
│── 📁 assets/           # Default PDFs used by the website  
│   ├── Child rights in the Constitution of India.pdf  
│   ├── Constitution of India.pdf  
│── 📁 config/           # Application configuration  
│   ├── __init__.py      # Initializes Groq and Sentence Transformer  
│── 📁 public/           # Static assets (images)  
│   ├── bot.jpg          # Chatbot avatar  
│   ├── user.jpg         # User avatar  
│── 📁 templates/        # UI Templates  
│   ├── __init__.py  
│   ├── botTemplate.py   # Chatbot response template  
│   ├── cssTemplate.py   # CSS styling for UI  
│   ├── userTemplate.py  # User input template  
│── .env.example         # Example environment file  
│── app.py               # Main Streamlit application

🛠️ Tech Stack

Python 🐍
Streamlit 🎨 (Frontend)
FAISS 🔍 (Vector Search)
Groq AI 🤖 (LLMs: LLaMA 3.1-8B, Gemma 2-9B)
LangChain 🧠 (Text Processing)
Sentence Transformers 🤗 (Embeddings)
PyPDF2 📄 (PDF Parsing)

⚙️ AI Model Integration (Groq)

The system utilizes Groq AI models for generating responses.
The primary model is llama-3.1-8b-instant, with gemma2-9b-it as a fallback model to handle errors and rate limits.

🔹 Fallback Mechanism:

The primary model is retried up to 3 times before the system switches models.
If it still fails, for example because of rate limits, the fallback model is used.
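
The retry-then-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function name ask_with_fallback, the call_model callable, the exponential backoff, and the choice to retry the fallback model as well are all assumptions. Only the two model names and the 3-attempt count come from this README.

```python
import time
from typing import Callable, Optional, Sequence

# Model names are taken from the README; everything else is hypothetical.
PRIMARY_MODEL = "llama-3.1-8b-instant"
FALLBACK_MODEL = "gemma2-9b-it"


def ask_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str] = (PRIMARY_MODEL, FALLBACK_MODEL),
    attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Try each model up to `attempts` times, in order, before giving up."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(attempts):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # a real app would catch rate-limit errors specifically
                last_error = exc
                if attempt < attempts - 1:
                    # Assumed policy: exponential backoff between retries.
                    time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all models failed") from last_error
```

Here call_model stands for whatever function wraps the Groq client call; passing it in as a parameter keeps the retry logic testable without network access.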

⚡ Installation

1️⃣ Clone the repository:

git clone https://github.com/ArchitJ6/PDFSensei.git
cd PDFSensei

2️⃣ Install dependencies:

pip install -r requirements.txt

3️⃣ Set up environment variables:
Rename .env.example to .env and add your Groq API credentials.

4️⃣ Run the application:

streamlit run app.py

🚀 How It Works

1️⃣ Upload Multiple PDFs 📂 – Drag and drop or select PDFs.
2️⃣ Extract & Chunk Text 📝 – Extracts text from each PDF (PyPDF2) and splits it into chunks.
3️⃣ FAISS Indexing 🔍 – Converts text chunks into vectors with sentence transformers and stores them in a FAISS index for fast retrieval.
4️⃣ AI Response Generation 🤖 – Groq LLMs answer questions based on retrieved content.
5️⃣ View Sources 📖 – Get citations pointing to the source documents.
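
The chunk, embed, index, and retrieve steps above can be illustrated with a dependency-free sketch. It is a stand-in, not the app's implementation: a toy hashed bag-of-words vector replaces the Sentence Transformers embedding, and a brute-force cosine-similarity scan replaces the FAISS index. The function names, chunk sizes, file names, and sample text are all hypothetical.

```python
import hashlib
import math
import re
from typing import Dict, List, Tuple

DIM = 4096  # size of the toy embedding (assumption; real models emit dense vectors)


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> List[float]:
    """Toy stand-in for a sentence embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs: Dict[str, str]) -> List[Tuple[str, str, List[float]]]:
    """Return (source name, chunk, vector) triples for every chunk of every document."""
    return [(name, chunk, embed(chunk))
            for name, text in docs.items()
            for chunk in chunk_text(text)]


def search(index: List[Tuple[str, str, List[float]]], question: str,
           k: int = 2) -> List[Tuple[float, str, str]]:
    """Brute-force top-k by cosine similarity (vectors are already unit length)."""
    q = embed(question)
    scored = [(sum(a * b for a, b in zip(q, vec)), name, chunk)
              for name, chunk, vec in index]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


# Made-up sample text standing in for extracted PDF content.
docs = {
    "a.pdf": "Every child has the right to free and compulsory education.",
    "b.pdf": "The Parliament consists of the President and two Houses.",
}
index = build_index(docs)
hits = search(index, "Does every child have a right to free education?", k=1)
for score, source, chunk in hits:
    print(source, chunk)
```

Keeping the source name next to each chunk is what makes the citation step possible: the retrieved chunks go into the LLM prompt, and their source names can be shown alongside the answer.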

📜 License

This project is licensed under the MIT License. 📜

🤝 Contributing

Pull requests are welcome! If you find any issues or have suggestions, feel free to open an issue on GitHub. 🎉

🙌 Acknowledgments

💡 Groq AI 🤖 – For providing high-performance language models.
🔍 FAISS – For enabling efficient vector search.
🤗 Hugging Face – For sentence transformers and NLP tools.
🎨 Streamlit – For an easy-to-use UI framework.
📜 PyPDF2 – For PDF text extraction.
