This project is a Retrieval-Augmented Generation (RAG) chatbot built with Next.js 15+ (App Router), the Groq API, the Weaviate vector database, Puppeteer, and Redux for state management. The chatbot scrapes content from user-provided URLs, stores it in the vector database as embeddings, and answers questions with citations that link back to the original sources.
- RAG Architecture: Enhances chatbot responses with relevant retrieved content.
- Source Citations: Displays clickable links to the original sources.
- Real-Time Scraping: Uses Puppeteer for web scraping.
- Weaviate for Vector Storage: Efficient storage and retrieval of knowledge.
- Groq API for AI Responses: Generates accurate, context-aware responses.
- Redux for State Management: Handles chat history efficiently.
- Next.js App Router: Optimized for server-side and client-side rendering.
- Frontend: Next.js 15+, React, Tailwind CSS, Redux
- Backend: Node.js, Express, Puppeteer, Weaviate, Groq API
- Database: Weaviate (vector storage)
- Deployment: Vercel
git clone https://github.com/TheUzair/RAG-Chatbot.git
cd RAG-Chatbot
npm install
Create a .env.local file and add:
GROQ_API_KEY=your-groq-api-key
WEAVIATE_URL=your-weaviate-instance-url
WEAVIATE_API_KEY=your-weaviate-api-key
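A missing key is easier to diagnose at startup than in the middle of a request. The sketch below shows one way to check the three variables above before the app uses them. `loadConfig` and `AppConfig` are hypothetical names chosen for illustration and are not part of this repository.

```typescript
// Variable names come from the .env.local example above.
// loadConfig and AppConfig are hypothetical helpers, not project code.
const REQUIRED_ENV_VARS = ["GROQ_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY"] as const;

type EnvName = (typeof REQUIRED_ENV_VARS)[number];
type AppConfig = Record<EnvName, string>;

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Treat undefined, empty, and whitespace-only values as missing.
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const name of REQUIRED_ENV_VARS) {
    config[name] = env[name]!.trim();
  }
  return config;
}
```

On the server side this could be called once as `loadConfig(process.env)`, so that a misconfigured deployment fails with a message naming the missing variables.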
npm run dev
Access the chatbot at http://localhost:3000.
- User enters a URL.
- Puppeteer extracts text content.
- Data is stored in Weaviate as vector embeddings.
- User asks a question.
- Weaviate retrieves the most relevant content.
- Groq API generates a response using both the retrieved content and the model's general knowledge.
- The chatbot displays the response.
- If a source is available, a clickable citation is shown.
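The retrieval steps above can be sketched without any external services. This is a toy illustration only: the project stores embeddings in Weaviate and generates answers with the Groq API, whereas the code below swaps in plain word counts and an in-memory array so that it runs on its own. All names here (`Chunk`, `chunkText`, `retrieve`, `buildPrompt`) are assumptions made for the example.

```typescript
interface Chunk {
  text: string;
  sourceUrl: string; // kept with every chunk so the answer can cite it
}

// Split scraped text into chunks of at most maxWords words.
function chunkText(text: string, sourceUrl: string, maxWords = 50): Chunk[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push({ text: words.slice(i, i + maxWords).join(" "), sourceUrl });
  }
  return chunks;
}

// Toy stand-in for an embedding: lowercase word counts (NOT a real embedding model).
function wordCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) ?? 0);
    normA += count * count;
  }
  for (const count of b.values()) {
    normB += count * count;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the topK chunks most similar to the question, dropping non-matches.
function retrieve(question: string, chunks: Chunk[], topK = 3): Chunk[] {
  const queryVector = wordCounts(question);
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, wordCounts(chunk.text)) }))
    .filter((result) => result.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((result) => result.chunk);
}

// Assemble the text that would be sent to the language model.
function buildPrompt(question: string, retrieved: Chunk[]): string {
  const context = retrieved
    .map((chunk, i) => `[${i + 1}] (${chunk.sourceUrl}) ${chunk.text}`)
    .join("\n");
  return `Context:\n${context || "(none)"}\n\nQuestion: ${question}`;
}
```

Keeping `sourceUrl` on each chunk is what makes the citation step possible: whichever chunks are retrieved, their URLs travel with them into the prompt and can be shown next to the answer.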
User: "What is quantum computing?"
Bot: "Quantum computing uses quantum bits (qubits) to perform computations that classical computers struggle with. Click here to read more."
User: "Summarize the main points of this article."
Bot: "The article discusses the latest AI trends, focusing on ethical concerns and advancements. Click here for details."
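The "Click here" text in these examples is the clickable citation. One possible way to model an answer with an optional source is sketched below. The `BotAnswer` shape and the Markdown link output are assumptions made for illustration, not the project's actual data model.

```typescript
// Hypothetical response shape: sourceUrl is set only when retrieval found a source.
interface BotAnswer {
  text: string;
  sourceUrl?: string;
}

// Append a Markdown link to the answer when a source is available.
function renderAnswer(answer: BotAnswer): string {
  if (!answer.sourceUrl) {
    return answer.text;
  }
  return `${answer.text} [Click here to read more](${answer.sourceUrl})`;
}
```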
- Empty Queries: Prevents sending blank messages.
- Invalid URLs: Displays an error if an incorrect URL is entered.
- Slow Responses: Shows a loading indicator while fetching results.
- Rate Limits: Retries requests that fail because of API rate limits.
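The input checks and the retry behavior above could look roughly like the following. This is a minimal sketch under stated assumptions: the function names are hypothetical, the retry uses exponential backoff (the project does not say which strategy it uses), and it retries on any error rather than detecting rate-limit responses specifically.

```typescript
// Reject blank or whitespace-only messages before sending them.
function isBlankQuery(query: string): boolean {
  return query.trim().length === 0;
}

// Accept only absolute http(s) URLs; anything else is treated as invalid.
function isValidHttpUrl(input: string): boolean {
  try {
    const url = new URL(input.trim());
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // new URL() throws on strings it cannot parse
  }
}

// Retry an async operation, doubling the wait after each failed attempt.
// Assumption: every error is retried; a real client would likely check for HTTP 429 first.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A caller might wrap the model request as `withRetry(() => askModel(question))`, where `askModel` is a placeholder for whatever function sends the request, and keep the loading indicator visible until the promise settles.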
- JavaScript-Rendered Pages: Puppeteer runs page scripts, but content that renders only after the scraper has read the page (for example, lazy-loaded sections) may not be captured.
- Highly Dynamic Content: Pages that change frequently may lead to outdated embeddings.
- Multimodal Inputs: Currently, only text-based queries are supported.
- ✅ Add support for multimedia content retrieval (images, videos).
- ✅ Improve citation handling with multiple source references.
- ✅ Enhance multi-turn conversations.
- ✅ Expand language support for global users.
Pull requests are welcome! Feel free to fork the repo and submit your contributions.
This project is licensed under the MIT License. See the LICENSE file for details.