A suite of Streamlit applications that enable conversational interactions with document collections using Reranking Retrieval-Augmented Generation (RAG).
This project consists of two complementary Streamlit applications:
- RAG_Bot: upload and process new documents to create searchable knowledge bases.
- Doc_Bot: connect to existing document collections and query them conversationally.
Both applications use MongoDB Atlas Vector Search for document storage and retrieval, Cohere for embeddings, and a Groq-hosted LLM to generate high-quality responses in French.
RAG_Bot Features:
- Upload various document types (PDF, TXT, CSV, DOCX)
- Automatic document processing and chunking
- Vector embedding generation with Cohere
- MongoDB Atlas Vector Search integration
- Real-time chat interface over the processed document
- Source citations with each response
Doc_Bot Features:
- Connect to existing document collections in MongoDB
- Browse available knowledge bases
- Chat-based querying of document collections
- Source citations with each response
- Seamless integration with collections created by RAG_Bot
Technology Stack:
- Frontend: Streamlit
- Vector Database: MongoDB Atlas Vector Search
- Embeddings: Cohere (embed-multilingual-v3.0)
- LLM: Groq (meta-llama/llama-4-scout-17b-16e-instruct)
- RAG Framework: LangChain + LangGraph, with a reranking retrieval step
- Document Processing: LangChain document loaders
Prerequisites:
- Python 3.9+
- Streamlit
- MongoDB Atlas account
- Cohere API key
- Groq API key
- Clone the repository:
```bash
git clone https://github.com/yourusername/doc-rag-assistant.git
cd doc-rag-assistant
```
- Create a virtual environment and activate it:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
- Set up your `.streamlit/secrets.toml` file with your API keys:
```toml
MDB_URI_CONNECTION = "mongodb+srv://username:password@your-cluster.mongodb.net/?retryWrites=true&w=majority"
COHERE_API_KEY = "your-cohere-api-key"
GROQ_API_KEY = "your-groq-api-key"
```
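For reference, a Streamlit app reads these values at runtime through `st.secrets`, which mirrors the keys in `secrets.toml` (the variable names below are illustrative):

```python
import streamlit as st
from pymongo import MongoClient

# st.secrets exposes the key/value pairs from .streamlit/secrets.toml
mongo_client = MongoClient(st.secrets["MDB_URI_CONNECTION"])
cohere_api_key = st.secrets["COHERE_API_KEY"]
groq_api_key = st.secrets["GROQ_API_KEY"]
```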
To run RAG_Bot:

```bash
streamlit run rag_bot.py
```
- Upload a document using the file uploader in the sidebar
- Click "Process Document" to create a new collection in MongoDB
- Chat with your document using the chat interface
- Note the collection name for future reference with Doc_Bot
To run Doc_Bot:

```bash
streamlit run doc_bot.py
```
- Select an existing collection from the dropdown in the sidebar
- Click "Se connecter à la collection" to establish the connection
- Chat with the selected document collection using the chat interface
The typical workflow involves:
- Using RAG_Bot to upload and process new documents, creating collections in MongoDB
- Using Doc_Bot to access and query those collections later
This split keeps document processing and querying cleanly separated.
Document Processing:
- Documents are loaded using LangChain document loaders
- Text is split into manageable chunks with RecursiveCharacterTextSplitter
- Chunks are embedded using Cohere's multilingual embedding model
- Embeddings are stored in MongoDB Atlas with vector search capabilities
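A minimal sketch of that ingestion pipeline, assuming a PDF input; the chunk sizes, collection name, and index name are illustrative, and `PyPDFLoader` additionally requires the `pypdf` package:

```python
from pymongo import MongoClient
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

# 1. Load the document (loader choice depends on the file type)
docs = PyPDFLoader("example.pdf").load()

# 2. Split into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks and store them in MongoDB Atlas
embeddings = CohereEmbeddings(
    model="embed-multilingual-v3.0",
    cohere_api_key="your-cohere-api-key",
)
client = MongoClient("your-connection-string")
vector_store = MongoDBAtlasVectorSearch.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection=client["IT_OPS"]["doc_example"],
    index_name="vector_index",  # must match the Atlas Vector Search index name
)
```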
Retrieval Process:
- User questions are embedded using the same Cohere model
- MongoDB Atlas Vector Search finds semantically similar document chunks
- Retrieved chunks provide context for the LLM
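A sketch of the retrieval step with Cohere reranking layered on top of the vector search, which matches the reranking RAG design described above; the rerank model name and the `k`/`top_n` values are assumptions:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereEmbeddings, CohereRerank
from langchain_mongodb import MongoDBAtlasVectorSearch

# Reconnect to an existing collection (essentially what Doc_Bot does)
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    "your-connection-string",
    "IT_OPS.doc_example",
    CohereEmbeddings(model="embed-multilingual-v3.0", cohere_api_key="your-cohere-api-key"),
    index_name="vector_index",
)

# Over-fetch candidates from Atlas Vector Search, then let Cohere rerank them
base_retriever = vector_store.as_retriever(search_kwargs={"k": 10})
reranker = CohereRerank(
    model="rerank-multilingual-v3.0",  # assumption: a multilingual reranker for French content
    top_n=4,
    cohere_api_key="your-cohere-api-key",
)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever,
)

context_docs = retriever.invoke("Quelle est la procédure de sauvegarde ?")
```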
Response Generation:
- LangGraph orchestrates the retrieval and generation flow
- The Groq-hosted LLM (Llama 4 Scout) generates responses based on the retrieved context
- Responses are provided in French with proper source citations
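A condensed sketch of how such a LangGraph flow can be wired, reusing `retriever` from the previous sketch; the prompt wording and state shape are illustrative, not the apps' exact implementation:

```python
from typing import List, TypedDict

from langchain_core.documents import Document
from langchain_groq import ChatGroq
from langgraph.graph import StateGraph, START, END

llm = ChatGroq(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    api_key="your-groq-api-key",
)

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

def retrieve(state: State) -> dict:
    # Uses the reranking retriever from the previous sketch
    return {"context": retriever.invoke(state["question"])}

def generate(state: State) -> dict:
    context = "\n\n".join(doc.page_content for doc in state["context"])
    prompt = (
        "Réponds en français et cite tes sources.\n\n"
        f"Contexte :\n{context}\n\nQuestion : {state['question']}"
    )
    return {"answer": llm.invoke(prompt).content}

# Wire retrieval and generation into a two-node graph
graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
rag_app = graph.compile()

result = rag_app.invoke({"question": "Comment configurer l'accès VPN ?"})
print(result["answer"])
```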
Additional Notes:
- All responses are generated in French, as specified in the prompt template
- Document collections are stored in the "IT_OPS" database in MongoDB Atlas
- Collections created by RAG_Bot are prefixed with "doc_" and include a timestamp (see the illustrative helper below)
- Source references are provided with each response
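For illustration, a helper along these lines would produce names matching that convention; the exact scheme beyond the "doc_" prefix and timestamp is an assumption:

```python
from datetime import datetime

def make_collection_name(filename: str) -> str:
    # "doc_" prefix + sanitized file stem + timestamp, per the naming convention above
    stem = filename.rsplit(".", 1)[0].strip().replace(" ", "_").lower()
    return f"doc_{stem}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# make_collection_name("Rapport Annuel.pdf") -> "doc_rapport_annuel_20250101_093000"
```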
Create a `requirements.txt` file with the following dependencies:
```text
streamlit
pymongo
langchain
langchain-core
langchain-mongodb
langchain-cohere
langchain-groq
langchain-community
langgraph
python-dotenv
```
Ensure your MongoDB Atlas cluster has:
- Vector search capability enabled (an example index definition is sketched below)
- Proper network access settings
- A user with read/write permissions
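If you need to create the vector index programmatically, something like this works with PyMongo 4.6+; the index name, field path, and similarity metric are assumptions, while 1024 dimensions matches Cohere's embed-multilingual-v3.0:

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("your-connection-string")
collection = client["IT_OPS"]["doc_example"]

index = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",    # field where chunk embeddings are stored
                "numDimensions": 1024,  # embed-multilingual-v3.0 output size
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index)
```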
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.