A Retrieval-Augmented Generation (RAG) chatbot designed for lecture notes in PDF and PowerPoint (.pdf, .pptx) format. It combines a local LLM with hybrid document retrieval to provide accurate, context-aware answers about software engineering concepts.
Demo: YouTube DEMO
## Features

**Document Processing**: intelligent handling of various file formats
- PDF lecture notes with multi-page support
- PowerPoint (PPTX) presentations
- Automatic chunking with overlap for context preservation (sketched below)
- Text extraction with metadata retention (source, page numbers)
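As a rough illustration of the chunking step (not necessarily the exact logic in processor.py), overlapping fixed-size chunks can be produced like this:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share `overlap` characters
    return chunks
```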
**Advanced Retrieval**:
- Hybrid search combining semantic (vector) and keyword-based (BM25) retrieval
- ChromaDB vector database for efficient similarity search (example below)
- Custom relevance scoring and result ranking
- Configurable retrieval parameters (chunk size, overlap, top_k)
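For orientation, this is roughly how ChromaDB is used for indexing and similarity search; the collection name and metadata fields here are illustrative, not necessarily what chroma_store.py uses:

```python
import chromadb

# Persistent client stores embeddings under chroma_db/ (see project structure below).
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("lecture_notes")

# Index a chunk together with its source metadata.
collection.add(
    ids=["ctse_lec1_chunk0"],
    documents=["Continuous integration means merging changes frequently..."],
    metadatas=[{"source": "CTSE_Lecture1.pdf", "page": 3}],
)

# Retrieve the top-k most similar chunks for a question.
results = collection.query(query_texts=["What is continuous integration?"], n_results=5)
print(results["documents"][0])
```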
**Local LLM Integration**:
- Runs entirely offline with Ollama
- Multiple model support with live switching (mistral, llama2, etc.)
- Optimized prompting with context window management
- Streamed responses for a better user experience (streaming sketch below)
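A minimal sketch of streaming a completion from a local Ollama server over its HTTP API (endpoint and fields follow the public Ollama API; the helper name is ours, not the project's ollama_client.py):

```python
import json
import requests

def stream_answer(prompt: str,
                  model: str = "mistral:7b-instruct-v0.3-q4_1",
                  url: str = "http://localhost:11434"):
    """Yield response fragments from Ollama's /api/generate endpoint as they arrive."""
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(f"{url}/api/generate", json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            yield chunk.get("response", "")
            if chunk.get("done"):
                break

for token in stream_answer("Explain continuous integration in one sentence."):
    print(token, end="", flush=True)
```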
**Advanced Prompt Engineering**:
- Chain-of-thought reasoning
- Few-shot learning examples for complex queries
- Self-reflective answer generation
- Structured output formatting based on query type
- Dynamic prompt selection based on question analysis (template example below)
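To make the idea concrete, a chain-of-thought template with a few-shot example might look like the following (an illustrative template, not the exact contents of prompt_templates.py):

```python
# Hypothetical template combining a few-shot example with a chain-of-thought cue.
COMPARATIVE_TEMPLATE = """You are a software engineering tutor. Answer using ONLY the context.

Context:
{context}

Example:
Q: Compare unit testing and integration testing.
A: Let's reason step by step. Unit tests exercise one component in isolation;
   integration tests verify that components work together. Therefore...

Q: {question}
A: Let's reason step by step."""

def build_prompt(context: str, question: str) -> str:
    return COMPARATIVE_TEMPLATE.format(context=context, question=question)
```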
**Rich Terminal UI**:
- Beautiful interactive interface built with Rich (example below)
- Markdown rendering for formatted responses
- Progress indicators during processing
- Command system with help, history, and model management
- Syntax highlighting and pretty printing
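The Rich pieces involved look roughly like this (a minimal sketch using Rich's public API, not the project's terminal_ui.py):

```python
from rich.console import Console
from rich.markdown import Markdown

console = Console()

# Show a spinner while retrieval and generation are in progress.
with console.status("[bold green]Retrieving context and generating answer..."):
    answer_md = "**Continuous integration** is the practice of merging changes frequently..."

# Render the model's markdown answer with headings, lists and highlighted code blocks.
console.print(Markdown(answer_md))
```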
## Project Structure

```text
RagChitChat/
├── README.md                     # Project documentation
├── requirements.txt              # Dependencies
├── setup.py                      # Package setup for imports
├── .env.example                  # Environment variables template
├── .gitignore                    # Git ignore rules
│
├── data/                         # Raw lecture notes (.pdf, .pptx)
├── processed/                    # Processed text data
├── chroma_db/                    # Vector database storage
│
├── src/                          # Source code
│   ├── __init__.py
│   ├── document_processor/       # PDF and PPTX processing
│   │   ├── __init__.py
│   │   └── processor.py          # Document processing classes
│   │
│   ├── vector_store/             # ChromaDB integration
│   │   ├── __init__.py
│   │   └── chroma_store.py       # Vector database management
│   │
│   ├── retriever/                # Haystack retrieval components
│   │   ├── __init__.py
│   │   └── haystack_retriever.py # Document retrieval
│   │
│   ├── llm/                      # Ollama integration
│   │   ├── __init__.py
│   │   └── ollama_client.py      # LLM interface
│   │
│   ├── prompts/                  # Prompt engineering
│   │   ├── __init__.py
│   │   └── prompt_templates.py   # Advanced prompt templates
│   │
│   ├── interface/                # Rich terminal UI
│   │   ├── __init__.py
│   │   └── terminal_ui.py        # Terminal interface
│   │
│   └── main.py                   # Entry point
│
└── config/                       # Configuration files
    ├── __init__.py
    └── settings.py               # Application settings
```
## Prerequisites

- Python 3.8 or higher
- 8 GB RAM minimum (16 GB recommended for larger models)
- Storage space for models and the vector database (2 GB+)
- Windows, macOS, or Linux
## Installation

```bash
# Install Python 3.8+ (if not already installed)
# Download from https://www.python.org/downloads/

# Clone the repository
git clone https://github.com/nxdun/RagChitChat.git
cd RagChitChat

# Create and activate a virtual environment
python -m venv env
# On Windows
.\env\Scripts\activate
# On macOS/Linux
source env/bin/activate

# Install required packages
pip install -r requirements.txt

# Install the package in editable mode (optional, only if import issues occur)
pip install -e .

# Download and install Ollama from https://ollama.ai/
# Then pull the required models
ollama pull mistral:7b-instruct-v0.3-q4_1   # Default model (configurable via .env)
ollama pull llama2                          # Alternative model (optional)

# Create the data directory in the project root
mkdir -p data

# Copy your lecture notes into the data folder
# Supported formats: PDF, PPTX
# Example: cp ~/Downloads/CTSE_Lecture*.pdf data/

# Copy the example environment file and edit it with your settings
cp .env.example .env

# Start the chatbot
python src/main.py
```
## Configuration

You can configure the application through environment variables or by editing config/settings.py:

| Variable | Description | Default |
|---|---|---|
| `RAGCHITCHAT_DATA_DIR` | Directory containing lecture notes | `data` |
| `RAGCHITCHAT_PROCESSED_DIR` | Directory for processed documents | `processed` |
| `RAGCHITCHAT_DB_DIR` | Directory for the vector database | `chroma_db` |
| `RAGCHITCHAT_MODEL` | Default Ollama model | `mistral:7b-instruct-v0.3-q4_1` |
| `OLLAMA_URL` | Ollama API URL | `http://localhost:11434` |
| `RAGCHITCHAT_CHUNK_SIZE` | Document chunk size | `1000` |
| `RAGCHITCHAT_CHUNK_OVERLAP` | Overlap between chunks | `200` |
| `RAGCHITCHAT_TOP_K` | Number of context documents to retrieve | `5` |
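For reference, a settings module like config/settings.py would typically read these variables with the defaults above (a sketch under that assumption, not the file's verbatim contents):

```python
import os

# Defaults mirror the table above; override via environment variables or a .env file.
DATA_DIR = os.getenv("RAGCHITCHAT_DATA_DIR", "data")
PROCESSED_DIR = os.getenv("RAGCHITCHAT_PROCESSED_DIR", "processed")
DB_DIR = os.getenv("RAGCHITCHAT_DB_DIR", "chroma_db")
MODEL = os.getenv("RAGCHITCHAT_MODEL", "mistral:7b-instruct-v0.3-q4_1")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
CHUNK_SIZE = int(os.getenv("RAGCHITCHAT_CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("RAGCHITCHAT_CHUNK_OVERLAP", "200"))
TOP_K = int(os.getenv("RAGCHITCHAT_TOP_K", "5"))
```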
## Usage

Once the application is running, you can interact with it through the terminal:

| Command | Description |
|---|---|
| `/help` | Show available commands |
| `/exit` | Exit the application |
| `/clear` | Clear conversation history |
| `/history` | Show conversation history |
| `/models` | List available models |
| `/model <name>` | Switch to a different model |
| `/info` | Show system information |
| `/about` | About the application |
Here are some examples of questions you can ask:
- "What is continuous integration?"
- "Explain the difference between DevOps and DevSecOps"
- "What are the benefits of microservices architecture?"
- "How does containerization improve software deployment?"
- "Compare agile and waterfall methodologies in software engineering"
You can switch between different LLMs during runtime:

```text
/models                               # List available models
/model mistral:7b-instruct-v0.3-q4_1  # Switch to Mistral
/model llama2                         # Switch to Llama 2
```
The system automatically selects prompting strategies based on question type:
- Factual questions: Standard RAG with direct answers
- Comparative questions: Structured comparison format
- Procedural questions: Step-by-step instructions
- Complex questions: Self-reflective generation
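A rough idea of how such routing can work (an illustrative keyword heuristic; the project's actual question analysis may differ):

```python
def select_strategy(question: str) -> str:
    """Map a question to a prompt strategy using simple keyword heuristics."""
    q = question.lower()
    if any(w in q for w in ("compare", "difference between", " vs ", "versus")):
        return "comparative"      # structured comparison format
    if any(w in q for w in ("how do i", "how to", "steps to", "procedure")):
        return "procedural"       # step-by-step instructions
    if len(q.split()) > 25 or q.startswith("why"):
        return "self_reflective"  # draft an answer, critique it, then refine
    return "factual"              # standard RAG with a direct answer
```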
The retrieval system combines two search methods for better results:
- Vector search: Semantic similarity using embeddings
- BM25 search: Keyword-based traditional search
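One common way to fuse the two result sets is a weighted combination of normalised scores; the sketch below shows the general idea (the weighting and normalisation are illustrative, not the project's exact ranking):

```python
from typing import Dict, List

def hybrid_rank(vector_scores: Dict[str, float],
                bm25_scores: Dict[str, float],
                alpha: float = 0.5, top_k: int = 5) -> List[str]:
    """Blend normalised vector and BM25 scores; alpha weights the semantic side."""
    def normalise(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {doc_id: (s - lo) / span for doc_id, s in scores.items()}

    v, b = normalise(vector_scores), normalise(bm25_scores)
    fused = {doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
             for doc_id in set(v) | set(b)}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]
```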
## Troubleshooting

**Problem:** Import errors when running `python src/main.py`

**Solution:** Install the package in development mode:

```bash
pip install -e .
```

**Problem:** "Cannot connect to Ollama" error

**Solution:** Ensure Ollama is installed and running:

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If it is not, start the Ollama application
```

**Problem:** Model not found when switching models

**Solution:** Pull the model using the Ollama CLI:

```bash
ollama pull <model_name>
```

**Problem:** High memory usage

**Solution:** Use a smaller model or reduce `RAGCHITCHAT_TOP_K` in the settings.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.