An Agentic RAG (Retrieval-Augmented Generation) system powered by LangChain, enabling multi-step reasoning over documents using LLMs, ChromaDB, and Google Drive as a document source.


🤖 Agentic RAG System


A comprehensive Retrieval-Augmented Generation (RAG) system that combines voice input, multimodal document processing, and intelligent search capabilities across multiple sources. Built with FastAPI and Next.js, this system provides real-time AI-powered question answering with visual grounding and citation support.

🌟 Features

🎤 Voice Integration

  • Real-time Speech-to-Text: Streaming voice input using Google Cloud Speech-to-Text
  • WebSocket-based Audio Processing: Low-latency voice recognition with partial results
  • Multi-language Support: Configurable language detection and transcription
  • Voice-enabled Chat Interface: Natural conversation flow with voice commands

📚 Multimodal Document Processing

  • Advanced PDF Processing: Extract text, images, charts, and tables from PDFs
  • Image Understanding: AI-powered analysis of charts, diagrams, and visual content
  • OCR Integration: Text extraction from scanned documents and images
  • Smart Chunking: Intelligent text segmentation with context preservation
  • Visual Grounding: Link answers to specific document images and pages

πŸ” Intelligent Multi-Source Search

  • Local RAG System: Vector-based document retrieval with ChromaDB
  • Web Search Integration: Real-time web search via SERP API
  • Google Drive MCP: Model Context Protocol integration for Drive documents
  • Parallel Search Execution: Simultaneous queries across all sources
  • Smart Result Fusion: Intelligent combination of results from multiple sources

πŸ“ Citations & Transparency

  • Comprehensive Citations: Detailed source attribution for every answer
  • Visual Citations: Click-through access to source images and documents
  • Confidence Scoring: Reliability indicators for each source
  • Source Traceability: Full audit trail of information sources
  • Interactive Content Viewer: In-app display of PDFs, images, and web content

⚡ Real-time Capabilities

  • WebSocket Communication: Real-time chat and voice processing
  • Streaming Responses: Progressive answer generation (a minimal server-side sketch follows)
  • Live Transcription: Real-time speech-to-text with partial results
  • Concurrent Processing: Parallel execution of search and generation tasks

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend      β”‚    β”‚    Backend       β”‚    β”‚   External      β”‚
β”‚   (Next.js)     β”‚    β”‚   (FastAPI)      β”‚    β”‚   Services      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β€’ Voice Input   │◄──►│ β€’ STT Service    β”‚    β”‚ β€’ Gemini/Claude β”‚
β”‚ β€’ Chat UI       β”‚    β”‚ β€’ RAG Engine     │◄──►│ β€’ Google Drive  β”‚
β”‚ β€’ Citations     β”‚    β”‚ β€’ Web Search     β”‚    β”‚ β€’ SERP API      β”‚
β”‚ β€’ Image Display β”‚    β”‚ β€’ Document Proc. β”‚    β”‚ β€’ ChromaDB      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ› οΈ Technology Stack

Backend

  • Framework: FastAPI 0.104+ with async/await support
  • AI Providers: Google Gemini 1.5 Pro, Anthropic Claude 3 Sonnet
  • Vector Database: ChromaDB for embedding storage and retrieval
  • Speech Processing: Google Cloud Speech-to-Text API
  • Document Processing: PyPDF2, Pillow, pytesseract for OCR
  • Search Integration: SERP API for web search, Google Drive API
  • WebSocket: Real-time communication with connection management
  • Authentication: OAuth 2.0 for Google services

Frontend

  • Framework: Next.js 14 with App Router
  • Language: TypeScript for type safety
  • UI Library: React 18 with Tailwind CSS
  • State Management: Zustand for client state
  • Audio Processing: Web Audio API with WebRTC
  • Real-time: WebSocket client with auto-reconnection
  • Testing: Jest and React Testing Library

AI & ML

  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • Vector Search: Similarity search with configurable thresholds
  • Multimodal AI: Vision models for image understanding
  • Text Generation: Context-aware response generation
  • Confidence Scoring: Relevance and reliability metrics

🚀 Installation & Setup

Prerequisites

  • Python 3.9+ (3.11 recommended)
  • Node.js 18+ with npm/yarn
  • Google Cloud Account (for Speech-to-Text)
  • AI Provider Account (Gemini or Claude)

1. Clone Repository

git clone <repository-url>
cd agnt

2. Backend Setup

Install Dependencies

cd backend
pip install -r requirements.txt

Configure Environment

cp env.example .env
# Edit .env with your configuration (see Configuration section)

Set up Google Cloud Service Account

  1. Create a project in Google Cloud Console
  2. Enable Speech-to-Text API and Drive API
  3. Create a service account and download the JSON key
  4. Set GOOGLE_CLOUD_SERVICE_ACCOUNT_PATH in your .env file

Initialize Database

# ChromaDB will be initialized automatically on first run
# Data will be stored in ./chroma_db/ directory

3. Frontend Setup

Install Dependencies

cd frontend
npm install
# or
yarn install

Configure Environment

# Create .env.local file
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
echo "NEXT_PUBLIC_WS_URL=ws://localhost:8000" >> .env.local

4. Running the Application

Start Backend Server

cd backend
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Start Frontend Development Server

cd frontend
npm run dev
# or
yarn dev

Access the Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • Interactive API docs (FastAPI Swagger UI): http://localhost:8000/docs

βš™οΈ Configuration

Environment Variables

Core AI Configuration

# Choose your AI provider
AI_PROVIDER=claude  # or "gemini"

# API Keys (get one based on your provider choice)
CLAUDE_API_KEY=your_claude_api_key
GEMINI_API_KEY=your_gemini_api_key

Speech-to-Text Setup

# Google Cloud Speech-to-Text (required for voice features)
GOOGLE_CLOUD_SERVICE_ACCOUNT_PATH=/path/to/service-account.json
STT_PROVIDER=google
GOOGLE_SPEECH_MODEL=latest_long
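
These settings map onto the Google Cloud Speech client roughly as follows (a sketch: the encoding and sample rate are assumptions about the audio pipeline, and interim_results is what produces the partial transcripts mentioned above):

from google.cloud import speech

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # assumed PCM audio
        sample_rate_hertz=16000,                                   # assumed capture rate
        language_code="en-US",                                     # configurable language
        model="latest_long",                                       # GOOGLE_SPEECH_MODEL
    ),
    interim_results=True,  # emit partial transcripts while the user speaks
)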

Search Integration (Optional)

# Web Search (choose one)
SERP_API_KEY=your_serp_api_key          # Recommended
GOOGLE_API_KEY=your_google_api_key      # Alternative

# Google Drive Integration
GOOGLE_DRIVE_CLIENT_ID=your_client_id
GOOGLE_DRIVE_CLIENT_SECRET=your_client_secret

Advanced Settings

# Vector Database
CHROMA_PERSIST_DIRECTORY=./chroma_db
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Performance Tuning
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.7
MAX_TOKENS_PER_CHUNK=1000
CHUNK_OVERLAP=200
MAX_CONCURRENT_REQUESTS=100

# Security
CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
SECRET_KEY=your-secret-key-change-in-production
RATE_LIMIT_PER_MINUTE=60

# Feature Flags
ENABLE_WEB_SEARCH=true
ENABLE_GOOGLE_DRIVE=true
ENABLE_VOICE_INPUT=true
ENABLE_IMAGE_ANALYSIS=true
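
For intuition about the chunking knobs, here is a simplified sliding-window chunker showing how MAX_TOKENS_PER_CHUNK and CHUNK_OVERLAP interact (word-based for brevity; the real pipeline tokenizes differently):

def chunk_text(text: str, max_tokens: int = 1000, overlap: int = 200) -> list[str]:
    # Simplified: treat whitespace-separated words as "tokens".
    words = text.split()
    step = max_tokens - overlap  # each window starts `step` tokens after the last
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]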

Model Configuration

AI Provider Selection

  • Claude: Better reasoning, more conservative responses
  • Gemini: Faster processing, better multimodal understanding

Model Choices

# Gemini Models
GEMINI_CHAT_MODEL=gemini-1.5-pro
GEMINI_VISION_MODEL=gemini-1.5-pro-vision

# Claude Models  
CLAUDE_CHAT_MODEL=claude-3-sonnet-20240229
CLAUDE_VISION_MODEL=claude-3-sonnet-20240229

📖 API Documentation

Core Endpoints

Health Check

GET /health

Returns system status and service availability.

Document Upload

POST /upload
Content-Type: multipart/form-data

file: <PDF file>

Upload and process a PDF document with image extraction.

Response:

{
  "success": true,
  "document_id": "uuid",
  "filename": "document.pdf",
  "pages_processed": 10,
  "images_extracted": 5,
  "text_chunks": 25,
  "processing_time_ms": 1500
}
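
For example, from Python (requests is just one convenient client; any multipart-capable HTTP client works):

import requests

with open("document.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/upload",
        files={"file": ("document.pdf", f, "application/pdf")},
    )
print(resp.json()["document_id"])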

Query System

POST /query
Content-Type: application/json

{
  "query": "What is the main conclusion of the research?",
  "num_results": 5,
  "include_web_search": true,
  "include_drive_search": true
}

Response:

{
  "answer": "Based on the research findings...",
  "citations": [
    {
      "id": "cite_1",
      "source_type": "document",
      "citation_type": "text",
      "title": "Research Paper.pdf",
      "content": "The main conclusion shows...",
      "page_number": 15,
      "confidence_score": 0.95
    }
  ],
  "confidence_score": 0.87,
  "processing_time_ms": 2300
}

Citation Details

GET /citation/{citation_id}

Retrieve full content and metadata for a specific citation.
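
For example, fetching the sample citation id from the /query response above:

import requests

resp = requests.get("http://localhost:8000/citation/cite_1")
resp.raise_for_status()
print(resp.json())  # full content and metadata for the citation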

WebSocket Endpoints

Speech-to-Text

const ws = new WebSocket('ws://localhost:8000/ws/stt');

// Send audio data
ws.send(audioBuffer);

// Receive transcription
ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log(result.text, result.confidence);
};

Real-time Chat

const ws = new WebSocket('ws://localhost:8000/ws/chat');

// Send message
ws.send(JSON.stringify({
  type: 'query',
  message: 'Hello, how can you help me?',
  session_id: 'session_123'
}));

// Receive streamed updates (message shape is illustrative)
ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  console.log(update);
};

Frontend API Client

The frontend includes a comprehensive API client in src/lib/api.ts:

import { api } from '@/lib/api';
import { QueryRequest, QueryResponse } from '@/types/api';

// Query the system
const response = await api.query({
  query: 'What is machine learning?',
  num_results: 5
});

// Upload document
const result = await api.uploadDocument(file);

// Get citation details
const citation = await api.getCitation(citationId);

🎯 Usage Examples

Basic Text Query

  1. Open the application at http://localhost:3000
  2. Type your question in the chat input
  3. View the AI-generated response with citations
  4. Click citations to view source content

Voice Query

  1. Click the microphone icon in the chat interface
  2. Speak your question clearly
  3. Watch real-time transcription appear
  4. Release to send the query
  5. Receive voice-enabled response

Document Upload & Analysis

  1. Click the upload button or drag files into the interface
  2. Select a PDF document (with images/charts)
  3. Wait for processing to complete
  4. Ask questions about the document content
  5. View responses with page-specific citations

Advanced Search Features

# Query with specific filters
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly revenue trends",
    "num_results": 10,
    "include_web_search": true,
    "filters": {
      "date_range": "2023-2024",
      "document_type": "financial"
    }
  }'

🧪 Development

Running Tests

Backend Tests

cd backend
pytest -v
pytest --cov=app tests/  # With coverage

Frontend Tests

cd frontend
npm test
npm run test:watch  # Watch mode

Code Quality

Linting & Formatting

# Backend
cd backend
black .
flake8 .

# Frontend
cd frontend
npm run lint
npm run type-check

Development Workflow

  1. Feature Development

    • Create feature branch from main
    • Add tests for new functionality
    • Update documentation as needed
  2. Testing

    • Run full test suite
    • Test with different AI providers
    • Verify WebSocket functionality
  3. Code Review

    • Check API compatibility
    • Verify error handling
    • Test edge cases

Project Structure

agnt/
├── backend/                # FastAPI backend
│   ├── app/
│   │   ├── config.py       # Configuration management
│   │   ├── models/         # Pydantic schemas
│   │   ├── services/       # Business logic
│   │   └── websocket/      # WebSocket handlers
│   ├── main.py             # FastAPI application
│   └── requirements.txt    # Python dependencies
├── frontend/               # Next.js frontend
│   ├── src/
│   │   ├── app/            # App router pages
│   │   ├── components/     # React components
│   │   ├── lib/            # Utilities and API client
│   │   └── store/          # State management
│   └── package.json        # Node.js dependencies
└── README.md               # This file

πŸ› Troubleshooting

Common Issues

1. Speech-to-Text Not Working

# Check Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Verify API is enabled
gcloud services list --enabled | grep speech

# Test authentication by constructing a client
python -c "from google.cloud import speech; speech.SpeechClient(); print('Auth OK')"

2. Vector Database Issues

# Reset ChromaDB
rm -rf backend/chroma_db/
# Restart backend to reinitialize

3. CORS Errors

# Update CORS_ORIGINS in .env
CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
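
If origins still fail after updating .env, check how the backend wires the setting; a typical FastAPI setup looks like this sketch (the actual wiring lives in the backend's app setup):

import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.environ.get("CORS_ORIGINS", "").split(","),  # comma-separated list
    allow_methods=["*"],
    allow_headers=["*"],
)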

4. WebSocket Connection Failed

# Check firewall settings
# Verify WebSocket URL in frontend config
# Check backend logs for connection errors

Performance Optimization

1. Slow Document Processing

  • Reduce image resolution during processing
  • Increase MAX_CONCURRENT_REQUESTS
  • Use SSD storage for the database

2. High Memory Usage

  • Adjust EMBEDDING_BATCH_SIZE
  • Limit MAX_TOKENS_PER_CHUNK
  • Monitor the vector database size

3. API Response Times

  • Enable caching with Redis
  • Tune the similarity threshold
  • Use parallel search execution

Debugging

Enable Debug Logging

# Backend
LOG_LEVEL=DEBUG

# Frontend  
NEXT_PUBLIC_DEBUG=true

Monitor System Health

# Check service status
curl http://localhost:8000/health

# View logs
tail -f backend/app.log

🚀 Deployment

Production Deployment

Docker Deployment

# Build and run with Docker
docker-compose up --build -d

Environment Setup

# Production environment variables
DEBUG=false
RELOAD=false
LOG_LEVEL=INFO
CORS_ORIGINS=https://yourdomain.com

Performance Considerations

  • Use PostgreSQL for metadata storage
  • Implement Redis for caching
  • Set up load balancing for multiple instances
  • Configure CDN for static assets

Security Checklist

  • Change default SECRET_KEY
  • Enable HTTPS in production
  • Implement rate limiting
  • Secure API endpoints
  • Validate file uploads
  • Monitor for suspicious activity

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Install development dependencies
  4. Run tests to ensure everything works
  5. Make your changes
  6. Add tests for new functionality
  7. Submit a pull request

Code Standards

  • Python: Follow PEP 8, use type hints
  • TypeScript: Use strict mode, proper interfaces
  • Documentation: Update README for new features
  • Testing: Maintain test coverage above 80%

Reporting Issues

  1. Check existing issues first
  2. Provide detailed reproduction steps
  3. Include system information
  4. Add relevant logs and error messages

📜 License

MIT License - see LICENSE file for details.

πŸ™‹β€β™‚οΈ Support

Getting Help

  • Documentation: Check this README and API docs
  • Issues: Create GitHub issue for bugs
  • Discussions: Use GitHub Discussions for questions
