SocioRAG

📈 Project Status

Current Version: v1.0.3 | Status: ✅ Production Ready | Last Updated: June 23, 2025

✅ Production Ready Features

🎯 Zero Error Rate: All tests passing with robust error handling
⚡ High Performance: Sub-millisecond response times with optimized concurrency
📚 Complete Documentation: All guides consolidated and up-to-date (June 2025)
🔧 Full Feature Set: Entity extraction, vector search, multilingual support, PDF export, and analytics
🚀 Auto-Install: Smart dependency detection and installation
🛡️ Production Hardened: Comprehensive logging, monitoring, and health checks

🔑 Environment Configuration

Variable	Description	Example Value	Required
OPENROUTER_API_KEY	OpenRouter API key for LLM access	`sk-or-v1-***`	✅
CHUNK_SIM	Similarity threshold for chunking	`0.80`	⚠️
LOG_LEVEL	Application logging level	`DEBUG`	⚠️

Setup: Copy .env.example to .env and update with your values.

Copy-Item .env.example .env
# Edit .env with your API keys

📋 Overview

SocioRAG is a production-ready system for analyzing social dynamics in texts through advanced NLP, entity extraction, vector search, and answer generation capabilities. The system follows a modular architecture with distinct phases for data ingestion, storage, retrieval, and answer generation.

🛡️ System Requirements

Python: 3.12+ (tested with 3.12.9)
Node.js: 18+ with npm/pnpm/yarn support
Operating System: Windows (PowerShell), Linux, macOS
Memory: 4GB+ RAM (8GB recommended for optimal performance)
Storage: 2GB+ free space for dependencies and models
Internet: Required for model downloads and API access

🔄 Smart Installation Features

🔍 Auto-Detection: Automatically detects missing dependencies and installs them
📦 Multi-Package Manager: Supports npm, pnpm, and yarn (auto-detected)
🪟 Windows Optimized: Proper handling of paths with spaces (e.g., "Program Files")
🛡️ Error Recovery: Clear error messages with automatic fallbacks
⚡ Zero Configuration: Works out-of-the-box after environment setup

📖 Complete deployment guide: DEPLOYMENT.md

🚀 Quick Start

🎯 TL;DR: Run .\start.ps1 → Click http://localhost:3000 when ready → Run .\stop.ps1 when done

⚡ One-Command Startup (Recommended)

# Clone and navigate to repository
git clone https://github.com/Muhanad-husn/sociorag.git
cd sociorag

# Setup environment files
Copy-Item .env.example .env
Copy-Item config.yaml.example config.yaml
# Edit .env with your OPENROUTER_API_KEY

# Start everything automatically
.\start.ps1

🎉 Success indicators:

Backend: ✅ Backend started successfully
Frontend: ✅ Frontend started successfully
Health: ✅ All services started successfully!

🌐 Access Points:

Main Application: http://localhost:3000 ← Primary UI
API Documentation: http://localhost:8000/docs
Backend Health: http://localhost:8000/api/admin/health
Admin Status: http://localhost:8000/api/admin/status

🔧 Complete Setup (First Time)

For comprehensive environment setup with database initialization:

# Run full setup script
.\setup.ps1

# Then start normally
.\start.ps1

� Shutdown (Important)

Always properly stop the application when finished:

.\stop.ps1

💡 Why this matters: Prevents port conflicts, ensures clean shutdown, and stops all background processes properly.

✨ Feature Overview

Feature	Description	Status
🧠 Entity Extraction	LLM-powered multilingual entity recognition with spaCy	✅ Ready
🔍 Vector Search	Fast similarity search with reranking and configurable params	✅ Ready
📄 PDF Export	Custom-styled automated report generation	✅ Ready
📊 Query Analytics	JSONL logging with performance metrics	✅ Ready
🌐 Multilingual	English & Arabic support with translation API	✅ Ready
🎨 Modern UI	Responsive design with dark/light themes	✅ Ready
🔐 Security	API key management and secure configurations	✅ Ready
📈 Monitoring	Health checks, structured logging, performance dashboards	✅ Ready
🚀 Auto-Deploy	One-command startup with dependency management	✅ Ready

🧪 Testing & Quality

# Run all tests
pytest tests/ -v

# Integration tests only
pytest -m integration -v

# Performance testing
.\scripts\testing\test_runner.ps1 -TestLevel standard

# Load testing
.\scripts\testing\load_test.ps1 -ConcurrentUsers 5

Test Coverage: 100% pass rate | Performance: Sub-millisecond response | Documentation: Complete

📖 Full testing guide: tests/README.md

🏗️ Architecture & How It Works

SocioRAG follows a robust 4-phase pipeline:

📥 Ingest: Upload documents (PDF, text) with intelligent preprocessing
🎯 Extract: Entities and relationships via LLM + spaCy with multilingual support
🗄️ Store: Dual vector storage system:
- Chunk Embeddings: Document segments for semantic retrieval
- Entity Embeddings: Named entities for graph analysis and entity-level search
- Semantic Chunking: AI-driven text segmentation based on semantic boundaries
🔍 Query: Advanced hybrid retrieval combining:
- Vector similarity search
- BM25 keyword matching
- Cross-encoder reranking
- Source diversity enforcement
📤 Export: Download answers and comprehensive reports as styled PDFs

� Technology Stack

Component	Technology	Purpose
🖥️ Backend	FastAPI + Python 3.12	API server and core logic
🎨 Frontend	Preact + Vite + TypeScript	Modern reactive UI
🧠 LLM	OpenRouter API + LangChain	Language model integration
🗄️ Vector DB	SQLite-vec	Embeddings and similarity search
📊 Graph DB	SQLite	Entity relationships
🔍 NLP	spaCy + Custom pipeline	Entity extraction and analysis
📄 Export	Playwright	PDF generation with styling
🔒 Config	Pydantic + YAML	Type-safe configuration

📁 Project Structure

sociorag/
├── 🖥️ backend/app/          # FastAPI application
│   ├── api/                 # REST API endpoints
│   ├── core/                # Configuration & logging
│   ├── ingest/              # Document processing
│   ├── retriever/           # Vector search & retrieval
│   └── answer/              # Response generation
├── 🎨 ui/                   # Preact frontend
├── 📊 scripts/              # Automation & testing
│   ├── production/          # Deployment scripts
│   ├── testing/             # Test automation
│   └── utilities/           # Helper tools
├── 📚 docs/                 # Documentation
├── 🧪 tests/                # Test suites
└── 📦 Configuration files

📚 Documentation Hub

🚀 Quick Start Guide – Get up and running in minutes
📖 Complete Documentation – Centralized access to all guides
🔧 API Reference – Complete endpoint documentation
🏭 Production Deployment – Deployment & scaling
📊 Project Status – System health & version info
🛠️ Architecture Guide – System design deep-dive

🖥️ User Interface Features

🔍 Smart Search: Natural language, semantic, and multilingual queries
📜 Query History: View, copy, and delete previous queries with timestamps
📤 Document Management: Upload, process, and download documents
⚙️ Advanced Settings: API keys, model selection, theme toggle
📊 Performance Metrics: Real-time analytics and response monitoring
🌐 Multilingual: Full English and Arabic support with auto-translation
🎨 Modern Design: Responsive UI with dark/light themes

📈 Monitoring & Performance

Health Checks: Real-time system status monitoring
Performance Dashboard: Response times, throughput, and system metrics
Load Testing: Built-in testing scripts for performance validation
Structured Logging: Comprehensive logging with multiple output formats

# Start monitoring dashboard
.\scripts\testing\monitoring_dashboard.ps1

# Run performance tests
.\scripts\testing\performance_test_monitor.ps1

# Check system status
.\scripts\utilities\production_status.ps1

💻 Installation Options

🎯 Option 1: Automated Setup (Recommended)

Prerequisites: Python 3.12+, Node.js 18+, Git

# Clone repository
git clone https://github.com/Muhanad-husn/sociorag.git
cd sociorag

# Automated setup - handles everything
.\setup.ps1

# Start application
.\start.ps1

🐍 Option 2: Python Environment Setup

Using Conda (Recommended)

# Install Miniconda if not already installed
# Download: https://docs.conda.io/en/latest/miniconda.html

# Create environment from environment.yml (if available)
conda env create -f environment.yml
conda activate sociorag

# Or create manually
conda create -n sociorag python=3.12
conda activate sociorag
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm

# Install Playwright browsers
playwright install

Using pip + venv

# Create virtual environment
python -m venv .venv

# Activate environment
# Windows
.\.venv\Scripts\Activate.ps1
# Linux/macOS
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm

# Install Playwright browsers
playwright install

� Configuration Setup

# 1. Copy configuration templates
Copy-Item config.yaml.example config.yaml
Copy-Item .env.example .env

# 2. Edit configuration files
# config.yaml - System settings
# .env - API keys and environment variables

Required Configuration:

OPENROUTER_API_KEY: Get from OpenRouter
CHUNK_SIM: Text similarity threshold (default: 0.8)
LOG_LEVEL: Logging verbosity (DEBUG/INFO/WARNING/ERROR)

🏃 Running the Application

# Start with auto-dependency installation
.\start.ps1

# Or start services manually
# Terminal 1: Backend
python -m backend.app.main

# Terminal 2: Frontend  
cd ui
npm install  # if not already installed
npm run dev

� System Requirements

Minimum Requirements

OS: Windows 10+, macOS 10.15+, Linux (Ubuntu 18.04+)
Python: 3.12+ (recommended 3.12.9)
Node.js: 18+ with npm/pnpm/yarn
Memory: 4GB RAM
Storage: 2GB free space
Network: Internet connection for model downloads

Recommended Requirements

Memory: 8GB+ RAM for optimal performance
Storage: 10GB+ for large document processing
CPU: Multi-core processor for concurrent operations
GPU: Optional, future feature for accelerated embeddings

🏗️ Architecture

Detailed diagrams & component docs → docs/architecture_documentation.md

Core Components

Data Ingestion Pipeline (backend/app/ingest/)
- Enhanced entity extraction with LLM-powered analysis
- Document chunking and metadata extraction
Vector Storage & Retrieval (backend/app/retriever/)
- Chunk Embeddings: Stores document segments as vectors for semantic search (e.g., SQLite-vec)
- Entity Embeddings: Stores named entities as separate vectors for entity-level similarity and graph operations
- Advanced Retrieval System: Employs hybrid retrieval combining vector similarity, BM25 keyword matching, and cross-encoder reranking for comprehensive coverage
- Intelligent Context Management: Dynamically balances relevance, diversity, and token limits with priority-based selection and semantic deduplication
- Enables both chunk-based and entity-based retrieval with source diversity enforcement
Answer Generation (backend/app/answer/)
- Complete response generation with LLM integration
- Citation management and source linking
Core Infrastructure (backend/app/core/)
- Centralized configuration management
- Logging and error handling
API Layer (backend/app/api/)
- FastAPI endpoints for Q&A functionality
- RESTful interface design

Technology Stack

Framework: FastAPI with async/await support
LLM Integration: LangChain with OpenRouter API
Vector Database: SQLite-vec for similarity search (chunks and entities)
Graph Database: SQLite for entity relationships
Entity Extraction: spaCy + Custom LLM pipeline
PDF Generation: Playwright with browser automation

🤝 Contributing & Development

We welcome contributions! Here's how to get started:

🛠️ Development Setup

# Fork and clone the repository
git clone https://github.com/your-username/sociorag.git
cd sociorag

# Set up development environment
.\setup.ps1

# Run tests
pytest tests/ -v

# Start in development mode
.\start.ps1 -ShowStartupLogs

🧪 Testing Guidelines

Unit Tests: pytest tests/ -v
Integration Tests: pytest -m integration -v
Performance Tests: .\scripts\testing\test_runner.ps1
Load Tests: .\scripts\testing\load_test.ps1

📝 Documentation Standards

Update README.md for user-facing changes
Add docstrings for new functions/classes
Update API documentation for endpoint changes
Include type hints for Python code

🔄 Development Workflow

Create a feature branch: git checkout -b feature/your-feature
Make your changes with tests
Run the full test suite
Update documentation as needed
Submit a pull request with clear description

📞 Support & Community

📋 Issues: GitHub Issues
📖 Documentation: Complete Docs
📊 Project Status: Status Dashboard
🔧 Developer Guide: Development Guidelines

📄 License

Apache-2.0 License – See LICENSE for full terms.

� Acknowledgements

LangChain for LLM integration framework
FastAPI for the high-performance web framework
SQLite-vec for efficient vector storage
spaCy for advanced NLP processing
Preact for the lightweight frontend framework
OpenRouter for LLM API access

🚨 Troubleshooting

Common Issues

Port Already in Use:

# Kill existing processes
.\stop.ps1
# Wait a few seconds, then restart
.\start.ps1

Dependencies Not Installing:

# Clear cache and reinstall
Remove-Item -Recurse -Force node_modules, .venv
.\setup.ps1

API Key Issues:

# Verify your .env file has the correct API key
Get-Content .env | Select-String "OPENROUTER_API_KEY"

Performance Issues:

# Check system status
.\scripts\utilities\production_status.ps1
# Run diagnostics
.\scripts\testing\monitoring_dashboard.ps1

For more troubleshooting help, see Installation Guide.

🎯 Ready to start? Follow the Quick Start section above!

📖 Need more details? Check out our Complete Documentation.

Name		Name	Last commit message	Last commit date
Latest commit History 167 Commits
backend		backend
benchmarks		benchmarks
docs		docs
instructions		instructions
logs		logs
node_modules		node_modules
resources		resources
scripts		scripts
test_results		test_results
tests		tests
ui		ui
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
DEPLOYMENT.md		DEPLOYMENT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
clean_logs.ps1		clean_logs.ps1
config.yaml		config.yaml
package-lock.json		package-lock.json
package.json		package.json
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup.ps1		setup.ps1
start.ps1		start.ps1
stop.ps1		stop.ps1
view_logs.ps1		view_logs.ps1

License

Muhanad-husn/sociorag

Folders and files

Latest commit

History

Repository files navigation