VerixAI is a production-ready document analysis platform focused on one core mission: helping knowledge workers query large document collections efficiently and receive AI-generated answers with precise, verifiable citations.
Designed for professionals handling extensive document repositoriesβwhether medical records, legal cases, or corporate policiesβVerixAI provides accurate, context-aware responses backed by advanced retrieval and ranking algorithms.

- Multi-Format Document Ingestion: Process PDF, DOCX, PPTX, HTML, TXT, MD, XLSX, and JSON files with high-fidelity extraction.
- Advanced RAG Pipeline: Sophisticated Retrieval-Augmented Generation with hybrid search (semantic + keyword) and advanced ranking algorithms.
- Precise Citations: Every answer includes clear citations with source documents, chunk indices, and confidence scores for full verifiability.
- Role-Aware Responses: Tailored responses for different professional contexts (Doctor, Lawyer, HR) with appropriate disclaimers and terminology.
- Dataset Management: Organize documents into isolated, searchable collections for improved query accuracy and context control.
- Multi-Agent Architecture: Specialized AI agents for document ingestion, parallel retrieval, ranking, citation validation, and quality control.
- Parallel Processing: Asynchronous document processing with configurable worker pools for handling large-scale document collections.
- Scalable Infrastructure: Modern async stack (FastAPI, React) containerized for easy deployment and horizontal scaling.
β¨ Check out the live demo here! β¨

- Multi-Agent Architecture: Specialized agents for document processing and retrieval
- Advanced Retrieval System: Hybrid search with semantic and keyword matching
- Parallel Document Processing: Asynchronous processing with worker pools
- Multi-LLM Support: Support for Ollama (local models), OpenAI, and Claude
- Citation Validation: Automated validation of source citations
- Advanced Document Processing: OCR support for scanned documents
- Real-time Collaboration: Multiple users working on same dataset
- Export Functionality: Export Q&A sessions as reports (PDF/Word)
- Voice Input/Output: Speech-to-text queries and text-to-speech responses
- Document Comparison: Compare and contrast multiple documents
- Knowledge Graph: Visual representation of document relationships
- Custom Embeddings: Fine-tune embeddings for specific domains
- Advanced Analytics Dashboard: Usage metrics and insights
- SSO Integration: SAML/OAuth support for enterprise authentication
- Audit Logging: Complete audit trail for compliance
- API Rate Limiting: Advanced rate limiting and usage analytics
- Multi-tenancy: Support for multiple isolated organizations
- Sentiment Analysis: Analyze document sentiment and tone
- Entity Recognition: Extract and link named entities
- Time-series Analysis: Track changes across document versions
- Custom Models: Support for domain-specific fine-tuned models
We encourage contributions in these areas:
- Additional file format support (epub, rtf, etc.)
- Language translations and internationalization
- Performance optimizations
- Bug fixes and documentation improvements
VerixAI is built with a modern, microservices-oriented architecture.
Component | Technology |
---|---|
Backend | FastAPI, Python 3.10+, LangChain, Uvicorn |
Frontend | React 18, TypeScript, Material-UI (MUI), Axios |
Vector Database | ChromaDB |
LLM & Embeddings | OpenAI (GPT-4, text-embedding-3-small) |
Document Processing | MarkItDown |
Infrastructure | Docker, Docker Compose, Nginx |
Optional | PostgreSQL (for metadata) |
βββββββββββββββββββ βββββββββββββββββββ
β β β β
β React Frontend ββββββΆβ Nginx Proxy β
β (Port 3000) β β (Production) β
β β β β
βββββββββββββββββββ ββββββββββ¬βββββββββ
β
βΌ
ββββββββββββββββββββββ
β β
β FastAPI Backend β
β (Port 8000) β
ββββ¬βββββββ¬βββββββ¬βββ
β β
ββββββββΌβββ βββΌβββββββββ
βChromaDB β βPostgreSQLβ
βββββββββββ ββββββββββββ
The easiest way to get VerixAI running is with Docker and Docker Compose.
- Docker and Docker Compose installed
- An OpenAI API Key
- Git
-
Clone the repository:
git clone https://github.com/arunsai63/verix-ai.git cd verix-ai
-
Set up your environment variables:
- Copy the example environment file:
cp .env.example .env
- Edit the
.env
file and add your OpenAI API key:OPENAI_API_KEY=your-actual-api-key-here # Generate a secure random key for JWTs SECRET_KEY=a_very_secure_random_string_of_at_least_32_characters
- Copy the example environment file:
-
Launch the application with Docker Compose:
docker-compose up --build -d
This command builds the images and starts the frontend, backend, and ChromaDB services in detached mode.
-
Access VerixAI:
- Frontend Application: http://localhost:3000
- Backend API Docs: http://localhost:8000/docs
- Upload Documents: Navigate to the "Upload" tab, select or drag-and-drop your files, assign them to a new or existing dataset, and click "Upload".
- Query Documents: Go to the "Query" tab, type your question, select the dataset(s) to search, choose a professional role, and get your cited answer.
- Manage Datasets: View, inspect, and delete your document collections from the "Datasets" tab.
- General:
"Summarize the key findings from the Q4 reports."
- Doctor:
"What are the patient's pre-existing conditions and current medications?"
- Lawyer:
"Find all precedents related to intellectual property theft in the provided case files."
- HR:
"What is the company's official policy on remote work and what are the eligibility criteria?"
Navigate to the "Summarize" tab to generate various types of summaries:
- Summary Types: Executive, Key Points, Chapter-wise, Technical, Bullet Points, Abstract
- Length Options: Brief (1-2 paragraphs), Standard (1 page), Detailed (2-3 pages)
- Custom Instructions: Add specific guidance for the summary generation
Use the "Chat" tab for conversational document analysis:
- Create chat sessions with one or multiple datasets
- Ask follow-up questions with maintained context
- View citations and sources for each response
- Export conversation history as JSON or Markdown
Upload CSV files and analyze them using natural language:
- Example Queries:
- "What is the average sales by region?"
- "Show me the trend of revenue over time"
- "Find correlations between variables"
- Automatic Visualizations: Line charts, bar graphs, heatmaps, scatter plots
- Statistical Analysis: Descriptive statistics, correlations, distributions
- Data Export: Download results as JSON, CSV, or HTML reports