Skip to content

VerixAI: An intelligent document analysis platform using RAG for querying large document collections and receiving AI-generated answers with verifiable citations.

Notifications You must be signed in to change notification settings

arunsai63/verix-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

18 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

VerixAI: Intelligent Document Analysis with Citations

Build Status Python Version React Version

VerixAI is a production-ready document analysis platform focused on one core mission: helping knowledge workers query large document collections efficiently and receive AI-generated answers with precise, verifiable citations.

Designed for professionals handling extensive document repositoriesβ€”whether medical records, legal cases, or corporate policiesβ€”VerixAI provides accurate, context-aware responses backed by advanced retrieval and ranking algorithms.

image

✨ Key Features

  • Multi-Format Document Ingestion: Process PDF, DOCX, PPTX, HTML, TXT, MD, XLSX, and JSON files with high-fidelity extraction.
  • Advanced RAG Pipeline: Sophisticated Retrieval-Augmented Generation with hybrid search (semantic + keyword) and advanced ranking algorithms.
  • Precise Citations: Every answer includes clear citations with source documents, chunk indices, and confidence scores for full verifiability.
  • Role-Aware Responses: Tailored responses for different professional contexts (Doctor, Lawyer, HR) with appropriate disclaimers and terminology.
  • Dataset Management: Organize documents into isolated, searchable collections for improved query accuracy and context control.
  • Multi-Agent Architecture: Specialized AI agents for document ingestion, parallel retrieval, ranking, citation validation, and quality control.
  • Parallel Processing: Asynchronous document processing with configurable worker pools for handling large-scale document collections.
  • Scalable Infrastructure: Modern async stack (FastAPI, React) containerized for easy deployment and horizontal scaling.

πŸš€ Live Demo & Screenshots

✨ Check out the live demo here! ✨

image

Query Interface image

Upload and Dataset Management image


πŸ—ΊοΈ Roadmap

Recently Completed βœ…

  • Multi-Agent Architecture: Specialized agents for document processing and retrieval
  • Advanced Retrieval System: Hybrid search with semantic and keyword matching
  • Parallel Document Processing: Asynchronous processing with worker pools
  • Multi-LLM Support: Support for Ollama (local models), OpenAI, and Claude
  • Citation Validation: Automated validation of source citations

Phase 1: Core Enhancements (Q1 2025)

  • Advanced Document Processing: OCR support for scanned documents
  • Real-time Collaboration: Multiple users working on same dataset
  • Export Functionality: Export Q&A sessions as reports (PDF/Word)
  • Voice Input/Output: Speech-to-text queries and text-to-speech responses

Phase 2: Intelligence Features (Q2 2025)

  • Document Comparison: Compare and contrast multiple documents
  • Knowledge Graph: Visual representation of document relationships
  • Custom Embeddings: Fine-tune embeddings for specific domains
  • Advanced Analytics Dashboard: Usage metrics and insights

Phase 3: Enterprise Features (Q3 2025)

  • SSO Integration: SAML/OAuth support for enterprise authentication
  • Audit Logging: Complete audit trail for compliance
  • API Rate Limiting: Advanced rate limiting and usage analytics
  • Multi-tenancy: Support for multiple isolated organizations

Phase 4: Advanced Analytics (Q4 2025)

  • Sentiment Analysis: Analyze document sentiment and tone
  • Entity Recognition: Extract and link named entities
  • Time-series Analysis: Track changes across document versions
  • Custom Models: Support for domain-specific fine-tuned models

Community Contributions Welcome!

We encourage contributions in these areas:

  • Additional file format support (epub, rtf, etc.)
  • Language translations and internationalization
  • Performance optimizations
  • Bug fixes and documentation improvements

πŸ› οΈ Tech Stack & Architecture

VerixAI is built with a modern, microservices-oriented architecture.

Component Technology
Backend FastAPI, Python 3.10+, LangChain, Uvicorn
Frontend React 18, TypeScript, Material-UI (MUI), Axios
Vector Database ChromaDB
LLM & Embeddings OpenAI (GPT-4, text-embedding-3-small)
Document Processing MarkItDown
Infrastructure Docker, Docker Compose, Nginx
Optional PostgreSQL (for metadata)

System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 β”‚     β”‚                 β”‚
β”‚  React Frontend │────▢│  Nginx Proxy    β”‚
β”‚ (Port 3000)     β”‚     β”‚ (Production)    β”‚
β”‚                 β”‚     β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                                 β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚                    β”‚
                    β”‚  FastAPI Backend   β”‚
                    β”‚   (Port 8000)      β”‚
                    β””β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”˜
                       β”‚             β”‚
                β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”        β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                β”‚ChromaDB β”‚        β”‚PostgreSQLβ”‚
                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🏁 Getting Started

The easiest way to get VerixAI running is with Docker and Docker Compose.

Prerequisites

  • Docker and Docker Compose installed
  • An OpenAI API Key
  • Git

Installation

  1. Clone the repository:

    git clone https://github.com/arunsai63/verix-ai.git
    cd verix-ai
  2. Set up your environment variables:

    • Copy the example environment file:
      cp .env.example .env
    • Edit the .env file and add your OpenAI API key:
      OPENAI_API_KEY=your-actual-api-key-here
      # Generate a secure random key for JWTs
      SECRET_KEY=a_very_secure_random_string_of_at_least_32_characters
      
  3. Launch the application with Docker Compose:

    docker-compose up --build -d

    This command builds the images and starts the frontend, backend, and ChromaDB services in detached mode.

  4. Access VerixAI:


πŸ“– Usage

  1. Upload Documents: Navigate to the "Upload" tab, select or drag-and-drop your files, assign them to a new or existing dataset, and click "Upload".
  2. Query Documents: Go to the "Query" tab, type your question, select the dataset(s) to search, choose a professional role, and get your cited answer.
  3. Manage Datasets: View, inspect, and delete your document collections from the "Datasets" tab.

Example Queries

  • General: "Summarize the key findings from the Q4 reports."
  • Doctor: "What are the patient's pre-existing conditions and current medications?"
  • Lawyer: "Find all precedents related to intellectual property theft in the provided case files."
  • HR: "What is the company's official policy on remote work and what are the eligibility criteria?"

πŸ†• New Features

Document Summarization

Navigate to the "Summarize" tab to generate various types of summaries:

  • Summary Types: Executive, Key Points, Chapter-wise, Technical, Bullet Points, Abstract
  • Length Options: Brief (1-2 paragraphs), Standard (1 page), Detailed (2-3 pages)
  • Custom Instructions: Add specific guidance for the summary generation

Interactive Chat

Use the "Chat" tab for conversational document analysis:

  • Create chat sessions with one or multiple datasets
  • Ask follow-up questions with maintained context
  • View citations and sources for each response
  • Export conversation history as JSON or Markdown

CSV Analytics

Upload CSV files and analyze them using natural language:

  • Example Queries:
    • "What is the average sales by region?"
    • "Show me the trend of revenue over time"
    • "Find correlations between variables"
  • Automatic Visualizations: Line charts, bar graphs, heatmaps, scatter plots
  • Statistical Analysis: Descriptive statistics, correlations, distributions
  • Data Export: Download results as JSON, CSV, or HTML reports

About

VerixAI: An intelligent document analysis platform using RAG for querying large document collections and receiving AI-generated answers with verifiable citations.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published