A full-stack document-processing application with a Next.js frontend and a FastAPI backend. It combines the Mistral OCR API with LangExtract for text extraction and structured information extraction, adds visual workflow management, and uses Convex for real-time backend infrastructure and Clerk for authentication.
# 1. Clone and setup
git clone <repo-url>
cd finance_app
# 2. Configure environment (Backend)
cp .env.example .env
# Edit .env with your API keys
# 3. Configure frontend environment (run from the project root)
cp web/.env.example web/.env.local
# Add Convex and Clerk configuration to web/.env.local
# 4. Start with Docker
./start.sh
# Or: docker-compose up --build
# 5. Open application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000/docs
Transform documents into structured data using state-of-the-art OCR and AI-powered extraction:
- Full-Stack Interface: Modern React/Next.js web application with FastAPI backend
- Authentication & User Management: Secure user authentication and session management with Clerk
- Real-time Backend: Convex provides real-time database, file storage, and serverless functions
- Visual Workflows: Create and manage document processing workflows with React Flow
- Mistral OCR Integration: Direct API integration with Mistral OCR for high-quality text extraction from PDFs and images
- LangExtract Power: Structured information extraction using the LangExtract library with support for OpenAI, Gemini, and Ollama models
- Node-Based Processing: Visual workflow builder with configurable nodes for OCR, AI extraction, validation, and data export
- Complete Pipeline: Full document processing from upload to structured output in a single workflow
- Docker Ready: Complete containerized setup with docker-compose
- Multi-user Support: Per-user document and workflow management with secure data isolation
The application follows a clean, modular architecture combining modern web technologies:
src/
├── main.py                     # FastAPI application entry point
├── config.py                   # Configuration management (Pydantic Settings)
├── models/                     # Pydantic models and schemas
│   └── workflow.py             # Workflow and node definitions
├── dependencies.py             # Dependency injection
├── services/                   # Business logic layer
│   ├── ocr_service.py          # Mistral OCR API integration
│   ├── extraction_service.py   # LangExtract library integration
│   └── workflow_engine.py      # Visual workflow execution engine
└── routers/                    # API endpoints organization
    ├── health.py               # Health checks and utilities
    ├── ocr.py                  # OCR-specific endpoints
    ├── extraction.py           # Information extraction endpoints
    ├── workflows.py            # Workflow management endpoints
    └── pipeline.py             # Complete processing pipeline
web/src/
├── app/                        # Next.js App Router
│   ├── workflows/              # Workflow management pages
│   ├── document/               # Document processing interface
│   ├── sign-in/                # Clerk authentication pages
│   ├── sign-up/                # Clerk registration pages
│   └── api/                    # API route handlers
├── components/                 # React components
│   ├── workflow/               # Workflow builder components
│   │   └── nodes/              # Individual workflow node components
│   └── Navigation.tsx          # App navigation with user management
├── lib/                        # Utility libraries
│   └── extractionUtils.ts      # Client-side extraction helpers
├── middleware.ts               # Clerk authentication middleware
└── convex/                     # Convex backend functions
    ├── schema.ts               # Database schema definitions
    ├── auth.config.ts          # Clerk + Convex authentication
    ├── documents.ts            # Document management functions
    ├── workflows.ts            # Workflow CRUD operations
    └── users.ts                # User profile management
- Mistral OCR API: Direct HTTP integration for text extraction from documents
- LangExtract Library: Structured information extraction with few-shot learning capabilities
- React Flow: Visual workflow builder for creating processing pipelines
- FastAPI: High-performance Python API framework with automatic OpenAPI documentation
- Next.js: Full-stack React framework with App Router and API routes
- Convex: Real-time backend-as-a-service providing:
  - Real-time Database: Reactive queries with automatic UI updates
  - File Storage: Secure document upload and storage
  - Serverless Functions: TypeScript functions for business logic
  - Real-time Sync: Automatic data synchronization across clients
  - Built-in Security: Row-level security and data isolation
- Clerk: Complete authentication solution featuring:
  - Multi-factor Authentication: Email, phone, and authenticator app support
  - Social Logins: Google, GitHub, Discord, and more
  - Customizable UI: Pre-built components with full customization
  - Security First: JWT tokens, session management, and user verification
  - Multi-device Support: Seamless authentication across devices
- Direct API Integration: No LangChain dependency - direct service integrations for optimal performance
- Separation of Concerns: Services handle business logic, routers handle HTTP, components handle UI
- Dependency Injection: Clean, testable service instantiation with FastAPI dependencies
- Type Safety: Comprehensive Pydantic models for all data structures and TypeScript for frontend
- Visual Workflow Design: Node-based processing pipelines with React Flow
- Error Handling: Centralized exception handling with detailed error responses
# Navigate to the cloned project
cd finance_app
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env # Create from template
Set the following environment variables in .env:
# Required for OCR processing via Mistral API
MISTRAL_API_KEY=your_mistral_api_key_here
# Required for AI extraction via LangExtract (choose one)
OPENAI_API_KEY=your_openai_api_key_here
# OR (LangExtract will use OPENAI_API_KEY if available)
LANGEXTRACT_API_KEY=your_openai_api_key_here
# Optional configuration
DEBUG=false
HOST=0.0.0.0
PORT=8000
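The fallback between the two extraction keys can be sketched as below. This is a minimal, standard-library illustration, not the project's actual `config.py` (which the layout above says uses Pydantic Settings). The `Settings` class, the `load_settings` helper, and the precedence (OPENAI_API_KEY first, then LANGEXTRACT_API_KEY) are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the variables listed above."""
    mistral_api_key: str
    extraction_api_key: str
    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    mistral_key = env.get("MISTRAL_API_KEY", "")
    if not mistral_key:
        raise ValueError("MISTRAL_API_KEY not configured")

    # Assumed precedence: OPENAI_API_KEY first, then LANGEXTRACT_API_KEY.
    extraction_key = env.get("OPENAI_API_KEY") or env.get("LANGEXTRACT_API_KEY") or ""
    if not extraction_key:
        raise ValueError("Set OPENAI_API_KEY or LANGEXTRACT_API_KEY")

    return Settings(
        mistral_api_key=mistral_key,
        extraction_api_key=extraction_key,
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to exercise in tests.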
Set the following environment variables in web/.env.local:
# Convex Configuration
NEXT_PUBLIC_CONVEX_URL=your_convex_deployment_url
CONVEX_DEPLOY_KEY=your_convex_deploy_key
# Clerk Authentication Configuration
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_xxx
CLERK_SECRET_KEY=sk_test_xxx
CLERK_JWT_ISSUER_DOMAIN=your-clerk-domain.clerk.accounts.dev
# Optional: Webhook configuration for user sync
CLERK_WEBHOOK_SECRET=whsec_xxx
- Install Convex CLI: npm install -g convex
- Initialize Convex: cd web && npx convex dev
- Deploy Schema: Convex will automatically deploy your schema and functions
- Get your deployment URL: Copy it from the Convex dashboard into NEXT_PUBLIC_CONVEX_URL
- Create Clerk Application: Visit clerk.com and create a new application
- Configure Authentication: Enable desired sign-in methods (email, social, etc.)
- Copy API Keys: Get publishable and secret keys from Clerk dashboard
- Configure JWT: Set JWT issuer domain for Convex integration
# Development mode
python -m src.main
# Or with uvicorn directly
uvicorn src.main:app --reload --host 0.0.0.0 --port 8000
Visit http://localhost:8000/docs for the interactive documentation, or test with curl:
# Health check
curl http://localhost:8000/health
# Test OCR
curl -X POST -F "file=@sample_document.pdf" http://localhost:8000/ocr/test
- GET /health: API health and configuration status
- GET /ping: Simple health check for load balancers

- POST /ocr/extract: Extract text from documents using Mistral OCR
- POST /ocr/test: Test OCR with debugging information

- POST /extract/information: Extract structured data from text using LangExtract
- GET /extract/models: List available AI models (OpenAI, Gemini, Ollama)

- GET /workflows: List all available workflows
- POST /workflows: Create a new workflow
- GET /workflows/{id}: Get workflow details
- PUT /workflows/{id}: Update workflow configuration
- POST /workflows/{id}/execute: Execute a workflow
- POST /workflows/validate: Validate workflow configuration

- POST /process/document: Full document processing (OCR + Extraction)
- POST /process/invoice: Invoice processing with preset configuration
- Frontend Interface: http://localhost:3000 (full web application)
- Swagger UI: http://localhost:8000/docs (interactive API documentation)
- ReDoc: http://localhost:8000/redoc (alternative API documentation)
import requests

# Extract text only
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/ocr/extract",
        files={"file": f},
    )
response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(result["text"])
import json

import requests

# Define what to extract: a prompt plus one few-shot example
extraction_config = {
    "prompt_description": "Extract invoice details: vendor, amount, date",
    "examples": [
        {
            "text": "Invoice #12345 from ABC Corp. Total: $1,250.00",
            "extractions": [
                {
                    "extraction_class": "invoice_number",
                    "extraction_text": "12345",
                    "attributes": {"confidence": 1.0},
                },
                {
                    "extraction_class": "vendor",
                    "extraction_text": "ABC Corp",
                    "attributes": {"confidence": 1.0},
                },
                {
                    "extraction_class": "amount",
                    "extraction_text": "$1,250.00",
                    "attributes": {"currency": "USD"},
                },
            ],
        }
    ],
    "model_type": "openai",
    "model_id": "gpt-4o",
}

# Process document: the config travels as a JSON string in a form field
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/process/document",
        files={"file": f},
        data={"extraction_request": json.dumps(extraction_config)},
    )
response.raise_for_status()  # fail early on HTTP errors
result = response.json()

print("Extracted text:", result["ocr_text"][:200])
print("Entities found:", len(result["extracted_entities"]))
for entity in result["extracted_entities"]:
    print(f"- {entity['extraction_class']}: {entity['extraction_text']}")
import requests

# Process invoice with predefined extraction
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/process/invoice",
        files={"file": f},
    )
response.raise_for_status()  # fail early on HTTP errors
result = response.json()
# Automatically extracts: invoice number, vendor, amounts, dates, etc.
- Health Check: GET /health (verify configuration)
- OCR Test: POST /ocr/test (test with a sample document)
- Pipeline Test: POST /process/invoice (try with an invoice)
- PDF: Documents up to 50MB
- Images: PNG, JPG, JPEG, WebP up to 50MB
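The limits above could be enforced with a check along these lines. This is only a sketch: the `validate_upload` helper is hypothetical, it checks the file extension rather than the actual content, and it assumes "50MB" means 50 × 1024 × 1024 bytes. The returned messages reuse the error names from the troubleshooting table.

```python
from pathlib import Path
from typing import Optional

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" is treated as 50 MiB
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}


def validate_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the upload is acceptable."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return "Unsupported file type"
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return "File too large"
    return None


if __name__ == "__main__":
    print(validate_upload("invoice.pdf", 1_000_000))  # acceptable upload
    print(validate_upload("notes.docx", 1_000_000))   # rejected by extension
```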
| Error | Solution |
|---|---|
| MISTRAL_API_KEY not configured | Set the environment variable |
| File too large | Use files under 50MB |
| Unsupported file type | Use PDF or supported image formats |
| OCR extracted very little text | Check document quality and readability |
| Variable | Required | Description |
|---|---|---|
| MISTRAL_API_KEY | Yes | Mistral API key for OCR processing |
| OPENAI_API_KEY | Yes | OpenAI API key for AI extraction |
| LANGEXTRACT_API_KEY | No | Alternative to OPENAI_API_KEY |
| DEBUG | No | Enable debug mode (default: false) |
| HOST | No | Server host (default: 0.0.0.0) |
| PORT | No | Server port (default: 8000) |
| Variable | Required | Description |
|---|---|---|
| NEXT_PUBLIC_CONVEX_URL | Yes | Convex deployment URL |
| CONVEX_DEPLOY_KEY | CI/CD only | Convex deployment key (for CI/CD) |
| NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY | Yes | Clerk publishable API key |
| CLERK_SECRET_KEY | Yes | Clerk secret API key |
| CLERK_JWT_ISSUER_DOMAIN | Yes | Clerk JWT issuer domain |
| CLERK_WEBHOOK_SECRET | No | Webhook secret for user sync |
The application integrates directly with the Mistral OCR API for high-quality text extraction:
- Direct API Calls: Uses HTTP requests to https://api.mistral.ai/v1/ocr
- Document Support: PDFs, PNG, JPG, JPEG, WebP up to 50MB
- Base64 Encoding: Converts documents to data URLs for API transmission
- Markdown Output: Extracts text in structured markdown format
- Page Processing: Handles multi-page documents with page-by-page extraction
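The Base64 and page-handling steps above can be sketched as follows. The helper names `to_data_url` and `join_pages` are hypothetical. The example also assumes each page in the OCR response is a mapping with a `markdown` field, so check that against the Mistral API reference before relying on it.

```python
import base64
from pathlib import Path
from typing import Iterable, Mapping

# MIME types for the formats the README lists as supported.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw file bytes as a base64 data URL based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"


def join_pages(pages: Iterable[Mapping[str, str]]) -> str:
    """Concatenate per-page markdown (the "markdown" key is an assumption)."""
    return "\n\n".join(page["markdown"] for page in pages)


if __name__ == "__main__":
    print(to_data_url(b"hello", "doc.pdf"))
```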
LangExtract provides powerful structured information extraction capabilities:
- Few-Shot Learning: Uses example-based extraction for high accuracy
- Multiple Providers: Supports OpenAI, Gemini, and Ollama language models
- Structured Output: Extracts entities with classes, text, and attributes
- Type Safety: Full Pydantic model validation for extraction results
- Customizable Prompts: Flexible prompt descriptions for different extraction tasks
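A few-shot extraction request with the structure described above could be modelled as shown here. The field names follow the request used in the usage examples (`prompt_description`, `examples`, `extraction_class`, `extraction_text`, `attributes`, `model_type`, `model_id`). The dataclasses themselves are illustrative stand-ins, not the project's Pydantic models.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Extraction:
    """One labelled span inside a few-shot example."""
    extraction_class: str
    extraction_text: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Example:
    """Example text together with the extractions expected from it."""
    text: str
    extractions: List[Extraction]


@dataclass
class ExtractionRequest:
    prompt_description: str
    examples: List[Example]
    model_type: str = "openai"
    model_id: str = "gpt-4o"

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses and lists.
        return json.dumps(asdict(self))


if __name__ == "__main__":
    request = ExtractionRequest(
        prompt_description="Extract invoice details: vendor, amount, date",
        examples=[
            Example(
                text="Invoice #12345 from ABC Corp. Total: $1,250.00",
                extractions=[Extraction("vendor", "ABC Corp")],
            )
        ],
    )
    print(request.to_json())
```

The resulting JSON string is the kind of value the usage examples send in the `extraction_request` form field.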
Visual workflow builder powered by React Flow:
- Node Types: DocumentInput, OCRProcessor, AIExtractor, DataValidator, ExportData
- Visual Editor: Drag-and-drop interface for building processing pipelines
- Real-time Execution: Execute workflows and see results in real-time
- Validation: Built-in validation for workflow configuration and node connections
- Extensible: Easy to add new node types and processing capabilities
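Validating node connections and deriving a run order can be illustrated with a topological sort. This is a sketch under assumptions, not the project's `workflow_engine.py`. The `execution_order` function and its input shape (a dict of node id to node type, plus a list of source/target pairs) are invented for the example. It relies on `graphlib`, which needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple

# Node types named in the README.
NODE_TYPES = {"DocumentInput", "OCRProcessor", "AIExtractor", "DataValidator", "ExportData"}


def execution_order(nodes: Dict[str, str], edges: List[Tuple[str, str]]) -> List[str]:
    """Validate a workflow graph and return node ids in a runnable order.

    nodes maps node id -> node type; edges are (source id, target id) pairs.
    """
    for node_id, node_type in nodes.items():
        if node_type not in NODE_TYPES:
            raise ValueError(f"Unknown node type for {node_id}: {node_type}")

    # TopologicalSorter expects each node mapped to the set of its predecessors.
    predecessors: Dict[str, Set[str]] = {node_id: set() for node_id in nodes}
    for source, target in edges:
        if source not in nodes or target not in nodes:
            raise ValueError(f"Edge references unknown node: {source} -> {target}")
        predecessors[target].add(source)

    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("Workflow contains a cycle") from exc


if __name__ == "__main__":
    nodes = {"in": "DocumentInput", "ocr": "OCRProcessor", "ai": "AIExtractor", "out": "ExportData"}
    edges = [("in", "ocr"), ("ocr", "ai"), ("ai", "out")]
    print(execution_order(nodes, edges))
```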
The application uses Convex for real-time data management with the following schema:
users: {
  userId: string,            // Clerk user ID
  email: string,
  firstName?: string,
  lastName?: string,
  imageUrl?: string,
  createdAt: string,
  updatedAt: string,
}

documents: {
  filename: string,
  status: "processing" | "processed" | "failed",
  timestamp: string,
  file_size_mb: number,
  content_type: string,
  userId?: string,           // Owner user ID
  processing_result?: {
    ocr_text: string,
    extracted_entities: Array<{
      extraction_class: string,
      extraction_text: string,
      attributes: any,
      start_char?: number,
      end_char?: number,
    }>,
    extraction_metadata: any,
    processing_stats: any,
  },
  error_message?: string,
}

workflows: {
  name: string,
  description?: string,
  definition: any,           // React Flow nodes and edges
  is_active: boolean,
  created_at: string,
  updated_at: string,
  userId?: string,           // Owner user ID
}

workflow_executions: {
  workflow_id: Id<"workflows">,
  status: "pending" | "running" | "completed" | "failed",
  input_data?: any,
  output_data?: any,
  started_at: string,
  completed_at?: string,
  userId?: string,           // Executor user ID
}
- Default Model: OpenAI GPT-4o via LangExtract
- Supported Providers: OpenAI, Gemini, Ollama (through LangExtract)
- Customizable: Change model via API parameters or workflow node configuration
- API Key Management: Centralized configuration with fallback options
The application uses Clerk for complete user authentication and management:
- Sign-up/Sign-in: Email and password authentication
- Social Logins: Support for Google, GitHub, Discord, and other providers
- Multi-factor Authentication: Email codes, SMS, and authenticator apps
- User Profiles: Automatic profile management with avatars
- Session Management: Secure JWT-based sessions with automatic refresh
- Password Reset: Built-in password recovery flow
- Email Verification: Automatic email verification for new users
- Public Routes: /sign-in and /sign-up are accessible without authentication
- Protected Routes: All other routes require authentication via Clerk middleware
- User Context: User information available throughout the application
- Automatic Sync: User profiles automatically synced with Convex database
// Authentication configuration for Convex
export default {
  providers: [
    {
      domain: process.env.CLERK_JWT_ISSUER_DOMAIN,
      applicationID: "convex",
    },
  ],
};
- Automatic User Creation: New users automatically added to Convex database
- Profile Synchronization: User profile updates sync between Clerk and Convex
- Data Isolation: All user data (documents, workflows) properly isolated by userId
- Secure Access: Row-level security ensures users only access their own data
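The isolation rule above amounts to filtering every read by the owner's `userId`. The Python sketch below only illustrates that rule. In the application itself the check would live in the Convex functions, and the `documents_for_user` helper is hypothetical. It assumes records without a `userId` (the field is optional in the schema) are visible to no one.

```python
from typing import Any, Dict, List


def documents_for_user(documents: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
    """Return only the documents owned by user_id.

    Records lacking a userId never match, so they are hidden from every user.
    """
    return [doc for doc in documents if doc.get("userId") == user_id]


if __name__ == "__main__":
    docs = [
        {"filename": "a.pdf", "userId": "user_1"},
        {"filename": "b.pdf", "userId": "user_2"},
        {"filename": "c.pdf"},  # no owner recorded
    ]
    print(documents_for_user(docs, "user_1"))
```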
- Real-time Queries: Automatic UI updates when data changes
- TypeScript Schema: Fully typed database operations
- Indexing: Optimized queries with custom indexes
- Transactions: ACID compliance for complex operations
- Pagination: Built-in pagination for large datasets
- Secure Uploads: Direct file uploads to Convex storage
- File Management: Automatic file cleanup and organization
- Access Control: User-based file access permissions
- CDN Integration: Fast file delivery via global CDN
- Query Functions: Read data with real-time subscriptions
- Mutation Functions: Write data with optimistic updates
- Action Functions: External API integrations and side effects
- Scheduled Functions: Cron jobs and background processing
- HTTP Actions: Direct HTTP endpoints for webhooks and APIs
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
EXPOSE 8000
CMD ["python", "-m", "src.main"]
# Build and run
docker build -t document-processing-api .
docker run -p 8000:8000 -e MISTRAL_API_KEY=your_key document-processing-api
- Use uvicorn with multiple workers for production
- Consider Redis caching for frequently processed documents
- Implement rate limiting for API endpoints
- Convex: Automatic scaling and global edge deployment
- Clerk: Built-in performance optimization and global CDN
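The caching suggestion in the list above can be sketched by keying results on a hash of the file contents, so re-uploading the same document skips the OCR call. The `OcrCache` class is hypothetical and uses an in-memory dictionary as a stand-in for Redis. The `run_ocr` callable is a placeholder for whatever performs the real OCR request.

```python
import hashlib
from typing import Callable, Dict


class OcrCache:
    """In-memory stand-in for a Redis cache, keyed by the SHA-256 of the file bytes."""

    def __init__(self, run_ocr: Callable[[bytes], str]) -> None:
        self._run_ocr = run_ocr
        self._results: Dict[str, str] = {}

    def get_text(self, data: bytes) -> str:
        """Return cached OCR text, running OCR only on the first request."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._results:
            self._results[key] = self._run_ocr(data)
        return self._results[key]


if __name__ == "__main__":
    cache = OcrCache(lambda data: f"{len(data)} bytes processed")
    print(cache.get_text(b"same document"))
    print(cache.get_text(b"same document"))  # served from the cache
```

With Redis, the same key could be used together with an expiry time so stale results eventually drop out.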
- Use the /ping endpoint for health checks
- Monitor processing times and error rates
- Set up logging aggregation
- Convex Dashboard: Real-time performance metrics and function logs
- Clerk Analytics: User authentication and session analytics
- Use HTTPS in production
- Implement API key authentication
- Validate and sanitize file uploads
- Set appropriate CORS policies
- Clerk Security: Enterprise-grade security with SOC 2 compliance
- Convex Security: Built-in row-level security and data encryption
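Two of the items above, sanitizing uploads and API key authentication, can be sketched with the standard library. Both helpers are hypothetical. The allowed-character set is an assumption, and a real deployment would also inspect file contents rather than trusting the name alone.

```python
import hmac
import re
from pathlib import PurePosixPath


def sanitize_filename(filename: str) -> str:
    """Strip directory components and unusual characters from an uploaded name."""
    # Normalize Windows separators, then keep only the final path component.
    name = PurePosixPath(filename.replace("\\", "/")).name
    # Replace anything outside a conservative character set and drop leading dots.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or "upload"


def api_key_matches(provided: str, expected: str) -> bool:
    """Compare API keys in constant time to avoid timing side channels."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    print(sanitize_filename("../../etc/passwd"))
    print(api_key_matches("secret", "secret"))
```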
# Deploy to Vercel (recommended)
npm run build
vercel deploy
# Deploy Convex functions
npx convex deploy --prod
# Configure production environment variables
# - Add Clerk production keys
# - Add Convex production URL
# - Configure webhooks for user sync
# Traditional deployment
docker build -t document-processing-api .
docker run -p 8000:8000 -e MISTRAL_API_KEY=your_key document-processing-api
# Or use cloud providers
# - AWS ECS/Fargate
# - Google Cloud Run
# - Azure Container Instances
- Convex: Automatically scales with usage, no configuration needed
- Clerk: Supports unlimited users with enterprise plans
- FastAPI Backend: Scale horizontally with load balancers
- File Processing: Consider queue-based processing for large documents
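Queue-based processing can be illustrated with a small worker pool. This sketch uses threads and an in-process queue only to show the pattern. A production setup would more likely use an external broker, and `process_in_background` and its parameters are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict


def process_in_background(
    documents: Dict[str, bytes],
    handler: Callable[[bytes], str],
    workers: int = 2,
) -> Dict[str, str]:
    """Process documents with a fixed pool of worker threads fed by a queue."""
    jobs: queue.Queue = queue.Queue()
    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            name, data = item
            try:
                outcome = handler(data)
            except Exception as exc:  # record the failure instead of killing the worker
                outcome = f"failed: {exc}"
            with lock:
                results[name] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for item in documents.items():
        jobs.put(item)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    docs = {"a.pdf": b"xx", "b.pdf": b"yyy"}
    print(process_in_background(docs, lambda data: f"{len(data)} bytes"))
```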