A production-grade AI agent that reads business documents (PDF, DOCX), understands the context, and performs concrete actions: summary, risk alerts, action plans, and draft responses.

djelacik/doc-agent-pro

doc-agent-pro

A production-grade AI agent specializing in government and commercial RFP analysis with advanced business intelligence capabilities. It offers multilingual support for global procurement opportunities and goes beyond generic document summarization to provide strategic insights for winning contracts.


🚀 Key Features

🌍 Multilingual RFP Analysis

  • English RFP Support with comprehensive requirements extraction
  • Finnish RFP Support including HILMA procurement system integration
  • Automatic Language Detection with confidence scoring
  • International Organization Support (UN agencies, UNAIDS, WHO, etc.)
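
As an illustration of language detection with a confidence score, here is a minimal sketch based on stopword counts. It is not the project's actual detector: the `STOPWORDS` lists and the `detect_language` function are hypothetical, and a production detector would use a much larger vocabulary or a dedicated library.

```python
import re

# Hypothetical, deliberately tiny stopword lists for illustration only.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "for", "is", "shall", "must", "with"},
    "fi": {"ja", "on", "ei", "että", "tai", "sekä", "tulee", "oli", "myös", "kanssa"},
}


def detect_language(text: str) -> tuple[str, float]:
    """Return (language_code, confidence), with confidence in [0, 1]."""
    # Match runs of Unicode letters so Finnish characters such as "ä" are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {lang: sum(w in vocab for w in words) for lang, vocab in STOPWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return "unknown", 0.0
    best = max(hits, key=hits.get)
    # Confidence is the share of stopword hits that belong to the winning language.
    return best, hits[best] / total
```

The confidence here is only the winning language's share of stopword hits, so short or mixed-language inputs will score low and can be routed to manual review.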

🎯 Intelligent Document Classification

  • Automatic Government RFP Detection with confidence scoring
  • Healthcare vs Government Procurement Classification
  • Compliance Framework Identification (FedRAMP, NIST, FISMA, Section 508)
  • Security Clearance Requirement Extraction
  • Set-Aside Opportunity Detection (Small Business, 8(a), SDVOSB, etc.)
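
Compliance framework identification can be approximated with keyword matching. The sketch below assumes simple regular expressions for the four frameworks named above; `FRAMEWORK_PATTERNS` and `find_frameworks` are hypothetical names, and the real classifier may work differently (for example, through the LLM).

```python
import re

# Hypothetical patterns keyed by the framework names listed above.
FRAMEWORK_PATTERNS = {
    "FedRAMP": r"\bFedRAMP\b",
    "NIST": r"\bNIST\b",
    "FISMA": r"\bFISMA\b",
    "Section 508": r"\bSection\s+508\b",
}


def find_frameworks(text: str) -> list[str]:
    """Return the frameworks mentioned in text, in the order defined above."""
    return [
        name
        for name, pattern in FRAMEWORK_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]
```

For example, a sentence requiring "FedRAMP Moderate" authorization and "Section 508" conformance yields both framework names, which can then be mapped to proposal sections.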

🧠 Advanced Business Intelligence

  • Structured Requirements Extraction with categorized lists
  • Risk Assessment & Bid Recommendations based on company capabilities
  • Evaluation Criteria Breakdown with scoring weights
  • Competition Analysis and market positioning advice
  • Compliance Requirement Mapping for proposal responses
  • Sample Response Generation tailored to specific RFP requirements
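
To make the link between evaluation weights and a bid recommendation concrete, here is a small sketch. It assumes the RFP publishes per-criterion weights and that the bidder self-assesses each criterion on a 0 to 1 scale; the function names and the 0.7 threshold are illustrative assumptions, not values taken from the project.

```python
def weighted_score(weights: dict[str, float], self_scores: dict[str, float]) -> float:
    """Combine per-criterion self-assessments (0 to 1) using the RFP's weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Criteria without a self-assessment count as 0 so gaps lower the score.
    weighted = sum(w * self_scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def recommend(score: float, bid_threshold: float = 0.7) -> str:
    """Return "bid" or "no-bid"; the threshold is a hypothetical default."""
    return "bid" if score >= bid_threshold else "no-bid"
```

With weights of 50, 30, and 20 for Technical, Past Performance, and Price and self-scores of 0.8, 0.5, and 1.0, the weighted score is 0.75, which clears the assumed threshold.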

⚡ Production-Grade Performance

  • Smart handling of large documents (80,000 characters or more)
  • Enhanced prompt engineering for consistent structured output
  • Production-ready error handling and timeout management
  • Real-time processing feedback and progress indicators
  • Robust parsing system with fallback mechanisms
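
One common shape for a parser with fallback mechanisms is to try strict parsing first and degrade gracefully. The sketch below assumes the model is asked for a JSON object; `parse_model_output` and the `parse_status` key are hypothetical and only illustrate the idea.

```python
import json
import re


def parse_model_output(raw: str) -> dict:
    """Try strict JSON first, then a JSON object embedded in text, then plain text."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Fallback 1: pull the outermost {...} span out of surrounding prose.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass

    # Fallback 2: keep the raw text so the caller still has something to show.
    return {"summary": raw.strip(), "parse_status": "fallback_text"}
```

Returning a dictionary in every case means the caller never has to handle an exception for malformed model output; it only needs to check whether `parse_status` is present.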

🏆 Competitive Advantage

vs. Generic AI Tools (ChatGPT, Claude):

  • ✅ Domain Expertise: Deep government contracting knowledge
  • ✅ Multilingual Support: English and Finnish RFP analysis
  • ✅ Compliance Awareness: Automatic framework detection
  • ✅ Business Intelligence: Risk assessment and recommendations
  • ✅ Structured Analysis: Consistent, actionable outputs
  • ✅ International Scope: UN agencies and global procurement support

vs. Enterprise Solutions (Shipley, RFPIO):

  • ✅ AI-Powered: Instant analysis vs. manual processes
  • ✅ Multilingual: Global procurement vs. English-only tools
  • ✅ Affordable: SaaS pricing vs. $50K+ enterprise licenses
  • ✅ Easy-to-Use: No training required vs. complex workflows
  • ✅ Government-Focused: Specialized intelligence vs. generic tools
  • ✅ Proven Results: 20+ requirements extracted from complex RFPs

🏗️ Architecture

  • Frontend: Streamlit (Python)
  • Backend: FastAPI (Python)
  • Document Parsing: PyMuPDF, python-docx
  • LLM Agent: LangChain + OpenAI API
  • CI/CD: GitHub Actions
  • Deployment: Render, Fly.io, or Docker

See docs/arhcitecture.md for a detailed overview.


📦 Project Structure

doc-agent-pro/
├── backend/
│   ├── app/
│   │   ├── main.py
│   │   ├── agents/
│   │   ├── services/
│   │   ├── utils/
│   │   └── routes/
│   └── requirements.txt
├── frontend/
│   └── streamlit_app.py
├── tests/
│   ├── integration/     # End-to-end workflow tests
│   ├── debug/           # Debug and utility scripts
│   ├── samples/         # Test RFP documents (English & Finnish)
│   ├── api/             # API endpoint tests
│   └── unit/            # Unit tests
├── docs/
│   ├── arhcitecture.md
│   ├── CONTRIBUTING.md
│   ├── conventional-commits.md
│   ├── docker-deployment.md    # Comprehensive Docker guide
│   ├── GENERAL-instructions.md
│   └── agents/
│       └── rfp.md
├── docker-build-run.sh         # One-command Docker setup
├── docker-verify.sh            # Docker setup verification
├── docker-compose.yml          # Docker Compose configuration
├── Dockerfile
├── LICENSE
├── README.md
└── package.json

🛠️ Getting Started

🐳 Docker Quick Start (Recommended)

The fastest way to get doc-agent-pro running is with Docker:

# 1. Clone the repository
git clone https://github.com/your-username/doc-agent-pro.git
cd doc-agent-pro

# 2. Set your OpenAI API key
export OPENAI_API_KEY=your_openai_api_key_here

# 3. Build and run with one command
chmod +x docker-build-run.sh
./docker-build-run.sh

🎉 That's it! The application will be available at:

  • Frontend (Streamlit): http://localhost:8501
  • Backend API (FastAPI): http://localhost:8000

Verify your setup: Run ./docker-verify.sh to check that everything is working.

🔧 Alternative: Docker Compose

export OPENAI_API_KEY=your_openai_api_key_here
docker-compose up -d

💻 Local Development Quick Start

For the fastest local setup, use the automated start script:

git clone https://github.com/your-username/doc-agent-pro.git
cd doc-agent-pro
chmod +x start.sh
./start.sh both

This will:

  • Create and activate virtual environments
  • Install all dependencies
  • Start both backend (port 8000) and frontend (port 8501)
  • Open the application in your browser

Prerequisites

  • Python 3.11+
  • Node.js (for commit tooling)
  • OpenAI API key (set as OPENAI_API_KEY environment variable)
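
A quick preflight check can confirm the Python version and the API key before starting the services. This is a hedged sketch, not a script shipped with the repository; `check_prerequisites` is a hypothetical helper, and it does not check for Node.js.

```python
import os
import sys


def check_prerequisites(env=None, version=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    # Both arguments default to the live process so the function is easy to test.
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment; printing the returned list gives a short checklist of what is still missing.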

Manual Setup

1. Clone the repository

git clone https://github.com/your-username/doc-agent-pro.git
cd doc-agent-pro

2. Set up the backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

3. Run the backend

uvicorn app.main:app --reload

4. Set up and run the frontend

cd ../frontend
streamlit run streamlit_app.py

📊 Document Processing

The application handles documents of various sizes with intelligent processing and multilingual support:

Supported File Types

  • PDF: Text-based PDFs up to 50MB (English and Finnish)
  • DOCX: Microsoft Word documents (future feature)

Multilingual Capabilities

  • English RFPs: Full requirements extraction and analysis
  • Finnish RFPs: Native language support with HILMA integration
  • Language Detection: Automatic identification with confidence scoring
  • International Organizations: Support for UN agencies, UNAIDS, WHO, UNICEF

Large Document Handling

  • Smart Processing: Handles documents of 80,000 characters or more
  • Enhanced Prompts: Consistent structured output for long documents
  • Token Management: Automatically handles OpenAI API token limits
  • Processing Time: 10 seconds to 2 minutes depending on document size
  • Progress Feedback: Real-time indicators for long-running operations
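
One way to stay within API token limits is to split long text into bounded chunks at paragraph boundaries. The sketch below uses character counts as a rough stand-in for tokens; `chunk_text` and the 12,000-character default are assumptions for illustration, not the project's actual strategy.

```python
def chunk_text(text: str, max_chars: int = 12000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on blank lines where possible."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # A single oversized paragraph is hard-split so no chunk exceeds the limit.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The paragraph does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole paragraphs together where possible helps preserve RFP sections, at the cost of somewhat uneven chunk sizes.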

Requirements Extraction Performance

  • English RFPs: 15-20+ requirements typically extracted
  • Finnish RFPs: 8-12+ requirements typically extracted
  • Structured Output: Categorized requirements (Technical, Experience, Personnel, Compliance)
  • Sample Responses: AI-generated proposal responses tailored to specific RFP requirements
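
The categorized output described above could be modeled roughly as follows. The four category names come from the list above; the `Requirement` dataclass, its `mandatory` field, and `group_by_category` are hypothetical and do not reflect the project's real schema.

```python
from dataclasses import dataclass

CATEGORIES = ("Technical", "Experience", "Personnel", "Compliance")


@dataclass(frozen=True)
class Requirement:
    category: str
    text: str
    mandatory: bool = True  # hypothetical field, not confirmed by the source


def group_by_category(requirements: list[Requirement]) -> dict[str, list[str]]:
    """Group requirement texts under the four categories, keeping empty ones visible."""
    grouped: dict[str, list[str]] = {category: [] for category in CATEGORIES}
    for req in requirements:
        if req.category not in grouped:
            raise ValueError(f"unknown category: {req.category}")
        grouped[req.category].append(req.text)
    return grouped
```

Listing every category, even when empty, makes gaps in the extraction easy to spot when reviewing the output.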

Best Practices

  • Use clear, text-based PDFs (not scanned images)
  • Files under 10MB process fastest
  • Large documents are automatically optimized for AI analysis
  • Processing includes intelligent section preservation for RFPs and business documents
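
The size guidance above can be turned into a simple pre-upload check. This sketch assumes 1MB means 1024 * 1024 bytes and that only the file extension is inspected; `classify_upload` is a hypothetical helper and does not detect scanned-image PDFs.

```python
MAX_BYTES = 50 * 1024 * 1024   # 50MB limit stated under Supported File Types
FAST_BYTES = 10 * 1024 * 1024  # files under 10MB are described as fastest


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return "fast" or "slow", or raise ValueError for files that should be rejected."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only PDF files are currently supported")
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the 50MB limit")
    return "fast" if size_bytes < FAST_BYTES else "slow"
```

A frontend could call this before sending the file to the backend and warn the user when a document falls into the slower range.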

🧪 Running Tests

Integration Tests

# Run comprehensive integration tests
python tests/integration/test_english_requirements.py

# Test specific language support
python tests/integration/test_finnish_detection.py

# Full workflow verification
python tests/integration/test_complete_app_workflow.py

Debug Utilities

# Test document detection and classification
python tests/debug/debug_detection_new.py

# Analyze parsing and requirements extraction
python tests/debug/debug_english_parsing.py

Unit Tests

pytest tests/unit/

🤖 Commit Conventions

This project uses Conventional Commits and Commitizen. To make a commit, use:

npm run commit

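📄 Documentation

  • docs/arhcitecture.md
  • docs/CONTRIBUTING.md
  • docs/conventional-commits.md
  • docs/docker-deployment.md (comprehensive Docker guide)
  • docs/GENERAL-instructions.md
  • docs/agents/rfp.md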


📦 Deployment

Docker Deployment (Recommended)

One-Command Docker Setup

# Clone and build with the enhanced script
git clone https://github.com/your-username/doc-agent-pro.git
cd doc-agent-pro
export OPENAI_API_KEY=your_openai_api_key_here
chmod +x docker-build-run.sh
./docker-build-run.sh

Verification: Run ./docker-verify.sh to test your Docker setup.

Alternative: Docker Compose

# Run with docker-compose
export OPENAI_API_KEY=your_openai_api_key_here
docker-compose up -d

Manual Docker Commands

# Build the image
docker build -t doc-agent-pro:latest .

# Run the container
docker run -d \
  --name doc-agent-pro \
  -p 8000:8000 \
  -p 8501:8501 \
  -e OPENAI_API_KEY=your_openai_api_key_here \
  doc-agent-pro:latest

Access Points

  • Frontend (Streamlit): http://localhost:8501
  • Backend API (FastAPI): http://localhost:8000

📋 For detailed Docker setup and troubleshooting, see docs/docker-deployment.md

Cloud Deployment

  • Render: Use the Dockerfile for automatic deployment
  • Fly.io: Compatible with flyctl deploy
  • Railway: Direct GitHub integration with Dockerfile
  • AWS/GCP/Azure: Container registry deployment ready

Docker Troubleshooting

Common Issues and Solutions

Build Errors:

# If you encounter package hash mismatches:
docker system prune -a  # Clean Docker cache
docker build --no-cache -t doc-agent-pro:latest .

# If dependencies fail to install:
docker build --build-arg PIP_TIMEOUT=120 -t doc-agent-pro:latest .

Port Conflicts:

# Check what's using the ports
lsof -i :8000
lsof -i :8501

# Stop conflicting services or use different ports
docker run -p 8002:8000 -p 8502:8501 -e OPENAI_API_KEY=your_key doc-agent-pro:latest
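
Where lsof is not available, the same port check can be done from Python. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper and only detects listeners that accept TCP connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

Checking `port_in_use(8000)` and `port_in_use(8501)` before starting the container shows whether the alternative port mapping above is needed.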

Container Management:

# View container logs
docker logs doc-agent-pro

# Stop and restart
docker stop doc-agent-pro
docker start doc-agent-pro

# Complete cleanup and fresh start
docker rm -f doc-agent-pro
./docker-build-run.sh

Performance Issues:

# Check container resource usage
docker stats doc-agent-pro

# Increase memory limits if needed
docker run --memory=2g --cpus=2 -p 8000:8000 -p 8501:8501 -e OPENAI_API_KEY=your_key doc-agent-pro:latest

📝 License

This project is licensed under the MIT License. See LICENSE for details.


🙋‍♂️ Contributing

We welcome contributions! See docs/CONTRIBUTING.md for guidelines.


📣 Acknowledgements

