DocuMind is a privacy-focused, self-hosted AI assistant that helps you extract insights from your PDF documents using local LLMs through Ollama. Ask questions about your documents in natural language and receive accurate answers with source citations - all without sending your data to external services.
- **Privacy First**: All processing happens locally; no external API calls
- **Multi-format PDF Processing**: Robust text extraction with OCR support
- **Hybrid Retrieval System**: Combines semantic and keyword search for accuracy (sketched below)
- **Local LLM Integration**: Uses Ollama (Llama 3.2 3B) for responses
- **Conversation Memory**: Maintains context across multiple questions
- **Source Attribution**: Shows which documents informed each answer
- **Automatic Document Loading**: Auto-loads PDFs from the `data/documents` directory
- **Dual Interfaces**: Both a Streamlit UI and an HTML/CSS/JS web interface
- **Docker Ready**: Simple setup with Docker and GPU acceleration support
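To make the hybrid retrieval idea concrete, here is a minimal, hypothetical sketch that blends dense embedding similarity with BM25-style keyword scores via a linear mix. DocuMind's actual retriever lives in `src/retriever.py`; the libraries, fusion formula, and `alpha` weight below are illustrative assumptions, not the project's documented implementation.

```python
# Sketch of hybrid retrieval: blend dense (embedding) and sparse (BM25)
# scores. Illustrative only; the real logic is in src/retriever.py.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = ["billing policy chunk ...", "refund process chunk ...", "shipping chunk ..."]
model = SentenceTransformer("all-MiniLM-L6-v2")  # same embedding model listed below

doc_emb = model.encode(docs, convert_to_tensor=True)
bm25 = BM25Okapi([d.split() for d in docs])

def hybrid_search(query: str, alpha: float = 0.5):
    """Score = alpha * semantic + (1 - alpha) * keyword; alpha is an assumption."""
    sem = util.cos_sim(model.encode(query, convert_to_tensor=True), doc_emb)[0]
    kw = bm25.get_scores(query.split())
    kw_max = max(kw.max(), 1e-9)  # normalize BM25 scores into [0, 1]
    scores = [alpha * float(sem[i]) + (1 - alpha) * (kw[i] / kw_max) for i in range(len(docs))]
    return sorted(zip(scores, docs), reverse=True)
```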
The easiest way to get started is using the included Docker helper script:
```bash
# Make the script executable (if needed)
chmod +x run_docker.sh

# Run the script and follow the menu options
./run_docker.sh
```
Select option 1 from the menu to start DocuMind, then:
- Web UI: http://localhost:8080
- API Endpoint: http://localhost:8000/api
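Once the containers are up, you can also query the API programmatically. The route and payload below are hypothetical (the README only documents the base URL; check `api.py` for the real endpoints):

```python
# Hypothetical request against the local API; the /query route and the
# payload shape are assumptions -- see api.py for the actual endpoints.
import requests

resp = requests.post(
    "http://localhost:8000/api/query",                 # assumed route
    json={"question": "What does chapter 2 cover?"},   # assumed payload
    timeout=120,
)
print(resp.json())
```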
If you prefer to run without Docker:
1. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```
2. **Install Ollama**

   ```bash
   # Follow the Ollama installation instructions from https://ollama.ai/
   # Run the Ollama service
   ollama serve
   ```
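   You will likely also need to pull the default model once before first use, e.g. `ollama pull llama3.2:3b`.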
3. **Install OCR Dependencies** (Optional)

   ```bash
   pip install pytesseract pdf2image pillow
   brew install tesseract poppler  # For macOS
   # See documentation/OCR_SETUP.md for other OS instructions
   ```
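   To confirm the OCR stack is wired up, a quick check along these lines should work (a sketch; `get_tesseract_version` raises if the Tesseract binary is not on your PATH):

   ```python
   # Sanity-check the optional OCR dependencies.
   import pytesseract
   from pdf2image import convert_from_path  # import fails if pdf2image is missing

   print(pytesseract.get_tesseract_version())  # raises if tesseract is not installed
   ```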
4. **Add Documents**

   Place PDF files in the `data/documents` directory.
5. **Run the Application**

   ```bash
   # Start the web interface
   python api.py

   # OR start the Streamlit interface
   streamlit run app.py
   ```
```
DocuMind/
├── app.py                     # Main Streamlit application
├── api.py                     # Alternative web interface (HTML/CSS/JavaScript)
├── docker-entrypoint.sh       # Docker container startup script
├── docker-compose.yml         # Container orchestration configuration
├── docker-compose.gpu.yml     # GPU support configuration
├── Dockerfile                 # Container definition
├── run_docker.sh              # Docker helper script
├── src/
│   ├── document_processor.py  # PDF processing and extraction with OCR
│   ├── chunking.py            # Semantic text chunking
│   ├── retriever.py           # Hybrid retrieval system
│   ├── llm_handler.py         # LLM integration and prompts
│   ├── evaluator.py           # Evaluation framework
│   ├── preload_models.py      # Model preloading script
│   └── utils.py               # Utility functions
├── data/
│   ├── documents/             # PDF documents for auto-loading
│   ├── vectorstore/           # Chroma vector database
│   ├── models_cache/          # Hugging Face model cache
│   └── chroma_cache/          # ChromaDB ONNX model cache
├── config/
│   └── settings.py            # Configuration settings
├── documentation/             # Detailed documentation files
├── tests/                     # Testing and diagnostic tools
└── web/                       # Web UI assets (HTML/CSS/JS)
```
DocuMind pre-downloads and caches embedding models to improve startup and query time:
- Models are stored in `./data/models_cache/`
- ONNX-optimized versions are kept in `./data/chroma_cache/onnx_models/`
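For reference, warming the cache can be done along these lines. This is a minimal sketch of what `src/preload_models.py` presumably does; the `cache_folder` wiring is an assumption:

```python
# Sketch: pre-download the embedding model into the project's cache so
# first startup is fast. The real logic lives in src/preload_models.py.
from sentence_transformers import SentenceTransformer

# cache_folder pins the download location to the project's cache directory
model = SentenceTransformer("all-MiniLM-L6-v2", cache_folder="./data/models_cache")
print(model.encode("warm-up sentence").shape)  # (384,) for all-MiniLM-L6-v2
```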
Choose the right LLM based on your hardware:

- High-end systems: use larger models such as `llama3.2:3b` (the default)
- Low-resource systems: switch to `phi3:mini` for faster responses (option 5 in the run_docker.sh menu)
- Documents placed in the `data/documents` directory are automatically loaded when the app starts
- Configure auto-loading behavior in `config/settings.py` (a sketch of the skip logic follows the snippet):

```python
AUTO_LOAD_DOCUMENTS = True       # Enable/disable auto-loading
AUTO_LOAD_SKIP_EXISTING = True   # Skip already processed documents
```
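As a rough illustration of what `AUTO_LOAD_SKIP_EXISTING` implies, something like the following could decide which files to pick up on startup. This is a hypothetical helper, not DocuMind's actual loader:

```python
# Hypothetical sketch of skip-existing auto-loading: only pick up PDFs
# that have not already been indexed. Not DocuMind's actual loader code.
from pathlib import Path

def pdfs_to_load(doc_dir: str, already_indexed: set[str], skip_existing: bool = True) -> list[Path]:
    pdfs = sorted(Path(doc_dir).glob("*.pdf"))
    if skip_existing:
        pdfs = [p for p in pdfs if p.name not in already_indexed]
    return pdfs

# e.g. pdfs_to_load("data/documents", {"report.pdf"}) skips report.pdf
```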
- OCR (Optical Character Recognition) processing for difficult PDFs
- Automatically detects when a PDF needs OCR and applies it (see the sketch after this list)
- Useful for PDFs saved from websites that have selectable text but don't parse correctly
- See the OCR Setup Guide (documentation/OCR_SETUP.md) for detailed setup instructions
- Use `tests/check_pdf.py` to diagnose problematic PDFs: `python tests/check_pdf.py path/to/document.pdf`
- Identifies which extraction method works best for each document
- Determines whether OCR processing is recommended
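As a rough illustration of the detection step above, the sketch below falls back to OCR when a page yields too little selectable text. It is an assumption about the approach, not DocuMind's actual code (see `src/document_processor.py`), and the 50-character threshold is made up:

```python
# Sketch: fall back to OCR when normal extraction yields too little text.
# Threshold and structure are illustrative assumptions; the real logic
# lives in src/document_processor.py.
import fitz  # PyMuPDF
import pytesseract
from pdf2image import convert_from_path

def extract_with_ocr_fallback(pdf_path: str, min_chars: int = 50) -> list[str]:
    pages = []
    doc = fitz.open(pdf_path)
    for i, page in enumerate(doc):
        text = page.get_text().strip()
        if len(text) < min_chars:  # likely a scanned or garbled page: OCR it
            image = convert_from_path(pdf_path, first_page=i + 1, last_page=i + 1)[0]
            text = pytesseract.image_to_string(image)
        pages.append(text)
    return pages
```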
**Symptom**: Requests time out with the error "Error generating response: Read timed out."

**Solution**:
- Switch to a smaller LLM model through option 5 in the run_docker.sh script
- Restart the containers to apply the change
**Symptom**: Documents fail to load or extract properly

**Solution**:
- Check the format of your PDF
- Run the diagnostic tool: `python tests/check_pdf.py path/to/document.pdf`
- Enable OCR for problematic documents
**Symptom**: Errors connecting to the Ollama service

**Solution**:
- For Docker: ensure the Ollama container is running (`docker ps`)
- For manual setup: make sure Ollama is running (`ollama serve`)
- See the Environment Setup Guide for details on connection configuration (a quick connectivity check is sketched below)
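A quick way to confirm Ollama is reachable is to hit its HTTP API, which listens on port 11434 by default (`/api/tags` lists installed models); adjust the host if your Docker network maps it differently:

```python
# Ping Ollama's HTTP API; /api/tags lists installed models.
# Default port is 11434; adjust the host for your Docker setup.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    print("Ollama is up; models:", [m["name"] for m in resp.json()["models"]])
except requests.RequestException as exc:
    print("Ollama unreachable:", exc)
```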
For more troubleshooting tips, see the Full Documentation.
Comprehensive documentation is available in the `documentation/` folder:
- Complete User Guide
- Docker Setup Guide
- Environment Setup Guide
- OCR Setup Instructions
- Full Technical Documentation
- **Document Processing**: PyPDF2, PyMuPDF, pdfplumber, Tesseract OCR
- **Embeddings**: Sentence-Transformers (all-MiniLM-L6-v2)
- **Vector Database**: ChromaDB (an example round trip is sketched below)
- **LLM**: Ollama (Llama 3.2 3B)
- **Frontend**: Streamlit, HTML/CSS/JavaScript
- **Backend**: Python FastAPI
- **Containers**: Docker, Docker Compose
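For context on the vector database, chunks end up in the persistent store under `data/vectorstore/`. A minimal ChromaDB round trip looks roughly like this; the collection name and metadata fields are illustrative assumptions, not DocuMind's actual schema:

```python
# Minimal ChromaDB round trip against a persistent store. Collection
# name and metadata fields are assumptions, not DocuMind's schema.
import chromadb

client = chromadb.PersistentClient(path="data/vectorstore")
col = client.get_or_create_collection("documents")
col.add(
    ids=["report.pdf-0"],
    documents=["First chunk of text from report.pdf ..."],
    metadatas=[{"source": "report.pdf"}],
)
# Chroma's default embedding function is an ONNX MiniLM model, which is
# what data/chroma_cache/ holds.
print(col.query(query_texts=["what does the report cover?"], n_results=1))
```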
This project is licensed under the MIT License - see the LICENSE file for details.
Developed with ❤️ by Fakhrul Fauzi.