DocTags Analyzer and Visualizer

AI-powered document analysis and visualization tool for extracting structured content from PDFs.

🚀 Quick Start with Docker

Prerequisites

- Docker and Docker Compose installed
- At least 4GB of free memory
- ~500MB of disk space for the AI model

Running with Docker

Clone the repository

```
git clone <repository-url>
cd doc-analyzer
```

Place your PDF files in the project directory
```
cp /path/to/your/document.pdf ./
```
Start the application
```
docker-compose up -d --build
```
Access the web interface
- Open http://localhost:8080 in your browser
- Select a PDF from the dropdown
- Process your documents through the three-step workflow
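
After starting the containers, it can help to confirm that something is listening on port 8080 before opening the browser. The following is a minimal sketch, not part of the project: the function names `is_listening` and `wait_for_port` are hypothetical, and a successful TCP connection only shows that the port is open, not that the application has finished loading.

```python
import socket
import time


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if is_listening(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_port("localhost", 8080)` after `docker-compose up -d --build` would block for up to about a minute with these defaults and return whether the port became reachable.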

First Run Notice

⚠️ Important: The first analysis takes 5-10 minutes because the AI model (SmolDocling-256M, ~500MB) must be downloaded first. Subsequent runs are much faster (30-60 seconds).

📋 Features

- Document Analysis: Extract comprehensive document structure using AI
- Visualization: Generate visual overlays showing document elements
- Image Extraction: Automatically extract and catalog embedded images
- Web Interface: User-friendly interface for document processing

🛠️ Manual Usage

Process PDF pages with DocTags:

```
python analyzer.py --image document.pdf --page 8 && \
python visualizer.py --doctags results/output.doctags.txt --pdf document.pdf --page 8 --adjust && \
python picture_extractor.py --doctags results/output.doctags.txt --pdf document.pdf --page 8 --adjust
```
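
The same three steps can be driven from Python, which makes it easier to loop over several pages. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helpers, the flags are copied from the command above, and it assumes the scripts are in the current directory and that the analyzer writes results/output.doctags.txt as in that command. It uses the current interpreter (`sys.executable`) rather than whatever `python` resolves to on the PATH.

```python
import subprocess
import sys
from typing import List


def build_commands(pdf: str, page: int,
                   doctags: str = "results/output.doctags.txt") -> List[List[str]]:
    """Build the three script invocations as argument lists."""
    page_arg = str(page)
    return [
        # Step 1: analyze the page and produce the DocTags file.
        [sys.executable, "analyzer.py", "--image", pdf, "--page", page_arg],
        # Step 2: draw the overlay from the DocTags file.
        [sys.executable, "visualizer.py", "--doctags", doctags,
         "--pdf", pdf, "--page", page_arg, "--adjust"],
        # Step 3: extract the pictures listed in the DocTags file.
        [sys.executable, "picture_extractor.py", "--doctags", doctags,
         "--pdf", pdf, "--page", page_arg, "--adjust"],
    ]


def run_pipeline(pdf: str, page: int) -> None:
    """Run each step in order; check=True stops at the first failure, like &&."""
    for cmd in build_commands(pdf, page):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("document.pdf", 8)` would run the same sequence as the one-line command, raising `subprocess.CalledProcessError` if any step exits with a non-zero status.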

🐛 Troubleshooting

Docker Issues

Container won't start
- Check logs: docker-compose logs analyser
- Ensure port 8080 isn't already in use: lsof -i :8080
"No module named 'docling_core'" error
- Rebuild the container: docker-compose down && docker-compose up -d --build
Analysis stuck on "Running..."
- The first run downloads the AI model (~500MB), which can take 5-10 minutes
- Check download progress: docker-compose exec analyser du -sh /root/.cache/huggingface/
- Monitor CPU usage: docker-compose exec analyser ps aux | grep analyzer
PDF not loading
- Ensure poppler is installed (the Dockerfile already includes it)
- Place PDFs in the project root directory
- PDFs must have the .pdf extension
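
The download-progress check above (du -sh on the Hugging Face cache) can also be done from Python, for instance inside the container. A minimal sketch under stated assumptions: `dir_size_bytes` and `format_mb` are hypothetical helper names, and the cache path is the one used in the troubleshooting command. Symbolic links are skipped so that a file reachable through a link is not counted twice.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # skip links to avoid double counting
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may be renamed or removed while a download is running.
                pass
    return total


def format_mb(num_bytes: int) -> str:
    """Format a byte count in megabytes with one decimal place."""
    return f"{num_bytes / 1_000_000:.1f} MB"
```

Printing `format_mb(dir_size_bytes("/root/.cache/huggingface/"))` every few seconds should show the figure growing toward roughly 500MB while the model downloads.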

Performance Tips

- The first analysis is slow because of the model download
- Subsequent analyses are much faster (the model is cached)
- Processing time depends on PDF complexity and page size
- Monitor memory usage: docker-compose exec analyser free -h

📁 Project Structure

```
doc-analyzer/
├── backend/
│   ├── page_treatment/           # Core processing scripts
│   │   ├── analyzer.py           # AI-powered document analysis
│   │   ├── visualizer.py         # Visualization generator
│   │   └── picture_extractor.py  # Image extraction
│   ├── app.py                    # Flask web application
│   └── requirements.txt          # Python dependencies
├── frontend/                     # Web interface
├── results/                      # Output directory (auto-created)
├── Dockerfile                    # Docker configuration
└── docker-compose.yml            # Docker Compose setup
```

🔧 Development

To modify the application:

- Make changes to the code
- Rebuild the Docker image: docker-compose up -d --build
- Check logs for errors: docker-compose logs -f analyser

📄 License

This project is open source and available under the MIT License.
