The Diagram Detector project identifies and classifies diagrams in questions and images. It pairs a text model with an image model to flag diagram-related content in educational and scientific material.
## Project Structure

```
data/                             # datasets
├── raw/                          # original datasets
│   ├── text_dataset.csv
│   └── images/
│       ├── diagram/              # subfolders: biology, botany, chemistry, mathematics, physics, zoology
│       └── no_diagram/
└── processed/                    # processed datasets
    ├── processed_text_dataset.csv
    └── processed_images/
models/                           # trained models
├── text_detector/
└── image_detector.pt
src/                              # training / preprocessing / utilities
├── preprocess.py
├── text_detector.py
└── image_detector.py
api/                              # FastAPI application
└── main.py
tests/                            # unit tests
└── test_api.py
```

Other files: `Dockerfile`, `requirements.txt`, `setup.sh`, `start.py`.

## Features
- Text detection using transformer-based models (BERT or similar)
- Image detection using a ResNet-based classifier (see the loading sketch after this list)
- FastAPI REST endpoints with automatic docs
- Health checks and basic monitoring endpoints
- Configurable via environment variables
- Tests with pytest and included example scripts
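As a rough illustration of the image side, the sketch below loads `models/image_detector.pt` into a torchvision ResNet and scores one image. This is a sketch under assumptions, not this repo's loading code: the `resnet18` backbone, the two-class head, the ImageNet-style preprocessing, and the state-dict save format are all guesses.

```python
# Sketch: load a ResNet-based diagram/no-diagram classifier.
# Assumptions: resnet18 backbone, 2-class head, 224x224 ImageNet-style
# preprocessing, and weights stored as a state_dict in image_detector.pt.
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet18()
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # diagram / no_diagram
model.load_state_dict(torch.load("models/image_detector.pt", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("your_image.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)
print({"no_diagram": probs[0, 0].item(), "diagram": probs[0, 1].item()})
```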
## Requirements

- Python 3.8+
- Recommended: create and use a virtual environment (instructions below)
## Virtual environment

Use a virtual environment to isolate project dependencies.

1. Verify Python is available (use `python3` on many Linux/macOS systems):

   ```bash
   python3 --version
   ```

2. Create the virtual environment in the project root:

   ```bash
   python3 -m venv .venv
   ```

3. Activate the virtual environment:

   - Linux / macOS (bash/zsh):

     ```bash
     source .venv/bin/activate
     ```

   - Windows (PowerShell):

     ```powershell
     .\.venv\Scripts\Activate.ps1
     ```

4. Upgrade packaging tools and install requirements:

   ```bash
   pip install --upgrade pip setuptools wheel
   pip install -r requirements.txt
   ```
Notes:

- If you need CUDA-enabled PyTorch, use the selector at https://pytorch.org/get-started/locally/ to obtain the correct install command (it may use a custom `--index-url`). Example CPU-only command:

  ```bash
  pip install torch --index-url https://download.pytorch.org/whl/cpu
  ```
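After installing, you can confirm which PyTorch build landed in the environment. A minimal check using the standard `torch` API, nothing project-specific:

```python
# Print the installed PyTorch version and whether CUDA is usable.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```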
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/ankitrajsh/ML-Detect-Diagram-in-Question-convert-into-Mathjax.git
   cd ML-Detect-Diagram-in-Question-convert-into-Mathjax
   ```

2. Create and activate a virtual environment (see the "Virtual environment" section above for commands).

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. (Optional) Copy the environment example and edit values if present:

   ```bash
   cp .env.example .env  # only if .env.example exists
   ```

5. Start the API (two equivalent options):

   ```bash
   # recommended: helper script
   python start.py

   # or run the FastAPI app directly
   python api/main.py
   ```

6. Open the API docs in your browser: http://localhost:8000/docs
## Docker

- Build the image:

  ```bash
  docker build -t diagram-detector .
  ```

- Run the container (map the port):

  ```bash
  docker run -p 8000:8000 diagram-detector
  ```
## API Endpoints

- Documentation: `GET /docs`
- Health: `GET /health`
- Info: `GET /`

Example: Text detection

```bash
curl -X POST "http://localhost:8000/detect_text" \
  -H "Content-Type: application/json" \
  -d '{"question": "Draw the structure of benzene"}'
```

Example: Image detection

```bash
# -F sets the multipart/form-data Content-Type (with boundary) automatically;
# overriding it with -H would drop the boundary and break parsing.
curl -X POST "http://localhost:8000/detect_image" \
  -F "file=@your_image.jpg"
```
## Configuration

Configuration is controlled via environment variables. If a `.env.example` is provided, copy it to `.env` and update the values.

Important variables (defaults shown where applicable):

- `API_HOST` (default: `0.0.0.0`)
- `API_PORT` (default: `8000`)
- `LOG_LEVEL` (default: `info`)
- `MAX_FILE_SIZE_MB` (default: `10`)
- `TEXT_MODEL_PATH` - path to a saved text model (if required)
- `IMAGE_MODEL_PATH` - path to the image model (e.g., `models/image_detector.pt`)

A sketch of how an app might consume these variables follows the list.
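How `api/main.py` reads these is not shown in this README. As a hedged illustration only, a FastAPI service could pick them up as below; the variable names and defaults come from the list above, while the module path `api.main:app` and the use of `uvicorn` are assumptions:

```python
# Sketch: consume the documented environment variables and launch the API.
# Names and defaults match the Configuration list; "api.main:app" and
# uvicorn itself are assumptions about how this project is wired.
import os

import uvicorn

API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "info")
MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "10"))
TEXT_MODEL_PATH = os.getenv("TEXT_MODEL_PATH")  # optional
IMAGE_MODEL_PATH = os.getenv("IMAGE_MODEL_PATH", "models/image_detector.pt")

if __name__ == "__main__":
    uvicorn.run("api.main:app", host=API_HOST, port=API_PORT, log_level=LOG_LEVEL)
```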
## Testing

- Run all tests:

  ```bash
  pytest
  ```

- Run a single test file:

  ```bash
  pytest tests/test_api.py
  ```

- Run basic tests without pytest (if the file is executable as a script):

  ```bash
  python tests/test_api.py
  ```

A minimal example of what such a test can look like is sketched after this list.
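For orientation, here is a minimal test in the style `tests/test_api.py` might use, built on FastAPI's `TestClient`. It assumes `api/main.py` exposes an `app` object and that the endpoints listed above return HTTP 200; both are assumptions about this repo:

```python
# Sketch of a minimal API test with FastAPI's TestClient.
# Assumption: api/main.py defines an `app` instance importable as below.
from fastapi.testclient import TestClient

from api.main import app

client = TestClient(app)


def test_health():
    # /health is listed in the API endpoints above.
    response = client.get("/health")
    assert response.status_code == 200


def test_detect_text():
    # Payload shape taken from the curl example in this README.
    response = client.post(
        "/detect_text",
        json={"question": "Draw the structure of benzene"},
    )
    assert response.status_code == 200
```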
## Development

- Create sample data (if `preprocess.py` supports it):

  ```bash
  python src/preprocess.py --create-sample
  ```

- Train models (if training scripts are implemented):

  ```bash
  python src/text_detector.py
  python src/image_detector.py
  ```

A hedged sketch of the transformer-based text detection step follows this list.
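The features above mention a transformer-based text detector (BERT or similar), but this README does not document what `src/text_detector.py` actually does. As a sketch of the technique only: a BERT-style binary classifier over question text could look like the following, where the model name, the label order, and the idea of loading fine-tuned weights from `models/text_detector/` are all assumptions:

```python
# Sketch: BERT-style binary classification of question text.
# MODEL_NAME, the label order, and the fine-tuned weights location are
# assumptions; the repo's real src/text_detector.py may differ entirely.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# For meaningful predictions, point this at fine-tuned weights
# (e.g., "models/text_detector/"); a base checkpoint gives a fresh,
# untrained classification head.
MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

question = "Draw the structure of benzene"
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print({"no_diagram": probs[0, 0].item(), "diagram": probs[0, 1].item()})
```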
## Troubleshooting

- If the API fails to start, check that the dependencies from `requirements.txt` are installed and that the configured model paths exist.
- Check the logs (the application respects `LOG_LEVEL`).
## Contributing

Contributions are welcome. Please open issues or submit pull requests.
## License

This project is licensed under the MIT License. See the `LICENSE` file for details.