Math-OCR-Zero

Math-OCR-Zero: A high-quality mathematical dataset for reinforcement learning in multimodal large models

Math-OCR-Zero is based on the VeRL framework and leverages the synthetic OCR-Math dataset to enhance the reasoning capabilities of multimodal large models. It serves as a reproduction of DeepSeek-R1-Zero in the context of multimodal large models.

We generate multimodal data from the DeepMath-103k dataset and train the Qwen2.5-VL-3B-Instruct model with the GRPO reinforcement learning method to demonstrate the self-reflection capability of multimodal large models.
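For readers unfamiliar with GRPO, the core idea is that each prompt is sampled several times and every response is scored relative to the other responses in its group, so no learned value model is needed. The following is a minimal sketch of that normalization step only, not the training code used in this repository. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one prompt's sampled responses within their group."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption of this sketch)
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to the same problem: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Responses that beat their group's mean reward get a positive advantage and the rest get a negative one. A group whose responses all receive the same reward contributes no learning signal.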

Installation

conda create -n math-zero python=3.11
conda activate math-zero
pip install torch==2.6.0
pip install vllm==0.8.4
pip install ray

# Install verl in editable mode (run from the repository root)
pip install -e .

pip install flash-attn --no-build-isolation
pip install wandb matplotlib

Data Preparation

Install LaTeX before running the data preprocessing script.

python examples/data_preprocess/deepmath_ocr.py --local_dir {your_local_dir} --train_size {your_train_size} --test_size {your_test_size}
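Because the preprocessing script depends on a LaTeX installation, it can help to confirm one is on `PATH` before starting a long run. The check below is a small sketch using only the standard library. Which executable the script actually invokes is not stated here, so the candidate names (`latex`, `pdflatex`, `xelatex`) and the helper name `find_latex_tools` are assumptions.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_latex_tools(
    names: Sequence[str] = ("latex", "pdflatex", "xelatex"),
) -> Dict[str, Optional[str]]:
    """Map each candidate executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_latex_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if not any(found.values()):
        print("No LaTeX executable found on PATH; install a TeX distribution first.")
```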

Training

conda activate math-zero
export TRAIN_PATH={your_local_dir}/train.parquet
export VALID_PATH={your_local_dir}/test.parquet
export N_GPUS=4
sh examples/grpo_trainer/run_qwen2_5_vl-3b-deepmath.sh
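The training script reads its inputs from the environment variables exported above, so a typo in a path may only surface after the job has started. The following pre-flight check is a sketch under that assumption. The helper name `check_training_env` is hypothetical, and it validates only the three variables shown above.

```python
import os
from typing import List, Mapping


def check_training_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for var in ("TRAIN_PATH", "VALID_PATH"):
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{var} points to a missing file: {path}")
    n_gpus = env.get("N_GPUS", "")
    if not n_gpus.isdigit() or int(n_gpus) < 1:
        problems.append(f"N_GPUS should be a positive integer, got {n_gpus!r}")
    return problems


if __name__ == "__main__":
    for problem in check_training_env(os.environ):
        print("warning:", problem)
```

Running it in the same shell, after the `export` lines and before the `sh` command, prints one warning per problem and nothing when all three variables look valid.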

Datasets

deepmath-ocr-100000

Here are some samples of the generated images:

[Sample images: 3.png, 2.png, 1.png]

Models

Qwen2.5-VL-3B-Instruct-GRPO-deepmath-ocr-1k

[Figures: critic score (vl_3b_step90_critic_score.png), response length (vl_3b_step90_response_len.png)]

Qwen2.5-VL-3B-Instruct-GRPO-deepmath-ocr-7k

[Figures: critic score (vl_3b_step610_critic_score.png), response length (vl_3b_step610_response_len.png), entropy (vl_3b_step610_entropy.png)]

Acknowledgements

This project builds upon the following works:

To Do

Use more data and train larger models.
Mix different open-source datasets to improve the model's reasoning and generalization ability.
Set different length penalties or rewards based on the difficulty of the task to avoid overthinking on simple tasks.
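One possible shape for the last item is a reward that subtracts a penalty once a response exceeds a token budget that grows with problem difficulty. This is purely a hypothetical sketch of the idea, not something implemented in this repository. The function name, the budget range, the penalty cap, and the assumption that difficulty is normalized to [0, 1] are all illustrative choices.

```python
def length_penalized_reward(
    correct: bool,
    response_len: int,
    difficulty: float,
    min_budget: int = 256,
    max_budget: int = 2048,
    max_penalty: float = 0.5,
) -> float:
    """Correctness reward minus a penalty for exceeding a difficulty-dependent token budget.

    difficulty is assumed to be normalized to [0, 1]; harder problems get a larger budget.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range difficulty values
    budget = min_budget + d * (max_budget - min_budget)
    # Fraction by which the response exceeds its budget (0 when within budget).
    overshoot = max(0.0, response_len - budget) / budget
    penalty = min(max_penalty, max_penalty * overshoot)
    return (1.0 if correct else 0.0) - penalty


if __name__ == "__main__":
    # The same 2000-token correct answer on an easy and on a hard problem.
    print(length_penalized_reward(True, 2000, difficulty=0.0))
    print(length_penalized_reward(True, 2000, difficulty=1.0))
```

With these defaults, a long answer to an easy problem loses part of its reward, while the same length on a hard problem stays within budget and is not penalized.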
